Re: [RFC PATCH 8/8] powerpc/papr_scm: Use FORM2 associativity details

2021-06-14 Thread Aneesh Kumar K.V
David Gibson  writes:

> On Mon, Jun 14, 2021 at 10:10:03PM +0530, Aneesh Kumar K.V wrote:
>> FORM2 introduces a concept of secondary domain which is identical to the
>> concept of FORM1 primary domain. Use the secondary domain as the numa node
>> when using a persistent memory device. For DAX kmem use the logical domain
>> id introduced in FORM2. This new numa node
>> 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  arch/powerpc/mm/numa.c| 28 +++
>>  arch/powerpc/platforms/pseries/papr_scm.c | 26 +
>>  arch/powerpc/platforms/pseries/pseries.h  |  1 +
>>  3 files changed, 45 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 86cd2af014f7..b9ac6d02e944 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -265,6 +265,34 @@ static int associativity_to_nid(const __be32 *associativity)
>>  return nid;
>>  }
>>  
>> +int get_primary_and_secondary_domain(struct device_node *node, int *primary, int *secondary)
>> +{
>> +int secondary_index;
>> +const __be32 *associativity;
>> +
>> +if (!numa_enabled) {
>> +*primary = NUMA_NO_NODE;
>> +*secondary = NUMA_NO_NODE;
>> +return 0;
>> +}
>> +
>> +associativity = of_get_associativity(node);
>> +if (!associativity)
>> +return -ENODEV;
>> +
>> +if (of_read_number(associativity, 1) >= primary_domain_index) {
>> +*primary = of_read_number(&associativity[primary_domain_index], 1);
>> +secondary_index = of_read_number(&distance_ref_points[1], 1);
>
> Secondary ID is always the second reference point, but primary depends
> on the length of resources?  That seems very weird.

primary_domain_index is distance_ref_points[0]. With Form2 we would find
both the primary and secondary domain IDs the same for all resources other
than persistent memory devices. The usage w.r.t. persistent memory is
explained in patch 7.

With Form2 the primary domainID and secondary domainID are used to identify
the NUMA nodes the kernel should use when using persistent memory devices.
Persistent memory devices can also be used as regular memory using the DAX
KMEM driver, and the primary domainID indicates the numa node number the OS
should use when using these devices as regular memory. The secondary domainID
is the numa node number that should be used when using this device as
persistent memory. In the latter case, we are interested in the locality of
the device to an established numa node. In the above example, if the last row
represents a persistent memory device/resource, NUMA node number 40 will be
used when using the device as regular memory and NUMA node number 0 will be
the device numa node when using it as a persistent memory device.
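The lookup described above can be sketched as a small userspace mock. The array layouts here are hypothetical stand-ins for the device tree properties; the real kernel reads them with of_read_number() as in the patch:

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/*
 * Userspace sketch of the Form 2 lookup discussed above. assoc[0] holds
 * the number of entries that follow (as in "ibm,associativity"), and
 * ref_points[0]/ref_points[1] hold the 1-based ordinals of the primary
 * and secondary domains (as in "ibm,associativity-reference-points").
 */
int get_domains(const unsigned int *assoc, const unsigned int *ref_points,
		int *primary, int *secondary)
{
	unsigned int primary_index = ref_points[0];
	unsigned int secondary_index = ref_points[1];

	if (assoc[0] < primary_index || assoc[0] < secondary_index)
		return -1;

	*primary = (int)assoc[primary_index];	  /* node for DAX kmem use */
	*secondary = (int)assoc[secondary_index]; /* node when used as pmem */
	return 0;
}
```

For the example row above (primary domainID 40, secondary domainID 0), an associativity list of <4 0 0 0 40> with reference points <4 2> yields node 40 for regular-memory use and node 0 for persistent-memory use.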


>
>> +*secondary = of_read_number(&associativity[secondary_index], 1);
>> +}
>> +if (*primary == 0xffff || *primary >= nr_node_ids)
>> +*primary = NUMA_NO_NODE;
>> +
>> +if (*secondary == 0xffff || *secondary >= nr_node_ids)
>> +*secondary = NUMA_NO_NODE;
>> +return 0;
>> +}
>> +
>>  /* Returns the nid associated with the given device tree node,
>>   * or -1 if not found.
>>   */
>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>> b/arch/powerpc/platforms/pseries/papr_scm.c
>> index ef26fe40efb0..9bf2f1f3ddc5 100644
>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>> @@ -18,6 +18,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include "pseries.h"
>>  
>>  #define BIND_ANY_ADDR (~0ul)
>>  
>> @@ -88,6 +89,8 @@ struct papr_scm_perf_stats {
>>  struct papr_scm_priv {
>>  struct platform_device *pdev;
>>  struct device_node *dn;
>> +int numa_node;
>> +int target_node;
>>  uint32_t drc_index;
>>  uint64_t blocks;
>>  uint64_t block_size;
>> @@ -923,7 +926,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  struct nd_mapping_desc mapping;
>>  struct nd_region_desc ndr_desc;
>>  unsigned long dimm_flags;
>> -int target_nid, online_nid;
>>  ssize_t stat_size;
>>  
>>  p->bus_desc.ndctl = papr_scm_ndctl;
>> @@ -974,10 +976,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  mapping.size = p->blocks * p->block_size; // XXX: potential overflow?
>>  
>>  memset(&ndr_desc, 0, sizeof(ndr_desc));
>> -target_nid = dev_to_node(&p->pdev->dev);
>> -online_nid = numa_map_to_online_node(target_nid);
>> -ndr_desc.numa_node = online_nid;
>> -ndr_desc.target_node = target_nid;
>> +ndr_desc.numa_node = p->numa_node;
>> +ndr_desc.target_node = p->target_node;
>>  ndr_desc.res = &p->res;
>>  ndr_desc.of_node = p->dn;
>>  ndr_desc.provider_data = p;
>> @@ -1001,9 +1001,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>>  ndr_desc.res, p->dn);
>>  goto err;
>>  }
>> -if (target_nid != online_nid)
>> -

[PATCH] cpufreq:powernv: Fix init_chip_info initialization in numa=off

2021-06-14 Thread Pratik R. Sampat
In the numa=off kernel command-line configuration, init_chip_info() loops
over the number of chips and attempts to copy the cpumask of that node,
which is NULL for all iterations after the first chip.

Hence add a check to bail out after the first initialization if there
is only one node.

Fixes: 053819e0bf84 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
Signed-off-by: Pratik R. Sampat 
Reported-by: Shirisha Ganta 
---
 drivers/cpufreq/powernv-cpufreq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e439b43c19eb..663f9c4b5e3a 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -1078,6 +1078,8 @@ static int init_chip_info(void)
INIT_WORK(&chips[i].throttle, powernv_cpufreq_work_fn);
for_each_cpu(cpu, &chips[i].mask)
per_cpu(chip_info, cpu) = &chips[i];
+   if (num_possible_nodes() == 1)
+   break;
}
 
 free_and_return:
-- 
2.30.2
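A userspace sketch of the fixed loop (mocked data structures; the real code iterates struct chip entries and per-CPU cpumasks, and the names below are illustrative):

```c
#include <assert.h>

#define MAX_CHIPS 4

/*
 * Mock of the init_chip_info() loop after the fix above. mask_valid[i]
 * stands in for chips[i].mask being a usable cpumask; with numa=off only
 * chip 0 has one, so the loop must bail after the first iteration when
 * there is a single possible node.
 */
static int init_chips_mock(const int *mask_valid, int nr_chips,
			   int num_possible_nodes, int *initialized)
{
	for (int i = 0; i < nr_chips; i++) {
		if (!mask_valid[i])
			return -1;	/* would dereference a NULL cpumask */
		initialized[i] = 1;
		if (num_possible_nodes == 1)
			break;		/* numa=off: nothing more to copy */
	}
	return 0;
}
```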



Re: [RFC PATCH 7/8] powerpc/pseries: Add support for FORM2 associativity

2021-06-14 Thread Aneesh Kumar K.V
David Gibson  writes:

> On Mon, Jun 14, 2021 at 10:10:02PM +0530, Aneesh Kumar K.V wrote:
>> Signed-off-by: Daniel Henrique Barboza 
>> Signed-off-by: Aneesh Kumar K.V 
>> ---
>>  Documentation/powerpc/associativity.rst   | 139 
>>  arch/powerpc/include/asm/firmware.h   |   3 +-
>>  arch/powerpc/include/asm/prom.h   |   1 +
>>  arch/powerpc/kernel/prom_init.c   |   3 +-
>>  arch/powerpc/mm/numa.c| 149 +-
>>  arch/powerpc/platforms/pseries/firmware.c |   1 +
>>  6 files changed, 290 insertions(+), 6 deletions(-)
>>  create mode 100644 Documentation/powerpc/associativity.rst
>> 
>> diff --git a/Documentation/powerpc/associativity.rst 
>> b/Documentation/powerpc/associativity.rst
>> new file mode 100644
>> index ..58abedea81d7
>> --- /dev/null
>> +++ b/Documentation/powerpc/associativity.rst
>> @@ -0,0 +1,139 @@
>> +===========================
>> +NUMA resource associativity
>> +===========================
>> +
>> +Associativity represents the groupings of the various platform resources into
>> +domains of substantially similar mean performance relative to resources outside
>> +of that domain. Resource subsets of a given domain that exhibit better
>> +performance relative to each other than relative to other resource subsets
>> +are represented as being members of a sub-grouping domain. This performance
>> +characteristic is presented in terms of NUMA node distance within the Linux kernel.
>> +From the platform view, these groups are also referred to as domains.
>> +
>> +PAPR interface currently supports two different ways of communicating these resource
>
> You describe form 2 below as well, which contradicts this.

Fixed as below.

PAPR interface currently supports different ways of communicating these
resource grouping details to the OS. These are referred to as Form 0, Form 1
and Form 2 associativity grouping. Form 0 is the older format and is now
considered deprecated.

Hypervisor indicates the type/form of associativity used via the
"ibm,architecture-vec-5" property. Bit 0 of byte 5 in the
"ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
A value of 1 indicates the usage of Form 1 associativity. For Form 2
associativity, bit 2 of byte 5 in the "ibm,architecture-vec-5" property
is used.
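That bit layout can be sketched as follows. IBM bit numbering is assumed (bit 0 is the most significant bit of the byte), and treating the argument as the raw value of byte 5 is an assumption made for illustration:

```c
#include <assert.h>

enum assoc_form { FORM0_AFF, FORM1_AFF, FORM2_AFF };

/*
 * Sketch of decoding byte 5 of "ibm,architecture-vec-5". In IBM bit
 * numbering bit 0 is the MSB, so bit 0 -> mask 0x80 and bit 2 -> 0x20.
 */
enum assoc_form decode_assoc_form(unsigned char vec5_byte5)
{
	if (vec5_byte5 & 0x20)	/* bit 2: Form 2 associativity */
		return FORM2_AFF;
	if (vec5_byte5 & 0x80)	/* bit 0: Form 1 associativity */
		return FORM1_AFF;
	return FORM0_AFF;	/* neither bit set: Form 0 */
}
```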



>
>> +grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping.
>> +Form 0 is the older format and is now considered deprecated.
>> +
>> +Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5" property.
>> +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
>> +A value of 1 indicates the usage of Form 1 associativity.
>> +
>> +Form 0
>> +------
>> +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
>> +
>> +Form 1
>> +------
>> +With Form 1 a combination of ibm,associativity-reference-points and ibm,associativity
>> +device tree properties are used to determine the NUMA distance between resource groups/domains.
>> +
>> +The “ibm,associativity” property contains one or more lists of numbers (domainID)
>> +representing the resource’s platform grouping domains.
>> +
>> +The “ibm,associativity-reference-points” property contains one or more lists of numbers
>> +(domain index) that represent the 1 based ordinal in the associativity lists of the most
>> +significant boundary, with subsequent entries indicating progressively less significant boundaries.
>> +
>> +Linux kernel uses the domain id of the most significant boundary (aka primary domain)
>
> I thought we used the *least* significant boundary (the smallest
> grouping, not the largest).  That is, the last index, not the first.
>
> Actually... come to think of it, I'm not even sure how to interpret
> "most significant".  Does that mean a change in grouping at that "most
> significant" level results in the largest perfomance difference?

PAPR defines "most significant" as below

When the “ibm,architecture-vec-5” property byte 5 bit 0 has the value of one,
the “ibm,associativity-reference-points” property indicates boundaries between
associativity domains presented by the “ibm,associativity” property containing
“near” and “far” resources. The first such boundary in the list represents the
1 based ordinal in the associativity lists of the most significant boundary,
with subsequent entries indicating progressively less significant boundaries.

I would interpret it as the boundary where we start defining NUMA nodes.
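A worked example of that interpretation, with hypothetical property values (the kernel's actual parsing goes through of_read_number() on big-endian cells):

```c
#include <assert.h>

/*
 * Sketch of picking the Form 1 primary domain. assoc mimics an
 * "ibm,associativity" list (assoc[0] = number of entries that follow),
 * and ref_points[0] is the 1-based ordinal of the most significant
 * boundary from "ibm,associativity-reference-points".
 */
int form1_primary_domain(const unsigned int *assoc,
			 const unsigned int *ref_points)
{
	unsigned int index = ref_points[0];

	if (assoc[0] < index)
		return -1;	/* list too short for this boundary */
	return (int)assoc[index];
}
```

With ibm,associativity = <4 0 0 0 8> and reference points <4 2>, the primary domain (and hence the NUMA node id) would be 8.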

>
>> +as the NUMA node id. Linux kernel computes NUMA distance between two domains by
>> +recursively comparing if they belong to the same higher-level domains. For mismatch
>> +at every higher level of the resource group, the kernel doubles the NUMA distance between
>> +the comparing domains.
>> +
>> +Form 2
>> +------
>> +Form 2 associativity 

[PATCH v2] selftests/powerpc: Always test lmw and stmw

2021-06-14 Thread Jordan Niethe
Load Multiple Word (lmw) and Store Multiple Word (stmw) will raise an
Alignment Exception:
  - Little Endian mode: always
  - Big Endian mode: address not word aligned

These conditions do not depend on cache inhibited memory. Test the
alignment handler emulation of these instructions regardless of if there
is cache inhibited memory available or not.

Commit dd3a44c06f7b ("selftests/powerpc: Only test lwm/stmw on big
endian") stopped testing lmw/stmw on little endian because newer
binutils (>= 2.36) will not assemble them in little endian mode. The
kernel still emulates these instructions in little endian mode so use
macros to generate them and test them.

Signed-off-by: Jordan Niethe 
---
v2: Use macros for lmw/stmw
---
 .../powerpc/alignment/alignment_handler.c | 101 +-
 .../selftests/powerpc/include/instructions.h  |  10 ++
 2 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 33ee34fc0828..26878147f389 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "utils.h"
 #include "instructions.h"
@@ -453,11 +454,6 @@ int test_alignment_handler_integer(void)
STORE_DFORM_TEST(stdu);
STORE_XFORM_TEST(stdux);
 
-#ifdef __BIG_ENDIAN__
-   LOAD_DFORM_TEST(lmw);
-   STORE_DFORM_TEST(stmw);
-#endif
-
return rc;
 }
 
@@ -602,6 +598,99 @@ int test_alignment_handler_fp_prefix(void)
return rc;
 }
 
+int test_alignment_handler_multiple(void)
+{
+   int offset, width, r, rc = 0;
+   void *src1, *dst1, *src2, *dst2;
+
+   rc = posix_memalign(&src1, bufsize, bufsize);
+   if (rc) {
+   printf("\n");
+   return rc;
+   }
+
+   rc = posix_memalign(&dst1, bufsize, bufsize);
+   if (rc) {
+   printf("\n");
+   free(src1);
+   return rc;
+   }
+
+   src2 = malloc(bufsize);
+   if (!src2) {
+   printf("\n");
+   free(src1);
+   free(dst1);
+   return -ENOMEM;
+   }
+
+   dst2 = malloc(bufsize);
+   if (!dst2) {
+   printf("\n");
+   free(src1);
+   free(dst1);
+   free(src2);
+   return -ENOMEM;
+   }
+
+   /* lmw */
+   width = 4;
+   printf("\tDoing lmw:\t");
+   for (offset = 0; offset < width; offset++) {
+   preload_data(src1, offset, width);
+   preload_data(src2, offset, width);
+
+   asm volatile(LMW(31, %0, 0)
+"std 31, 0(%1)"
+:: "r"(src1 + offset), "r"(dst1 + offset), "r"(0)
+: "memory", "r31");
+
+   memcpy(dst2 + offset, src1 + offset, width);
+
+   r = test_memcmp(dst1, dst2, width, offset, "test_lmw");
+   if (r && !debug) {
+   printf("FAILED: Wrong Data\n");
+   break;
+   }
+   }
+
+   if (!r)
+   printf("PASSED\n");
+   else
+   rc |= 1;
+
+   /* stmw */
+   width = 4;
+   printf("\tDoing stmw:\t");
+   for (offset = 0; offset < width; offset++) {
+   preload_data(src1, offset, width);
+   preload_data(src2, offset, width);
+
+   asm volatile("ld  31, 0(%0) ;"
+STMW(31, %1, 0)
+:: "r"(src1 + offset), "r"(dst1 + offset), "r"(0)
+: "memory", "r31");
+
+   memcpy(dst2 + offset, src1 + offset, width);
+
+   r = test_memcmp(dst1, dst2, width, offset, "test_stmw");
+   if (r && !debug) {
+   printf("FAILED: Wrong Data\n");
+   break;
+   }
+   }
+   if (!r)
+   printf("PASSED\n");
+   else
+   rc |= 1;
+
+   free(src1);
+   free(src2);
+   free(dst1);
+   free(dst2);
+   return rc;
+}
+
 void usage(char *prog)
 {
printf("Usage: %s [options] [path [offset]]\n", prog);
@@ -676,5 +765,7 @@ int main(int argc, char *argv[])
   "test_alignment_handler_fp_206");
rc |= test_harness(test_alignment_handler_fp_prefix,
   "test_alignment_handler_fp_prefix");
+   rc |= test_harness(test_alignment_handler_multiple,
+  "test_alignment_handler_multiple");
return rc;
 }
diff --git a/tools/testing/selftests/powerpc/include/instructions.h 
b/tools/testing/selftests/powerpc/include/instructions.h
index 4efa6314bd96..60605e2bbd3c 100644
--- a/tools/testing/selftests/powerpc/include/instructions.h
+++ 

Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec

2021-06-14 Thread Herbert Xu
On Fri, Jun 11, 2021 at 09:07:43PM -0700, Ira Weiny wrote:
>
> More recently this was added:
> 
> 7e34e0bbc644 crypto: omap-crypto - fix userspace copied buffer access
> 
> I'm CC'ing Tero and Herbert to see why they added the SLAB check.

Probably because the generic Crypto API has the same check.  This
all goes back to

commit 4f3e797ad07d52d34983354a77b365dfcd48c1b4
Author: Herbert Xu 
Date:   Mon Feb 9 14:22:14 2009 +1100

crypto: scatterwalk - Avoid flush_dcache_page on slab pages

It's illegal to call flush_dcache_page on slab pages on a number
of architectures.  So this patch avoids doing so if PageSlab is
true.

In future we can move the flush_dcache_page call to those page
cache users that actually need it.

Reported-by: David S. Miller 
Signed-off-by: Herbert Xu 

But I can't find any emails discussing this so let me ask Dave
directly and see if he can tell us what the issue was or might
have been.
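The guard that commit introduced can be sketched with mocked predicates. The real kernel uses PageSlab() and flush_dcache_page() on struct page; the struct and helpers here are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for struct page, for illustration only. */
struct mock_page {
	bool slab;	/* would be PageSlab(page) in the kernel */
	bool flushed;
};

static void mock_flush_dcache_page(struct mock_page *p)
{
	p->flushed = true;
}

/* Skip the flush for slab-backed pages, as scatterwalk does. */
static void scatterwalk_style_flush(struct mock_page *p)
{
	if (!p->slab)
		mock_flush_dcache_page(p);
}
```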

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v9 01/14] swiotlb: Refactor swiotlb init functions

2021-06-14 Thread Claire Chang
On Mon, Jun 14, 2021 at 2:16 PM Christoph Hellwig  wrote:
>
> On Fri, Jun 11, 2021 at 11:26:46PM +0800, Claire Chang wrote:
> > + spin_lock_init(&mem->lock);
> > + for (i = 0; i < mem->nslabs; i++) {
> > + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > + mem->slots[i].alloc_size = 0;
> > + }
> > +
> > + if (memory_decrypted)
> > + set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> > + memset(vaddr, 0, bytes);
>
> We don't really need to do this call before the memset.  Which means we
> can just move it to the callers that care instead of having a bool
> argument.
>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig 

Thanks for the review. Will wait more days for other reviews and send
v10 to address the comments in this and other patches.


Re: [RFC PATCH 6/8] powerpc/pseries: Add a helper for form1 cpu distance

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:10:01PM +0530, Aneesh Kumar K.V wrote:
> This helper is only used with the dispatch trace log collection.
> A later patch will add Form2 affinity support and this change helps
> in keeping that simpler. Also add a comment explaining we don't expect
> the code to be called with FORM0
> 
> Signed-off-by: Aneesh Kumar K.V 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/mm/numa.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 64caaf07cf82..696e5bfe1414 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -166,7 +166,7 @@ static void unmap_cpu_from_node(unsigned long cpu)
>  }
>  #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
>  
> -int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
> +static int __cpu_form1_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
>  {
>   int dist = 0;
>  
> @@ -182,6 +182,14 @@ int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
>   return dist;
>  }
>  
> +int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
> +{
> + /* We should not get called with FORM0 */
> + VM_WARN_ON(affinity_form == FORM0_AFFINITY);
> +
> + return __cpu_form1_distance(cpu1_assoc, cpu2_assoc);
> +}
> +
>  /* must hold reference to node during call */
>  static const __be32 *of_get_associativity(struct device_node *dev)
>  {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 8/8] powerpc/papr_scm: Use FORM2 associativity details

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:10:03PM +0530, Aneesh Kumar K.V wrote:
> FORM2 introduces a concept of secondary domain which is identical to the
> concept of FORM1 primary domain. Use the secondary domain as the numa node
> when using a persistent memory device. For DAX kmem use the logical domain
> id introduced in FORM2. This new numa node
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/numa.c| 28 +++
>  arch/powerpc/platforms/pseries/papr_scm.c | 26 +
>  arch/powerpc/platforms/pseries/pseries.h  |  1 +
>  3 files changed, 45 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 86cd2af014f7..b9ac6d02e944 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -265,6 +265,34 @@ static int associativity_to_nid(const __be32 *associativity)
>   return nid;
>  }
>  
> +int get_primary_and_secondary_domain(struct device_node *node, int *primary, int *secondary)
> +{
> + int secondary_index;
> + const __be32 *associativity;
> +
> + if (!numa_enabled) {
> + *primary = NUMA_NO_NODE;
> + *secondary = NUMA_NO_NODE;
> + return 0;
> + }
> +
> + associativity = of_get_associativity(node);
> + if (!associativity)
> + return -ENODEV;
> +
> + if (of_read_number(associativity, 1) >= primary_domain_index) {
> + *primary = of_read_number(&associativity[primary_domain_index], 1);
> + secondary_index = of_read_number(&distance_ref_points[1], 1);

Secondary ID is always the second reference point, but primary depends
on the length of resources?  That seems very weird.

> + *secondary = of_read_number(&associativity[secondary_index], 1);
> + }
> + if (*primary == 0xffff || *primary >= nr_node_ids)
> + *primary = NUMA_NO_NODE;
> +
> + if (*secondary == 0xffff || *secondary >= nr_node_ids)
> + *secondary = NUMA_NO_NODE;
> + return 0;
> +}
> +
>  /* Returns the nid associated with the given device tree node,
>   * or -1 if not found.
>   */
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index ef26fe40efb0..9bf2f1f3ddc5 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include "pseries.h"
>  
>  #define BIND_ANY_ADDR (~0ul)
>  
> @@ -88,6 +89,8 @@ struct papr_scm_perf_stats {
>  struct papr_scm_priv {
>   struct platform_device *pdev;
>   struct device_node *dn;
> + int numa_node;
> + int target_node;
>   uint32_t drc_index;
>   uint64_t blocks;
>   uint64_t block_size;
> @@ -923,7 +926,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   struct nd_mapping_desc mapping;
>   struct nd_region_desc ndr_desc;
>   unsigned long dimm_flags;
> - int target_nid, online_nid;
>   ssize_t stat_size;
>  
>   p->bus_desc.ndctl = papr_scm_ndctl;
> @@ -974,10 +976,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   mapping.size = p->blocks * p->block_size; // XXX: potential overflow?
>  
>   memset(&ndr_desc, 0, sizeof(ndr_desc));
> - target_nid = dev_to_node(&p->pdev->dev);
> - online_nid = numa_map_to_online_node(target_nid);
> - ndr_desc.numa_node = online_nid;
> - ndr_desc.target_node = target_nid;
> + ndr_desc.numa_node = p->numa_node;
> + ndr_desc.target_node = p->target_node;
>   ndr_desc.res = &p->res;
>   ndr_desc.of_node = p->dn;
>   ndr_desc.provider_data = p;
> @@ -1001,9 +1001,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   ndr_desc.res, p->dn);
>   goto err;
>   }
> - if (target_nid != online_nid)
> - dev_info(dev, "Region registered with target node %d and online node %d",
> -  target_nid, online_nid);
>  
>   mutex_lock(&papr_ndr_lock);
>   list_add_tail(&p->region_list, &papr_nd_regions);
> @@ -1096,7 +1093,7 @@ static int papr_scm_probe(struct platform_device *pdev)
>   struct papr_scm_priv *p;
>   const char *uuid_str;
>   u64 uuid[2];
> - int rc;
> + int rc, numa_node;
>  
>   /* check we have all the required DT properties */
>   if (of_property_read_u32(dn, "ibm,my-drc-index", _index)) {
> @@ -1119,11 +1116,20 @@ static int papr_scm_probe(struct platform_device *pdev)
>   return -ENODEV;
>   }
>  
> -
>   p = kzalloc(sizeof(*p), GFP_KERNEL);
>   if (!p)
>   return -ENOMEM;
>  
> + if (get_primary_and_secondary_domain(dn, &p->target_node, &numa_node)) {
> + dev_err(&pdev->dev, "%pOF: missing NUMA attributes!\n", dn);
> + rc = -ENODEV;
> + goto err;
> + }
> + p->numa_node = numa_map_to_online_node(numa_node);
> + if (numa_node != p->numa_node)
> + dev_info(&pdev->dev, "Region 

Re: [RFC PATCH 7/8] powerpc/pseries: Add support for FORM2 associativity

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:10:02PM +0530, Aneesh Kumar K.V wrote:
> Signed-off-by: Daniel Henrique Barboza 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  Documentation/powerpc/associativity.rst   | 139 
>  arch/powerpc/include/asm/firmware.h   |   3 +-
>  arch/powerpc/include/asm/prom.h   |   1 +
>  arch/powerpc/kernel/prom_init.c   |   3 +-
>  arch/powerpc/mm/numa.c| 149 +-
>  arch/powerpc/platforms/pseries/firmware.c |   1 +
>  6 files changed, 290 insertions(+), 6 deletions(-)
>  create mode 100644 Documentation/powerpc/associativity.rst
> 
> diff --git a/Documentation/powerpc/associativity.rst 
> b/Documentation/powerpc/associativity.rst
> new file mode 100644
> index ..58abedea81d7
> --- /dev/null
> +++ b/Documentation/powerpc/associativity.rst
> @@ -0,0 +1,139 @@
> +===========================
> +NUMA resource associativity
> +===========================
> +
> +Associativity represents the groupings of the various platform resources into
> +domains of substantially similar mean performance relative to resources outside
> +of that domain. Resource subsets of a given domain that exhibit better
> +performance relative to each other than relative to other resource subsets
> +are represented as being members of a sub-grouping domain. This performance
> +characteristic is presented in terms of NUMA node distance within the Linux kernel.
> +From the platform view, these groups are also referred to as domains.
> +
> +PAPR interface currently supports two different ways of communicating these resource

You describe form 2 below as well, which contradicts this.

> +grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping.
> +Form 0 is the older format and is now considered deprecated.
> +
> +Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5" property.
> +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
> +A value of 1 indicates the usage of Form 1 associativity.
> +
> +Form 0
> +------
> +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
> +
> +Form 1
> +------
> +With Form 1 a combination of ibm,associativity-reference-points and ibm,associativity
> +device tree properties are used to determine the NUMA distance between resource groups/domains.
> +
> +The “ibm,associativity” property contains one or more lists of numbers (domainID)
> +representing the resource’s platform grouping domains.
> +
> +The “ibm,associativity-reference-points” property contains one or more lists of numbers
> +(domain index) that represent the 1 based ordinal in the associativity lists of the most
> +significant boundary, with subsequent entries indicating progressively less significant boundaries.
> +
> +Linux kernel uses the domain id of the most significant boundary (aka primary domain)

I thought we used the *least* significant boundary (the smallest
grouping, not the largest).  That is, the last index, not the first.

Actually... come to think of it, I'm not even sure how to interpret
"most significant".  Does that mean a change in grouping at that "most
significant" level results in the largest perfomance difference?

> +as the NUMA node id. Linux kernel computes NUMA distance between two domains by
> +recursively comparing if they belong to the same higher-level domains. For mismatch
> +at every higher level of the resource group, the kernel doubles the NUMA distance between
> +the comparing domains.
> +
> +Form 2
> +------
> +Form 2 associativity format adds separate device tree properties representing NUMA node distance
> +thereby making the node distance computation flexible. Form 2 also allows flexible primary
> +domain numbering. With numa distance computation now detached from the index value of
> +"ibm,associativity" property, Form 2 allows a large number of primary domain ids at the
> +same domain index representing resource groups of different performance/latency characteristics.

The meaning of "domain index" is not clear to me here.

> +
> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
> +"ibm,architecture-vec-5" property.
> +
> +"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing
> +the domainIDs present in the system. The offset of the domainID in this property is considered
> +the domainID index.

You haven't really introduced the term "domainID".  Is "domainID
index" the same as "domain index" above?  It's not clear to me.

The distinction between "domain index" and "primary domain id" is also
not clear to me.

> +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
> +N domainID encoded as with encode-int
> +
> +For ex:
> +ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for 
> 
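The mapping that property establishes can be sketched as follows, using the truncated example above (N = 4, domainIDs 0, 8, 250 and 252); the flat-array layout is an assumption for illustration:

```c
#include <assert.h>

/*
 * Sketch of resolving a domainID index from an
 * "ibm,numa-lookup-index-table"-style array: table[0] is the count N,
 * followed by N domainIDs; the position of a domainID among those N
 * entries is its domainID index.
 */
int domain_id_index(const unsigned int *table, unsigned int domain_id)
{
	unsigned int n = table[0];

	for (unsigned int i = 0; i < n; i++)
		if (table[1 + i] == domain_id)
			return (int)i;
	return -1;	/* domainID not present in the system */
}
```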

Re: [RFC PATCH 3/8] powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:09:58PM +0530, Aneesh Kumar K.V wrote:
> Also make related code cleanup that will allow adding FORM2_AFFINITY in
> later patches. No functional change in this patch.
> 
> Signed-off-by: Aneesh Kumar K.V 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/firmware.h   |  4 +--
>  arch/powerpc/include/asm/prom.h   |  2 +-
>  arch/powerpc/kernel/prom_init.c   |  2 +-
>  arch/powerpc/mm/numa.c| 35 ++-
>  arch/powerpc/platforms/pseries/firmware.c |  2 +-
>  5 files changed, 26 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/firmware.h 
> b/arch/powerpc/include/asm/firmware.h
> index 7604673787d6..60b631161360 100644
> --- a/arch/powerpc/include/asm/firmware.h
> +++ b/arch/powerpc/include/asm/firmware.h
> @@ -44,7 +44,7 @@
>  #define FW_FEATURE_OPAL  ASM_CONST(0x1000)
>  #define FW_FEATURE_SET_MODE  ASM_CONST(0x4000)
>  #define FW_FEATURE_BEST_ENERGY   ASM_CONST(0x8000)
> -#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
> +#define FW_FEATURE_FORM1_AFFINITY ASM_CONST(0x0001)
>  #define FW_FEATURE_PRRN  ASM_CONST(0x0002)
>  #define FW_FEATURE_DRMEM_V2  ASM_CONST(0x0004)
>  #define FW_FEATURE_DRC_INFO  ASM_CONST(0x0008)
> @@ -69,7 +69,7 @@ enum {
>   FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
>   FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
>   FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
> - FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
> + FW_FEATURE_FORM1_AFFINITY | FW_FEATURE_PRRN |
>   FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
>   FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
>   FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
> diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
> index 324a13351749..df9fec9d232c 100644
> --- a/arch/powerpc/include/asm/prom.h
> +++ b/arch/powerpc/include/asm/prom.h
> @@ -147,7 +147,7 @@ extern int of_read_drc_info_cell(struct property **prop,
>  #define OV5_MSI  0x0201  /* PCIe/MSI support */
>  #define OV5_CMO  0x0480  /* Cooperative Memory 
> Overcommitment */
>  #define OV5_XCMO 0x0440  /* Page Coalescing */
> -#define OV5_TYPE1_AFFINITY   0x0580  /* Type 1 NUMA affinity */
> +#define OV5_FORM1_AFFINITY   0x0580  /* FORM1 NUMA affinity */
>  #define OV5_PRRN 0x0540  /* Platform Resource Reassignment */
>  #define OV5_HP_EVT   0x0604  /* Hot Plug Event support */
>  #define OV5_RESIZE_HPT   0x0601  /* Hash Page Table resizing */
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 41ed7e33d897..64b9593038a7 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -1070,7 +1070,7 @@ static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = {
>  #else
>   0,
>  #endif
> - .associativity = OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
> + .associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN),
>   .bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT),
>   .micro_checkpoint = 0,
>   .reserved0 = 0,
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 5941da201fa3..192067991f8a 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -53,7 +53,10 @@ EXPORT_SYMBOL(node_data);
>  
>  static int primary_domain_index;
>  static int n_mem_addr_cells, n_mem_size_cells;
> -static int form1_affinity;
> +
> +#define FORM0_AFFINITY 0
> +#define FORM1_AFFINITY 1
> +static int affinity_form;
>  
>  #define MAX_DISTANCE_REF_POINTS 4
>  static int max_domain_index;
> @@ -190,7 +193,7 @@ int __node_distance(int a, int b)
>   int i;
>   int distance = LOCAL_DISTANCE;
>  
> - if (!form1_affinity)
> + if (affinity_form == FORM0_AFFINITY)
>   return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
>  
>   for (i = 0; i < max_domain_index; i++) {
> @@ -210,7 +213,7 @@ static void initialize_distance_lookup_table(int nid,
>  {
>   int i;
>  
> - if (!form1_affinity)
> + if (affinity_form != FORM1_AFFINITY)
>   return;
>  
>   for (i = 0; i < max_domain_index; i++) {
> @@ -289,6 +292,17 @@ static int __init find_primary_domain_index(void)
>   int index;
>   struct device_node *root;
>  
> + /*
> +  * Check for which form of affinity.
> +  */
> + if (firmware_has_feature(FW_FEATURE_OPAL)) {
> + affinity_form = FORM1_AFFINITY;
> + } else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) {
> + dbg("Using form 1 affinity\n");
> + affinity_form = FORM1_AFFINITY;
> + } else
> + 

Re: [RFC PATCH 1/8] powerpc/pseries: rename min_common_depth to primary_domain_index

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:09:56PM +0530, Aneesh Kumar K.V wrote:
> No functional change in this patch.

I think this needs a rationale as to why 'primary_domain_index' is a
better name than 'min_common_depth'.  The meaning isn't obvious to me
from either name.

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/numa.c | 38 +++---
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index f2bf98bdcea2..8365b298ec48 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -51,7 +51,7 @@ EXPORT_SYMBOL(numa_cpu_lookup_table);
>  EXPORT_SYMBOL(node_to_cpumask_map);
>  EXPORT_SYMBOL(node_data);
>  
> -static int min_common_depth;
> +static int primary_domain_index;
>  static int n_mem_addr_cells, n_mem_size_cells;
>  static int form1_affinity;
>  
> @@ -232,8 +232,8 @@ static int associativity_to_nid(const __be32 
> *associativity)
>   if (!numa_enabled)
>   goto out;
>  
> - if (of_read_number(associativity, 1) >= min_common_depth)
> - nid = of_read_number(&associativity[min_common_depth], 1);
> + if (of_read_number(associativity, 1) >= primary_domain_index)
> + nid = of_read_number(&associativity[primary_domain_index], 1);
>  
>   /* POWER4 LPAR uses 0x as invalid node */
>   if (nid == 0x || nid >= nr_node_ids)
> @@ -284,9 +284,9 @@ int of_node_to_nid(struct device_node *device)
>  }
>  EXPORT_SYMBOL(of_node_to_nid);
>  
> -static int __init find_min_common_depth(void)
> +static int __init find_primary_domain_index(void)
>  {
> - int depth;
> + int index;
>   struct device_node *root;
>  
>   if (firmware_has_feature(FW_FEATURE_OPAL))
> @@ -326,7 +326,7 @@ static int __init find_min_common_depth(void)
>   }
>  
>   if (form1_affinity) {
> - depth = of_read_number(distance_ref_points, 1);
> + index = of_read_number(distance_ref_points, 1);
>   } else {
>   if (distance_ref_points_depth < 2) {
>   printk(KERN_WARNING "NUMA: "
> @@ -334,7 +334,7 @@ static int __init find_min_common_depth(void)
>   goto err;
>   }
>  
> - depth = of_read_number(&distance_ref_points[1], 1);
> + index = of_read_number(&distance_ref_points[1], 1);
>   }
>  
>   /*
> @@ -348,7 +348,7 @@ static int __init find_min_common_depth(void)
>   }
>  
>   of_node_put(root);
> - return depth;
> + return index;
>  
>  err:
>   of_node_put(root);
> @@ -437,16 +437,16 @@ int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   int nid = default_nid;
>   int rc, index;
>  
> - if ((min_common_depth < 0) || !numa_enabled)
> + if ((primary_domain_index < 0) || !numa_enabled)
>   return default_nid;
>  
>   rc = of_get_assoc_arrays();
>   if (rc)
>   return default_nid;
>  
> - if (min_common_depth <= aa.array_sz &&
> + if (primary_domain_index <= aa.array_sz &&
>   !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < 
> aa.n_arrays) {
> - index = lmb->aa_index * aa.array_sz + min_common_depth - 1;
> + index = lmb->aa_index * aa.array_sz + primary_domain_index - 1;
> nid = of_read_number(&aa.arrays[index], 1);
>  
>   if (nid == 0x || nid >= nr_node_ids)
> @@ -708,18 +708,18 @@ static int __init parse_numa_properties(void)
>   return -1;
>   }
>  
> - min_common_depth = find_min_common_depth();
> + primary_domain_index = find_primary_domain_index();
>  
> - if (min_common_depth < 0) {
> + if (primary_domain_index < 0) {
>   /*
> -  * if we fail to parse min_common_depth from device tree
> +  * if we fail to parse primary_domain_index from device tree
>* mark the numa disabled, boot with numa disabled.
>*/
>   numa_enabled = false;
> - return min_common_depth;
> + return primary_domain_index;
>   }
>  
> - dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
> + dbg("NUMA associativity depth for CPU/Memory: %d\n", 
> primary_domain_index);
>  
>   /*
>* Even though we connect cpus to numa domains later in SMP
> @@ -919,14 +919,14 @@ static void __init find_possible_nodes(void)
>   goto out;
>   }
>  
> - max_nodes = of_read_number(&domains[min_common_depth], 1);
> + max_nodes = of_read_number(&domains[primary_domain_index], 1);
>   for (i = 0; i < max_nodes; i++) {
>   if (!node_possible(i))
>   node_set(i, node_possible_map);
>   }
>  
>   prop_length /= sizeof(int);
> - if (prop_length > min_common_depth + 2)
> + if (prop_length > primary_domain_index + 2)
>   coregroup_enabled = 1;
>  
>  out:
> @@ -1259,7 +1259,7 @@ int cpu_to_coregroup_id(int cpu)
>   goto out;
> 

Re: [RFC PATCH 4/8] powerpc/pseries: Consolidate DLPAR NUMA distance update

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:09:59PM +0530, Aneesh Kumar K.V wrote:
> The associativity details of the newly added resources are collected from
> the hypervisor via "ibm,configure-connector" rtas call. Update the numa
> distance details of the newly added numa node after the above call. In
> later patch we will remove updating NUMA distance when we are looking
> for node id from associativity array.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/numa.c| 41 +++
>  arch/powerpc/platforms/pseries/hotplug-cpu.c  |  2 +
>  .../platforms/pseries/hotplug-memory.c|  2 +
>  arch/powerpc/platforms/pseries/pseries.h  |  1 +
>  4 files changed, 46 insertions(+)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 192067991f8a..fec47981c1ef 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -287,6 +287,47 @@ int of_node_to_nid(struct device_node *device)
>  }
>  EXPORT_SYMBOL(of_node_to_nid);
>  
> +static void __initialize_form1_numa_distance(const __be32 *associativity)
> +{
> + int i, nid;
> +
> + if (of_read_number(associativity, 1) >= primary_domain_index) {
> + nid = of_read_number(&associativity[primary_domain_index], 1);
> +
> + for (i = 0; i < max_domain_index; i++) {
> + const __be32 *entry;
> +
> + entry = &associativity[be32_to_cpu(distance_ref_points[i])];
> + distance_lookup_table[nid][i] = of_read_number(entry, 
> 1);
> + }
> + }
> +}

This logic is almost identical to initialize_distance_lookup_table()
- it would be good if they could be consolidated, so it's clear that
coldplugged and hotplugged nodes are parsing the NUMA information in
the same way.
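
The suggested consolidation could look roughly like the sketch below: both the coldplug path (initialize_distance_lookup_table()) and the hotplug path (__initialize_form1_numa_distance()) would funnel through one helper that fills a node's row of the distance table from an associativity array. This is a standalone illustration — the names, fixed sizes, and the absence of the kernel's off-by-one indexing between the two existing parsers are all simplifications, not the kernel's actual code.

```c
#include <stdint.h>

#define MAX_NODES 4
#define MAX_REF_POINTS 2

/* Illustrative stand-ins for the kernel's globals. */
static uint32_t distance_ref_points[MAX_REF_POINTS] = { 3, 1 };
static uint32_t distance_lookup_table[MAX_NODES][MAX_REF_POINTS];

/* One shared parsing routine: coldplug and hotplug callers would both
 * hand their associativity array to this, so both parse identically. */
static void fill_numa_distance(int nid, const uint32_t *associativity)
{
    for (int i = 0; i < MAX_REF_POINTS; i++)
        distance_lookup_table[nid][i] =
            associativity[distance_ref_points[i]];
}

static uint32_t distance_entry(int nid, int i)
{
    return distance_lookup_table[nid][i];
}
```

A caller would invoke fill_numa_distance() once per newly parsed node, whether discovered at boot or via DLPAR.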

> +
> +static void initialize_form1_numa_distance(struct device_node *node)
> +{
> + const __be32 *associativity;
> +
> + associativity = of_get_associativity(node);
> + if (!associativity)
> + return;
> +
> + __initialize_form1_numa_distance(associativity);
> + return;
> +}
> +
> +/*
> + * Used to update distance information w.r.t newly added node.
> + */
> +void update_numa_distance(struct device_node *node)
> +{
> + if (affinity_form == FORM0_AFFINITY)
> + return;
> + else if (affinity_form == FORM1_AFFINITY) {
> + initialize_form1_numa_distance(node);
> + return;
> + }
> +}
> +
>  static int __init find_primary_domain_index(void)
>  {
>   int index;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 7e970f81d8ff..778b6ab35f0d 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -498,6 +498,8 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
>   return saved_rc;
>   }
>  
> + update_numa_distance(dn);
> +
>   rc = dlpar_online_cpu(dn);
>   if (rc) {
>   saved_rc = rc;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 8377f1f7c78e..0e602c3b01ea 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -180,6 +180,8 @@ static int update_lmb_associativity_index(struct 
> drmem_lmb *lmb)
>   return -ENODEV;
>   }
>  
> + update_numa_distance(lmb_node);
> +
>   dr_node = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
>   if (!dr_node) {
>   dlpar_free_cc_nodes(lmb_node);
> diff --git a/arch/powerpc/platforms/pseries/pseries.h 
> b/arch/powerpc/platforms/pseries/pseries.h
> index 1f051a786fb3..663a0859cf13 100644
> --- a/arch/powerpc/platforms/pseries/pseries.h
> +++ b/arch/powerpc/platforms/pseries/pseries.h
> @@ -113,4 +113,5 @@ extern u32 pseries_security_flavor;
>  void pseries_setup_security_mitigations(void);
>  void pseries_lpar_read_hblkrm_characteristics(void);
>  
> +void update_numa_distance(struct device_node *node);
>  #endif /* _PSERIES_PSERIES_H */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [RFC PATCH 2/8] powerpc/pseries: rename distance_ref_points_depth to max_domain_index

2021-06-14 Thread David Gibson
On Mon, Jun 14, 2021 at 10:09:57PM +0530, Aneesh Kumar K.V wrote:
> No functional change in this patch

As with 1/8 an explanation of what this actually means and therefore
why this is a better name would be very helpful.

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/mm/numa.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8365b298ec48..5941da201fa3 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -56,7 +56,7 @@ static int n_mem_addr_cells, n_mem_size_cells;
>  static int form1_affinity;
>  
>  #define MAX_DISTANCE_REF_POINTS 4
> -static int distance_ref_points_depth;
> +static int max_domain_index;
>  static const __be32 *distance_ref_points;
>  static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
>  
> @@ -169,7 +169,7 @@ int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
>  
>   int i, index;
>  
> - for (i = 0; i < distance_ref_points_depth; i++) {
> + for (i = 0; i < max_domain_index; i++) {
>   index = be32_to_cpu(distance_ref_points[i]);
>   if (cpu1_assoc[index] == cpu2_assoc[index])
>   break;
> @@ -193,7 +193,7 @@ int __node_distance(int a, int b)
>   if (!form1_affinity)
>   return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
>  
> - for (i = 0; i < distance_ref_points_depth; i++) {
> + for (i = 0; i < max_domain_index; i++) {
>   if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
>   break;
>  
> @@ -213,7 +213,7 @@ static void initialize_distance_lookup_table(int nid,
>   if (!form1_affinity)
>   return;
>  
> - for (i = 0; i < distance_ref_points_depth; i++) {
> + for (i = 0; i < max_domain_index; i++) {
>   const __be32 *entry;
>  
> entry = &associativity[be32_to_cpu(distance_ref_points[i]) - 1];
> @@ -240,7 +240,7 @@ static int associativity_to_nid(const __be32 
> *associativity)
>   nid = NUMA_NO_NODE;
>  
>   if (nid > 0 &&
> - of_read_number(associativity, 1) >= distance_ref_points_depth) {
> + of_read_number(associativity, 1) >= max_domain_index) {
>   /*
>* Skip the length field and send start of associativity array
>*/
> @@ -310,14 +310,14 @@ static int __init find_primary_domain_index(void)
>*/
>   distance_ref_points = of_get_property(root,
>   "ibm,associativity-reference-points",
> - &distance_ref_points_depth);
> + &max_domain_index);
>  
>   if (!distance_ref_points) {
>   dbg("NUMA: ibm,associativity-reference-points not found.\n");
>   goto err;
>   }
>  
> - distance_ref_points_depth /= sizeof(int);
> + max_domain_index /= sizeof(int);
>  
>   if (firmware_has_feature(FW_FEATURE_OPAL) ||
>   firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY)) {
> @@ -328,7 +328,7 @@ static int __init find_primary_domain_index(void)
>   if (form1_affinity) {
>   index = of_read_number(distance_ref_points, 1);
>   } else {
> - if (distance_ref_points_depth < 2) {
> + if (max_domain_index < 2) {
>   printk(KERN_WARNING "NUMA: "
>   "short ibm,associativity-reference-points\n");
>   goto err;
> @@ -341,10 +341,10 @@ static int __init find_primary_domain_index(void)
>* Warn and cap if the hardware supports more than
>* MAX_DISTANCE_REF_POINTS domains.
>*/
> - if (distance_ref_points_depth > MAX_DISTANCE_REF_POINTS) {
> + if (max_domain_index > MAX_DISTANCE_REF_POINTS) {
>   printk(KERN_WARNING "NUMA: distance array capped at "
>   "%d entries\n", MAX_DISTANCE_REF_POINTS);
> - distance_ref_points_depth = MAX_DISTANCE_REF_POINTS;
> + max_domain_index = MAX_DISTANCE_REF_POINTS;
>   }
>  
>   of_node_put(root);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




RE: [PATCH] usb: gadget: fsl: properly remove remnant of MXC support

2021-06-14 Thread Leo Li


> -Original Message-
> From: Joel Stanley 
> Sent: Monday, June 14, 2021 8:52 PM
> To: Leo Li 
> Cc: Felipe Balbi ; Greg Kroah-Hartman
> ; linux-...@vger.kernel.org; linuxppc-dev
> ; Linux Kernel Mailing List  ker...@vger.kernel.org>; Arnd Bergmann ; Ran Wang
> ; Fabio Estevam 
> Subject: Re: [PATCH] usb: gadget: fsl: properly remove remnant of MXC
> support
> 
> On Sat, 12 Jun 2021 at 00:31, Li Yang  wrote:
> >
> > Commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver")
> > didn't remove all the MXC related stuff which can cause build problem
> > for LS1021 when enabled again in Kconfig.  This patch remove all the
> > remnants.
> >
> > Signed-off-by: Li Yang 
> 
> Reviewed-by: Joel Stanley 
> 
> Will you re-submit the kconfig change once this is merged?

I think that we can re-use your previous patch.

Hi Greg,

Can you apply the reverted Kconfig patch again?  Or do you prefer us to 
re-submit it again?

Regards,
Leo


Re: [PATCH v2 09/12] powerpc/inst: Refactor PPC32 and PPC64 versions

2021-06-14 Thread Jordan Niethe
On Thu, May 20, 2021 at 11:50 PM Christophe Leroy
 wrote:
>
> ppc_inst() ppc_inst_prefixed() ppc_inst_swab() can easily
> be made common to both PPC32 and PPC64.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/inst.h | 49 +
>  1 file changed, 13 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index 32d318c3b180..e009e94e90b2 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -60,9 +60,9 @@ static inline int ppc_inst_primary_opcode(struct ppc_inst x)
> return ppc_inst_val(x) >> 26;
>  }
>
> -#ifdef CONFIG_PPC64
>  #define ppc_inst(x) ((struct ppc_inst){ .val = (x) })
>
> +#ifdef CONFIG_PPC64
>  #define ppc_inst_prefix(x, y) ((struct ppc_inst){ .val = (x), .suffix = (y) 
> })
>
>  static inline u32 ppc_inst_suffix(struct ppc_inst x)
> @@ -70,57 +70,34 @@ static inline u32 ppc_inst_suffix(struct ppc_inst x)
> return x.suffix;
>  }
>
> -static inline bool ppc_inst_prefixed(struct ppc_inst x)
> -{
> -   return ppc_inst_primary_opcode(x) == OP_PREFIX;
> -}
> +#else
> +#define ppc_inst_prefix(x, y) ppc_inst(x)
>
> -static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
> +static inline u32 ppc_inst_suffix(struct ppc_inst x)
>  {
> -   return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
> swab32(ppc_inst_suffix(x)));
> +   return 0;
>  }
>
> +#endif /* CONFIG_PPC64 */
> +
>  static inline struct ppc_inst ppc_inst_read(const unsigned int *ptr)
>  {
> -   u32 val, suffix;
> -
> -   val = *ptr;
> -   if ((val >> 26) == OP_PREFIX) {
> -   suffix = *(ptr + 1);
> -   return ppc_inst_prefix(val, suffix);
> -   } else {
> -   return ppc_inst(val);
> -   }
> +   if (IS_ENABLED(CONFIG_PPC64) && (*ptr >> 26) == OP_PREFIX)
> +   return ppc_inst_prefix(*ptr, *(ptr + 1));
> +   else
> +   return ppc_inst(*ptr);
>  }
>
> -#else
> -
> -#define ppc_inst(x) ((struct ppc_inst){ .val = x })
> -
> -#define ppc_inst_prefix(x, y) ppc_inst(x)
> -
>  static inline bool ppc_inst_prefixed(struct ppc_inst x)
>  {
> -   return false;
> -}
> -
> -static inline u32 ppc_inst_suffix(struct ppc_inst x)
> -{
> -   return 0;
> +   return IS_ENABLED(CONFIG_PPC64) && ppc_inst_primary_opcode(x) == 
> OP_PREFIX;
>  }
>
>  static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
>  {
> -   return ppc_inst(swab32(ppc_inst_val(x)));
> -}
> -
> -static inline struct ppc_inst ppc_inst_read(const unsigned int *ptr)
> -{
> -   return ppc_inst(*ptr);
> +   return ppc_inst_prefix(swab32(ppc_inst_val(x)), 
> swab32(ppc_inst_suffix(x)));
>  }
>
> -#endif /* CONFIG_PPC64 */
> -
>  static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
>  {
> if (ppc_inst_val(x) != ppc_inst_val(y))
> --
> 2.25.0
>
Reviewed by: Jordan Niethe 
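
The refactor leans on IS_ENABLED() evaluating to a compile-time constant, so one definition serves both configs and the compiler eliminates the dead branch. A minimal standalone sketch of that pattern follows — the IS_ENABLED macro, OP_PREFIX value, and struct layout here are simplified stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in: the kernel's IS_ENABLED() expands a Kconfig
 * symbol to 0 or 1; here it is just a constant. */
#define CONFIG_PPC64_ENABLED 1
#define OP_PREFIX 1

struct ppc_inst { uint32_t val; uint32_t suffix; };

/* When CONFIG_PPC64_ENABLED is 0 the whole expression folds to false
 * at compile time and the prefixed-instruction path is dead code, so
 * a single definition works for both 32- and 64-bit builds. */
static bool ppc_inst_prefixed(struct ppc_inst x)
{
    return CONFIG_PPC64_ENABLED && (x.val >> 26) == OP_PREFIX;
}
```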


Re: [PATCH v2 08/12] powerpc: Don't use 'struct ppc_inst' to reference instruction location

2021-06-14 Thread Jordan Niethe
On Tue, Jun 15, 2021 at 12:01 PM Michael Ellerman  wrote:
>
> Christophe Leroy  writes:
> > diff --git a/arch/powerpc/include/asm/inst.h 
> > b/arch/powerpc/include/asm/inst.h
> > index 5a0740ebf132..32d318c3b180 100644
> > --- a/arch/powerpc/include/asm/inst.h
> > +++ b/arch/powerpc/include/asm/inst.h
> > @@ -139,7 +139,7 @@ static inline int ppc_inst_len(struct ppc_inst x)
> >   * Return the address of the next instruction, if the instruction @value 
> > was
> >   * located at @location.
> >   */
> > -static inline struct ppc_inst *ppc_inst_next(void *location, struct 
> > ppc_inst *value)
> > +static inline unsigned int *ppc_inst_next(unsigned int *location, unsigned 
> > int *value)
> >  {
> >   struct ppc_inst tmp;
> >
>
> It's not visible in the diff, but the rest of the function is:
>
> tmp = ppc_inst_read(value);
>
> return location + ppc_inst_len(tmp);
> }
>
> And so changing the type of location from void * to int * changes the
> result of that addition, ie. previously it was in units of bytes, now
> it's units of 4 bytes.
>
> To fix it I've kept location as unsigned int *, and added a cast where
> we do the addition. That way users of the function just see unsigned int *,
> the cast to void * is an implementation detail.
>
> We only have a handful of uses of ppc_inst_len(), so maybe that should
> change name and return a result in units of int *. But that can be a
> separate change.
>
> > diff --git a/arch/powerpc/platforms/86xx/mpc86xx_smp.c 
> > b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> > index 87f524e4b09c..302f2a1e0361 100644
> > --- a/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> > +++ b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> > @@ -83,7 +83,7 @@ smp_86xx_kick_cpu(int nr)
> >   mdelay(1);
> >
> >   /* Restore the exception vector */
> > - patch_instruction((struct ppc_inst *)vector, ppc_inst(save_vector));
> > + patch_instruction(vector, ppc_inst(save_vector));
> >
> >   local_irq_restore(flags);
> >
>
> There was another usage in here:
>
> /* Setup fake reset vector to call __secondary_start_mpc86xx. */
> target = (unsigned long) __secondary_start_mpc86xx;
> -   patch_branch((struct ppc_inst *)vector, target, BRANCH_SET_LINK);
> +   patch_branch(vector, target, BRANCH_SET_LINK);
>
> /* Kick that CPU */
> smp_86xx_release_core(nr);
>
> I fixed it up.
>
> cheers
fwiw
Reviewed by: Jordan Niethe 


Re: [PATCH v2 07/12] powerpc/lib/code-patching: Don't use struct 'ppc_inst' for runnable code in tests.

2021-06-14 Thread Jordan Niethe
On Thu, May 20, 2021 at 11:50 PM Christophe Leroy
 wrote:
>
> 'struct ppc_inst' is meant to represent an instruction internally, it
> is not meant to dereference code in memory.
>
> For testing code patching, use patch_instruction() to properly
> write into memory the code to be tested.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/lib/code-patching.c | 95 ++--
>  1 file changed, 53 insertions(+), 42 deletions(-)
>
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 82f2c1edb498..508e9511ca96 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -422,9 +422,9 @@ static void __init test_branch_iform(void)
>  {
> int err;
> struct ppc_inst instr;
> -   unsigned long addr;
> -
> -   addr = (unsigned long)&instr;
> +   unsigned int tmp[2];
> +   struct ppc_inst *iptr = (struct ppc_inst *)tmp;
> +   unsigned long addr = (unsigned long)tmp;
>
> /* The simplest case, branch to self, no flags */
> check(instr_is_branch_iform(ppc_inst(0x4800)));
> @@ -445,52 +445,57 @@ static void __init test_branch_iform(void)
> check(!instr_is_branch_iform(ppc_inst(0x7bfd)));
>
> /* Absolute branch to 0x100 */
> -   instr = ppc_inst(0x48000103);
> -   check(instr_is_branch_to_addr(&instr, 0x100));
> +   patch_instruction(iptr, ppc_inst(0x48000103));
> +   check(instr_is_branch_to_addr(iptr, 0x100));
> /* Absolute branch to 0x420fc */
> -   instr = ppc_inst(0x480420ff);
> -   check(instr_is_branch_to_addr(&instr, 0x420fc));
> +   patch_instruction(iptr, ppc_inst(0x480420ff));
> +   check(instr_is_branch_to_addr(iptr, 0x420fc));
> /* Maximum positive relative branch, + 20MB - 4B */
> -   instr = ppc_inst(0x49fc);
> -   check(instr_is_branch_to_addr(&instr, addr + 0x1FC));
> +   patch_instruction(iptr, ppc_inst(0x49fc));
> +   check(instr_is_branch_to_addr(iptr, addr + 0x1FC));
> /* Smallest negative relative branch, - 4B */
> -   instr = ppc_inst(0x4bfc);
> -   check(instr_is_branch_to_addr(&instr, addr - 4));
> +   patch_instruction(iptr, ppc_inst(0x4bfc));
> +   check(instr_is_branch_to_addr(iptr, addr - 4));
> /* Largest negative relative branch, - 32 MB */
> -   instr = ppc_inst(0x4a00);
> -   check(instr_is_branch_to_addr(&instr, addr - 0x200));
> +   patch_instruction(iptr, ppc_inst(0x4a00));
> +   check(instr_is_branch_to_addr(iptr, addr - 0x200));
>
> /* Branch to self, with link */
> -   err = create_branch(&instr, &instr, addr, BRANCH_SET_LINK);
> -   check(instr_is_branch_to_addr(&instr, addr));
> +   err = create_branch(&instr, iptr, addr, BRANCH_SET_LINK);
> +   patch_instruction(iptr, instr);
> +   check(instr_is_branch_to_addr(iptr, addr));
>
> /* Branch to self - 0x100, with link */
> -   err = create_branch(&instr, &instr, addr - 0x100, BRANCH_SET_LINK);
> -   check(instr_is_branch_to_addr(&instr, addr - 0x100));
> +   err = create_branch(&instr, iptr, addr - 0x100, BRANCH_SET_LINK);
> +   patch_instruction(iptr, instr);
> +   check(instr_is_branch_to_addr(iptr, addr - 0x100));
>
> /* Branch to self + 0x100, no link */
> -   err = create_branch(&instr, &instr, addr + 0x100, 0);
> -   check(instr_is_branch_to_addr(&instr, addr + 0x100));
> +   err = create_branch(&instr, iptr, addr + 0x100, 0);
> +   patch_instruction(iptr, instr);
> +   check(instr_is_branch_to_addr(iptr, addr + 0x100));
>
> /* Maximum relative negative offset, - 32 MB */
> -   err = create_branch(&instr, &instr, addr - 0x200, BRANCH_SET_LINK);
> -   check(instr_is_branch_to_addr(&instr, addr - 0x200));
> +   err = create_branch(&instr, iptr, addr - 0x200, BRANCH_SET_LINK);
> +   patch_instruction(iptr, instr);
> +   check(instr_is_branch_to_addr(iptr, addr - 0x200));
>
> /* Out of range relative negative offset, - 32 MB + 4*/
> -   err = create_branch(&instr, &instr, addr - 0x204, BRANCH_SET_LINK);
> +   err = create_branch(&instr, iptr, addr - 0x204, BRANCH_SET_LINK);
> check(err);
>
> /* Out of range relative positive offset, + 32 MB */
> -   err = create_branch(&instr, &instr, addr + 0x200, BRANCH_SET_LINK);
> +   err = create_branch(&instr, iptr, addr + 0x200, BRANCH_SET_LINK);
> check(err);
>
> /* Unaligned target */
> -   err = create_branch(&instr, &instr, addr + 3, BRANCH_SET_LINK);
> +   err = create_branch(&instr, iptr, addr + 3, BRANCH_SET_LINK);
> check(err);
>
> /* Check flags are masked correctly */
> -   err = create_branch(&instr, &instr, addr, 0xFFFC);
> -   check(instr_is_branch_to_addr(&instr, addr));
> +   err = create_branch(&instr, iptr, addr, 0xFFFC);
> +   patch_instruction(iptr, instr);
> +   check(instr_is_branch_to_addr(iptr, addr));
> check(ppc_inst_equal(instr, ppc_inst(0x4800)));
>  }
>
> @@ -513,9 +518,10 @@ 

Re: [PATCH v2 06/12] powerpc/lib/code-patching: Make instr_is_branch_to_addr() static

2021-06-14 Thread Jordan Niethe
On Thu, May 20, 2021 at 11:50 PM Christophe Leroy
 wrote:
>
> instr_is_branch_to_addr() is only used in code-patching.c
>
> Make it static.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/code-patching.h |  1 -
>  arch/powerpc/lib/code-patching.c | 18 +-
>  2 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/code-patching.h 
> b/arch/powerpc/include/asm/code-patching.h
> index f1d029bf906e..f9bd1397b696 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -59,7 +59,6 @@ static inline int modify_instruction_site(s32 *site, 
> unsigned int clr, unsigned
>
>  int instr_is_relative_branch(struct ppc_inst instr);
>  int instr_is_relative_link_branch(struct ppc_inst instr);
> -int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned long 
> addr);
>  unsigned long branch_target(const struct ppc_inst *instr);
>  int translate_branch(struct ppc_inst *instr, const struct ppc_inst *dest,
>  const struct ppc_inst *src);
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 0308429b0d1a..82f2c1edb498 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -367,15 +367,6 @@ unsigned long branch_target(const struct ppc_inst *instr)
> return 0;
>  }
>
> -int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned long addr)
> -{
> -   if (instr_is_branch_iform(ppc_inst_read(instr)) ||
> -   instr_is_branch_bform(ppc_inst_read(instr)))
> -   return branch_target(instr) == addr;
> -
> -   return 0;
> -}
> -
>  int translate_branch(struct ppc_inst *instr, const struct ppc_inst *dest,
>  const struct ppc_inst *src)
>  {
> @@ -410,6 +401,15 @@ void __patch_exception(int exc, unsigned long addr)
>
>  #ifdef CONFIG_CODE_PATCHING_SELFTEST
>
> +static int instr_is_branch_to_addr(const struct ppc_inst *instr, unsigned 
> long addr)
> +{
> +   if (instr_is_branch_iform(ppc_inst_read(instr)) ||
> +   instr_is_branch_bform(ppc_inst_read(instr)))
> +   return branch_target(instr) == addr;
> +
> +   return 0;
> +}
> +
>  static void __init test_trampoline(void)
>  {
> asm ("nop;\n");
> --
> 2.25.0
>
Reviewed by: Jordan Niethe 


Re: [PATCH v2 05/12] powerpc: Do not dereference code as 'struct ppc_inst' (uprobe, code-patching, feature-fixups)

2021-06-14 Thread Jordan Niethe
On Thu, May 20, 2021 at 11:50 PM Christophe Leroy
 wrote:
>
> 'struct ppc_inst' is an internal structure to represent an instruction,
> it is not directly the representation of that instruction in text code.
> It is not meant to map and dereference code.
>
> Dereferencing code directly through 'struct ppc_inst' has two main issues:
> - On powerpc, structs are expected to be 8 bytes aligned while code is
> spread every 4 bytes.
> - Should a non prefixed instruction lie at the end of the page and the
> following page not be mapped, it would generate a page fault.
>
> In-memory code must be accessed with ppc_inst_read().
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/kernel/uprobes.c | 2 +-
>  arch/powerpc/lib/code-patching.c  | 8 
>  arch/powerpc/lib/feature-fixups.c | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
> index 186f69b11e94..46971bb41d05 100644
> --- a/arch/powerpc/kernel/uprobes.c
> +++ b/arch/powerpc/kernel/uprobes.c
> @@ -42,7 +42,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
> return -EINVAL;
>
> if (cpu_has_feature(CPU_FTR_ARCH_31) &&
> -   ppc_inst_prefixed(auprobe->insn) &&
> +   ppc_inst_prefixed(ppc_inst_read(&auprobe->insn)) &&
> (addr & 0x3f) == 60) {
> pr_info_ratelimited("Cannot register a uprobe on 64 byte 
> unaligned prefixed instruction\n");
> return -EINVAL;
> diff --git a/arch/powerpc/lib/code-patching.c 
> b/arch/powerpc/lib/code-patching.c
> index 870b30d9be2f..0308429b0d1a 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -329,13 +329,13 @@ static unsigned long branch_iform_target(const struct 
> ppc_inst *instr)
>  {
> signed long imm;
>
> -   imm = ppc_inst_val(*instr) & 0x3FC;
> +   imm = ppc_inst_val(ppc_inst_read(instr)) & 0x3FC;
>
> /* If the top bit of the immediate value is set this is negative */
> if (imm & 0x200)
> imm -= 0x400;
>
> -   if ((ppc_inst_val(*instr) & BRANCH_ABSOLUTE) == 0)
> +   if ((ppc_inst_val(ppc_inst_read(instr)) & BRANCH_ABSOLUTE) == 0)
> imm += (unsigned long)instr;
>
> return (unsigned long)imm;
> @@ -345,13 +345,13 @@ static unsigned long branch_bform_target(const struct 
> ppc_inst *instr)
>  {
> signed long imm;
>
> -   imm = ppc_inst_val(*instr) & 0xFFFC;
> +   imm = ppc_inst_val(ppc_inst_read(instr)) & 0xFFFC;
>
> /* If the top bit of the immediate value is set this is negative */
> if (imm & 0x8000)
> imm -= 0x1;
>
> -   if ((ppc_inst_val(*instr) & BRANCH_ABSOLUTE) == 0)
> +   if ((ppc_inst_val(ppc_inst_read(instr)) & BRANCH_ABSOLUTE) == 0)
> imm += (unsigned long)instr;
>
> return (unsigned long)imm;
> diff --git a/arch/powerpc/lib/feature-fixups.c 
> b/arch/powerpc/lib/feature-fixups.c
> index fe26f2fa0f3f..8905b53109bc 100644
> --- a/arch/powerpc/lib/feature-fixups.c
> +++ b/arch/powerpc/lib/feature-fixups.c
> @@ -51,7 +51,7 @@ static int patch_alt_instruction(struct ppc_inst *src, 
> struct ppc_inst *dest,
>
> instr = ppc_inst_read(src);
>
> -   if (instr_is_relative_branch(*src)) {
> +   if (instr_is_relative_branch(ppc_inst_read(src))) {
The above variable instr could be used here, but that is not an issue
with this patch.
> struct ppc_inst *target = (struct ppc_inst 
> *)branch_target(src);
>
> /* Branch within the section doesn't need translating */
> --
> 2.25.0
>
Reviewed by: Jordan Niethe 
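
The rationale in the patch — never dereference code as an 8-byte struct, since a plain struct load may assume 8-byte alignment and touch the word after a 4-byte instruction at the end of a mapped page — boils down to a read-side discipline that can be sketched as below. This is an illustrative simplification (flat `uint32_t` pointers, simplified OP_PREFIX), not the kernel's ppc_inst_read().

```c
#include <stdint.h>

#define OP_PREFIX 1

struct ppc_inst { uint32_t val; uint32_t suffix; };

/* Fetch the first 4-byte word, and only read the following word when
 * the primary opcode says the instruction is prefixed.  The second
 * word is never touched for a non-prefixed instruction. */
static struct ppc_inst inst_read(const uint32_t *ptr)
{
    uint32_t val = ptr[0];

    if ((val >> 26) == OP_PREFIX)
        return (struct ppc_inst){ .val = val, .suffix = ptr[1] };
    return (struct ppc_inst){ .val = val, .suffix = 0 };
}
```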


Re: [PATCH v2 04/12] powerpc/inst: Avoid pointer dereferencing in ppc_inst_equal()

2021-06-14 Thread Jordan Niethe
On Thu, May 20, 2021 at 11:50 PM Christophe Leroy
 wrote:
>
> Avoid casting/dereferencing ppc_inst() as u64* , check each member
> of the struct when relevant.
>
> And remove the 0xff initialisation of the suffix for non
> prefixed instruction. An instruction with 0xff as a suffix
> might be invalid, but still is a prefixed instruction and
> has to be considered as this.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/inst.h | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index fc6adef528a5..5a0740ebf132 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -61,7 +61,7 @@ static inline int ppc_inst_primary_opcode(struct ppc_inst x)
>  }
>
>  #ifdef CONFIG_PPC64
> -#define ppc_inst(x) ((struct ppc_inst){ .val = (x), .suffix = 0xff })
> +#define ppc_inst(x) ((struct ppc_inst){ .val = (x) })
>
>  #define ppc_inst_prefix(x, y) ((struct ppc_inst){ .val = (x), .suffix = (y) 
> })
>
> @@ -72,7 +72,7 @@ static inline u32 ppc_inst_suffix(struct ppc_inst x)
>
>  static inline bool ppc_inst_prefixed(struct ppc_inst x)
>  {
> -   return ppc_inst_primary_opcode(x) == OP_PREFIX && ppc_inst_suffix(x) 
> != 0xff;
> +   return ppc_inst_primary_opcode(x) == OP_PREFIX;
>  }
>
>  static inline struct ppc_inst ppc_inst_swab(struct ppc_inst x)
> @@ -93,11 +93,6 @@ static inline struct ppc_inst ppc_inst_read(const struct 
> ppc_inst *ptr)
> }
>  }
>
> -static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
> -{
> -   return *(u64 *)&x == *(u64 *)&y;
> -}
> -
>  #else
>
>  #define ppc_inst(x) ((struct ppc_inst){ .val = x })
> @@ -124,13 +119,17 @@ static inline struct ppc_inst ppc_inst_read(const 
> struct ppc_inst *ptr)
> return *ptr;
>  }
>
> +#endif /* CONFIG_PPC64 */
> +
>  static inline bool ppc_inst_equal(struct ppc_inst x, struct ppc_inst y)
>  {
> -   return ppc_inst_val(x) == ppc_inst_val(y);
> +   if (ppc_inst_val(x) != ppc_inst_val(y))
> +   return false;
> +   if (!ppc_inst_prefixed(x))
> +   return true;
> +   return ppc_inst_suffix(x) == ppc_inst_suffix(y);
>  }
>
> -#endif /* CONFIG_PPC64 */
> -
>  static inline int ppc_inst_len(struct ppc_inst x)
>  {
> return ppc_inst_prefixed(x) ? 8 : 4;
> --
> 2.25.0
>
Reviewed-by: Jordan Niethe 


Re: [PATCH] powerpc/signal64: Copy siginfo before changing regs->nip

2021-06-14 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of June 14, 2021 5:22 pm:
> 
> 
> Le 14/06/2021 à 07:55, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of June 14, 2021 3:31 pm:
>>>
>>>
>>> Le 14/06/2021 à 03:29, Nicholas Piggin a écrit :
 Excerpts from Nicholas Piggin's message of June 14, 2021 10:47 am:
> Excerpts from Michael Ellerman's message of June 8, 2021 11:46 pm:
>> In commit 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64()
>> to minimise uaccess switches") the 64-bit signal code was rearranged to
>> use user_write_access_begin/end().
>>
>> As part of that change the call to copy_siginfo_to_user() was moved
>> later in the function, so that it could be done after the
>> user_write_access_end().
>>
>> In particular it was moved after we modify regs->nip to point to the
>> signal trampoline. That means if copy_siginfo_to_user() fails we exit
>> handle_rt_signal64() with an error but with regs->nip modified, whereas
>> previously we would not modify regs->nip until the copy succeeded.
>>
>> Returning an error from signal delivery but with regs->nip updated
>> leaves the process in a sort of half-delivered state. We do immediately
>> force a SEGV in signal_setup_done(), called from do_signal(), so the
>> process should never run in the half-delivered state.
>>
>> However that SEGV is not delivered until we've gone around to
>> do_notify_resume() again, so it's possible some tracing could observe
>> the half-delivered state.
>>
>> There are other cases where we fail signal delivery with regs partly
>> updated, eg. the write to newsp and SA_SIGINFO, but the latter at least
>> is very unlikely to fail as it reads back from the frame we just wrote
>> to.
>>
>> Looking at other arches they seem to be more careful about leaving regs
>> unchanged until the copy operations have succeeded, and in general that
>> seems like good hygiene.
>>
>> So although the current behaviour is not clearly buggy, it's also not
>> clearly correct. So move the call to copy_siginfo_to_user() up prior to
>> the modification of regs->nip, which is closer to the old behaviour, and
>> easier to reason about.
>
> Good catch, should it still have a Fixes: tag though? Even if it's not
> clearly buggy we want it to be patched.

 Also...

>>
>> Signed-off-by: Michael Ellerman 
>> ---
>>arch/powerpc/kernel/signal_64.c | 9 -
>>1 file changed, 4 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/signal_64.c 
>> b/arch/powerpc/kernel/signal_64.c
>> index dca66481d0c2..f9e1f5428b9e 100644
>> --- a/arch/powerpc/kernel/signal_64.c
>> +++ b/arch/powerpc/kernel/signal_64.c
>> @@ -902,6 +902,10 @@ int handle_rt_signal64(struct ksignal *ksig, 
>> sigset_t *set,
>>  unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), 
>> badframe_block);
>>  user_write_access_end();
>>
>> +/* Save the siginfo outside of the unsafe block. */
>> +if (copy_siginfo_to_user(&frame->info, &ksig->info))
>> +goto badframe;
>> +
>>  /* Make sure signal handler doesn't get spurious FP exceptions 
>> */
>>  tsk->thread.fp_state.fpscr = 0;
>>
>> @@ -915,11 +919,6 @@ int handle_rt_signal64(struct ksignal *ksig, 
>> sigset_t *set,
>>  regs->nip = (unsigned long) &frame->tramp[0];
>>  }
>>
>> -
>> -/* Save the siginfo outside of the unsafe block. */
>> -if (copy_siginfo_to_user(&frame->info, &ksig->info))
>> -goto badframe;
>> -
>>  /* Allocate a dummy caller frame for the signal handler. */
>>  newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
>>  err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);

 Does the same reasoning apply to this one and the ELF V1 function
 descriptor thing? It seems like you could move all of that block
 up instead. With your other SA_SIGINFO get_user patch, there would
 then be no possibility of error after you start modifying regs.

>>>
>>> To move the above into the user access block, we need to open a larger 
>>> window. For the time being, the
>>> window opened only contains the 'frame'; 'newsp' points before the 'frame'.
>>>
>> 
>> Only by 64/128 bytes though. Is that a problem? Not for 64s. Could it
>> cause more overhead than it saves on other platforms?
> 
> No, it is not a problem at all, it just must not be forgotten: on ppc64 it may 
> go unnoticed, but on 32s 
> it will blow up if we forget to enlarge the access window and the access 
> involves a different 256M 
> segment (very unlikely, for sure, but ...)

Okay, and it's a good point. Would be nice if there was some sanitizer 
that could check this to byte 

Re: [PATCH] powerpc/signal64: Copy siginfo before changing regs->nip

2021-06-14 Thread Michael Ellerman
Nicholas Piggin  writes:
> Excerpts from Nicholas Piggin's message of June 14, 2021 10:47 am:
>> Excerpts from Michael Ellerman's message of June 8, 2021 11:46 pm:
>>> In commit 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64()
>>> to minimise uaccess switches") the 64-bit signal code was rearranged to
>>> use user_write_access_begin/end().
>>> 
>>> As part of that change the call to copy_siginfo_to_user() was moved
>>> later in the function, so that it could be done after the
>>> user_write_access_end().
>>> 
>>> In particular it was moved after we modify regs->nip to point to the
>>> signal trampoline. That means if copy_siginfo_to_user() fails we exit
>>> handle_rt_signal64() with an error but with regs->nip modified, whereas
>>> previously we would not modify regs->nip until the copy succeeded.
>>> 
>>> Returning an error from signal delivery but with regs->nip updated
>>> leaves the process in a sort of half-delivered state. We do immediately
>>> force a SEGV in signal_setup_done(), called from do_signal(), so the
>>> process should never run in the half-delivered state.
>>> 
>>> However that SEGV is not delivered until we've gone around to
>>> do_notify_resume() again, so it's possible some tracing could observe
>>> the half-delivered state.
>>> 
>>> There are other cases where we fail signal delivery with regs partly
>>> updated, eg. the write to newsp and SA_SIGINFO, but the latter at least
>>> is very unlikely to fail as it reads back from the frame we just wrote
>>> to.
>>> 
>>> Looking at other arches they seem to be more careful about leaving regs
>>> unchanged until the copy operations have succeeded, and in general that
>> seems like good hygiene.
>>> 
>> So although the current behaviour is not clearly buggy, it's also not
>>> clearly correct. So move the call to copy_siginfo_to_user() up prior to
>>> the modification of regs->nip, which is closer to the old behaviour, and
>>> easier to reason about.
>> 
>> Good catch, should it still have a Fixes: tag though? Even if it's not
>> clearly buggy we want it to be patched.
>
> Also...
>
>>> 
>>> Signed-off-by: Michael Ellerman 
>>> ---
>>>  arch/powerpc/kernel/signal_64.c | 9 -
>>>  1 file changed, 4 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/kernel/signal_64.c 
>>> b/arch/powerpc/kernel/signal_64.c
>>> index dca66481d0c2..f9e1f5428b9e 100644
>>> --- a/arch/powerpc/kernel/signal_64.c
>>> +++ b/arch/powerpc/kernel/signal_64.c
>>> @@ -902,6 +902,10 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t 
>>> *set,
>>> unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), 
>>> badframe_block);
>>> user_write_access_end();
>>>  
>>> +   /* Save the siginfo outside of the unsafe block. */
>>> +   if (copy_siginfo_to_user(&frame->info, &ksig->info))
>>> +   goto badframe;
>>> +
>>> /* Make sure signal handler doesn't get spurious FP exceptions */
>>> tsk->thread.fp_state.fpscr = 0;
>>>  
>>> @@ -915,11 +919,6 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t 
>>> *set,
>>> regs->nip = (unsigned long) &frame->tramp[0];
>>> }
>>>  
>>> -
>>> -   /* Save the siginfo outside of the unsafe block. */
>>> -   if (copy_siginfo_to_user(&frame->info, &ksig->info))
>>> -   goto badframe;
>>> -
>>> /* Allocate a dummy caller frame for the signal handler. */
>>> newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
>>> err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);
>
> Does the same reasoning apply to this one and the ELF V1 function
> descriptor thing? It seems like you could move all of that block
> up instead. With your other SA_SIGINFO get_user patch, there would
> then be no possibility of error after you start modifying regs.

Yeah I think we should rework it further and eventually get to the point
where we leave regs untouched until we're guaranteed to return success.

It will need a bit more work though because of copy_siginfo_to_user().

cheers


Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-14 Thread Nicholas Piggin
Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:
> +++ Nicholas Piggin [11/06/21 19:39 +1000]:
>>The elf_check_arch() function is used to test usermode binaries, but
>>kernel modules may have more specific requirements. powerpc would like
>>to test for ABI version compatibility.
>>
>>Add an arch-overridable function elf_check_module_arch() that defaults
>>to elf_check_arch() and use it in elf_validity_check().
>>
>>Signed-off-by: Michael Ellerman 
>>[np: split patch, added changelog]
>>Signed-off-by: Nicholas Piggin 
>>---
>> include/linux/moduleloader.h | 5 +
>> kernel/module.c  | 2 +-
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>>index 9e09d11ffe5b..fdc042a84562 100644
>>--- a/include/linux/moduleloader.h
>>+++ b/include/linux/moduleloader.h
>>@@ -13,6 +13,11 @@
>>  * must be implemented by each architecture.
>>  */
>>
>>+// Allow arch to optionally do additional checking of module ELF header
>>+#ifndef elf_check_module_arch
>>+#define elf_check_module_arch elf_check_arch
>>+#endif
> 
> Hi Nicholas,
> 
> Why not make elf_check_module_arch() consistent with the other
> arch-specific functions? Please see module_frob_arch_sections(),
> module_{init,exit}_section(), etc in moduleloader.h. That is, they are
> all __weak functions that are overridable by arches. We can maybe make
> elf_check_module_arch() a weak symbol, available for arches to
> override if they want to perform additional elf checks. Then we don't
> have to have this one-off #define.


Like this? I like it. Good idea.

Thanks,
Nick

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..7b4587a19189 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,9 @@
  * must be implemented by each architecture.
  */
 
+/* arch may override to do additional checking of ELF header architecture */
+bool module_elf_check_arch(Elf_Ehdr *hdr);
+
 /* Adjust arch-specific sections.  Return 0 on success.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
  Elf_Shdr *sechdrs,
diff --git a/kernel/module.c b/kernel/module.c
index 7e78dfabca97..8b31c0b7c2a0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3459,6 +3459,11 @@ static void flush_module_icache(const struct module *mod)
   (unsigned long)mod->core_layout.base + 
mod->core_layout.size);
 }
 
+bool __weak module_elf_check_arch(Elf_Ehdr *hdr)
+{
+   return elf_check_arch(hdr);
+}
+
 int __weak module_frob_arch_sections(Elf_Ehdr *hdr,
 Elf_Shdr *sechdrs,
 char *secstrings,


Re: [PATCH] powerpc/signal64: Copy siginfo before changing regs->nip

2021-06-14 Thread Michael Ellerman
Nicholas Piggin  writes:
> Excerpts from Michael Ellerman's message of June 8, 2021 11:46 pm:
>> In commit 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64()
>> to minimise uaccess switches") the 64-bit signal code was rearranged to
>> use user_write_access_begin/end().
>> 
>> As part of that change the call to copy_siginfo_to_user() was moved
>> later in the function, so that it could be done after the
>> user_write_access_end().
>> 
>> In particular it was moved after we modify regs->nip to point to the
>> signal trampoline. That means if copy_siginfo_to_user() fails we exit
>> handle_rt_signal64() with an error but with regs->nip modified, whereas
>> previously we would not modify regs->nip until the copy succeeded.
>> 
>> Returning an error from signal delivery but with regs->nip updated
>> leaves the process in a sort of half-delivered state. We do immediately
>> force a SEGV in signal_setup_done(), called from do_signal(), so the
>> process should never run in the half-delivered state.
>> 
>> However that SEGV is not delivered until we've gone around to
>> do_notify_resume() again, so it's possible some tracing could observe
>> the half-delivered state.
>> 
>> There are other cases where we fail signal delivery with regs partly
>> updated, eg. the write to newsp and SA_SIGINFO, but the latter at least
>> is very unlikely to fail as it reads back from the frame we just wrote
>> to.
>> 
>> Looking at other arches they seem to be more careful about leaving regs
>> unchanged until the copy operations have succeeded, and in general that
>> seems like good hygenie.
>> 
>> So although the current behaviour is not cleary buggy, it's also not
>> clearly correct. So move the call to copy_siginfo_to_user() up prior to
>> the modification of regs->nip, which is closer to the old behaviour, and
>> easier to reason about.
>
> Good catch, should it still have a Fixes: tag though? Even if it's not
> clearly buggy we want it to be patched.

Yeah I'll add one.

cheers


Re: [PATCH v2 08/12] powerpc: Don't use 'struct ppc_inst' to reference instruction location

2021-06-14 Thread Michael Ellerman
Christophe Leroy  writes:
> diff --git a/arch/powerpc/include/asm/inst.h b/arch/powerpc/include/asm/inst.h
> index 5a0740ebf132..32d318c3b180 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -139,7 +139,7 @@ static inline int ppc_inst_len(struct ppc_inst x)
>   * Return the address of the next instruction, if the instruction @value was
>   * located at @location.
>   */
> -static inline struct ppc_inst *ppc_inst_next(void *location, struct ppc_inst 
> *value)
> +static inline unsigned int *ppc_inst_next(unsigned int *location, unsigned 
> int *value)
>  {
>   struct ppc_inst tmp;
>  

It's not visible in the diff, but the rest of the function is:

tmp = ppc_inst_read(value);

return location + ppc_inst_len(tmp);
}

And so changing the type of location from void * to int * changes the
result of that addition, ie. previously it was in units of bytes, now
it's units of 4 bytes.

To fix it I've kept location as unsigned int *, and added a cast where
we do the addition. That way users of the function just see unsigned int *,
the cast to void * is an implementation detail.

We only have a handful of uses of ppc_inst_len(), so maybe that should
change name and return a result in units of int *. But that can be a
separate change.

> diff --git a/arch/powerpc/platforms/86xx/mpc86xx_smp.c 
> b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> index 87f524e4b09c..302f2a1e0361 100644
> --- a/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> +++ b/arch/powerpc/platforms/86xx/mpc86xx_smp.c
> @@ -83,7 +83,7 @@ smp_86xx_kick_cpu(int nr)
>   mdelay(1);
>  
>   /* Restore the exception vector */
> - patch_instruction((struct ppc_inst *)vector, ppc_inst(save_vector));
> + patch_instruction(vector, ppc_inst(save_vector));
>  
>   local_irq_restore(flags);
>  

There was another usage in here:

/* Setup fake reset vector to call __secondary_start_mpc86xx. */
target = (unsigned long) __secondary_start_mpc86xx;
-   patch_branch((struct ppc_inst *)vector, target, BRANCH_SET_LINK);
+   patch_branch(vector, target, BRANCH_SET_LINK);
 
/* Kick that CPU */
smp_86xx_release_core(nr);

I fixed it up.

cheers


Re: [PATCH -next] powerpc/spider-pci: Remove set but not used variable 'val'

2021-06-14 Thread libaokun (A)

ping

On 2021/6/1 16:53, Baokun Li wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

arch/powerpc/platforms/cell/spider-pci.c: In function 'spiderpci_io_flush':
arch/powerpc/platforms/cell/spider-pci.c:28:6: warning:
variable ‘val’ set but not used [-Wunused-but-set-variable]

It has never been used since its introduction.

Signed-off-by: Baokun Li 
---
  arch/powerpc/platforms/cell/spider-pci.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spider-pci.c 
b/arch/powerpc/platforms/cell/spider-pci.c
index 93ea41680f54..a1c293f42a1f 100644
--- a/arch/powerpc/platforms/cell/spider-pci.c
+++ b/arch/powerpc/platforms/cell/spider-pci.c
@@ -25,10 +25,9 @@ struct spiderpci_iowa_private {
  static void spiderpci_io_flush(struct iowa_bus *bus)
  {
struct spiderpci_iowa_private *priv;
-   u32 val;
  
  	priv = bus->private;

-   val = in_be32(priv->regs + SPIDER_PCI_DUMMY_READ);
+   in_be32(priv->regs + SPIDER_PCI_DUMMY_READ);
iosync();
  }
  


Re: [PATCH] usb: gadget: fsl: properly remove remnant of MXC support

2021-06-14 Thread Joel Stanley
On Sat, 12 Jun 2021 at 00:31, Li Yang  wrote:
>
> Commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver")
> didn't remove all the MXC-related stuff, which can cause a build problem
> for LS1021 when it is enabled again in Kconfig.  This patch removes all the
> remnants.
>
> Signed-off-by: Li Yang 

Reviewed-by: Joel Stanley 

Will you re-submit the Kconfig change once this is merged?

Cheers,

Joel


Re: [PATCH v1 10/12] powerpc/lib/feature-fixups: Use PPC_RAW_xxx() macros

2021-06-14 Thread Michael Ellerman
Christophe Leroy  writes:
> diff --git a/arch/powerpc/lib/feature-fixups.c 
> b/arch/powerpc/lib/feature-fixups.c
> index fe26f2fa0f3f..f0fc521b82ae 100644
> --- a/arch/powerpc/lib/feature-fixups.c
> +++ b/arch/powerpc/lib/feature-fixups.c
> @@ -180,32 +180,31 @@ static void do_stf_exit_barrier_fixups(enum 
> stf_barrier_type types)
>   start = PTRRELOC(&__start___stf_exit_barrier_fixup);
>   end = PTRRELOC(&__stop___stf_exit_barrier_fixup);
>  
> - instrs[0] = 0x60000000; /* nop */
> - instrs[1] = 0x60000000; /* nop */
> - instrs[2] = 0x60000000; /* nop */
> - instrs[3] = 0x60000000; /* nop */
> - instrs[4] = 0x60000000; /* nop */
> - instrs[5] = 0x60000000; /* nop */
> + instrs[0] = PPC_RAW_NOP();
> + instrs[1] = PPC_RAW_NOP();
> + instrs[2] = PPC_RAW_NOP();
> + instrs[3] = PPC_RAW_NOP();
> + instrs[4] = PPC_RAW_NOP();
> + instrs[5] = PPC_RAW_NOP();
>  
>   i = 0;
>   if (types & STF_BARRIER_FALLBACK || types & STF_BARRIER_SYNC_ORI) {
>   if (cpu_has_feature(CPU_FTR_HVMODE)) {
> - instrs[i++] = 0x7db14ba6; /* mtspr 0x131, r13 (HSPRG1) 
> */
> - instrs[i++] = 0x7db04aa6; /* mfspr r13, 0x130 (HSPRG0) 
> */
> + instrs[i++] = PPC_RAW_MTSPR(SPRN_HSPRG1, _R13);
> + instrs[i++] = PPC_RAW_MFSPR(_R13, SPRN_HSPRG0);
>   } else {
> - instrs[i++] = 0x7db243a6; /* mtsprg 2,r13   */
> - instrs[i++] = 0x7db142a6; /* mfsprg r13,1*/
> + instrs[i++] = PPC_RAW_MTSPR(SPRN_SPRG2, _R13);
> + instrs[i++] = PPC_RAW_MFSPR(_R13, SPRN_SPRG1);
>   }
> - instrs[i++] = 0x7c0004ac; /* hwsync */
> - instrs[i++] = 0xe9ad0000; /* ld r13,0(r13)  */
...
> + instrs[i++] = PPC_RAW_LD(_R10, _R13, 0);

This conversion was wrong, r13 became r10.

I fixed it up.

cheers


[PATCH v12 6/6] [RFC] powerpc: Book3S 64-bit outline-only KASAN support

2021-06-14 Thread Daniel Axtens
[I'm hoping to get this in a subsequent merge window after we get the core
changes in. I know there are still a few outstanding review comments, I just
wanted to make sure that I supplied a real use-case for the core changes I'm
proposing.]

Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.

 - Enable the compiler instrumentation to check addresses and maintain the
   shadow region. (This is the guts of KASAN which we can easily reuse.)

 - Require kasan-vmalloc support to handle modules and anything else in
   vmalloc space.

 - KASAN needs to be able to validate all pointer accesses, but we can't
   instrument all kernel addresses - only linear map and vmalloc. On boot,
   set up a single page of read-only shadow that marks all iomap and
   vmemmap accesses as valid.

 - Document KASAN in both generic and powerpc docs.

Background
--

KASAN support on Book3S is a bit tricky to get right:

 - It would be good to support inline instrumentation so as to be able to
   catch stack issues that cannot be caught with outline mode.

 - Inline instrumentation requires a fixed offset.

 - Book3S runs code with translations off ("real mode") during boot,
   including a lot of generic device-tree parsing code which is used to
   determine MMU features.

[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008... This is a one-to-one mapping with physical
memory from 0000... onward.
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]

 - Some code - most notably a lot of KVM code - also runs with translations
   off after boot.

 - Therefore any offset has to point to memory that is valid with
   translations on or off.

One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.

Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.

Cc: Aneesh Kumar K.V  # ppc64 hash version
Cc: Christophe Leroy  # ppc32 version
Originally-by: Balbir Singh  # ppc64 out-of-line radix 
version
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst| 11 +--
 Documentation/powerpc/kasan.txt  | 48 +-
 arch/powerpc/Kconfig |  4 +-
 arch/powerpc/Kconfig.debug   |  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h|  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h   | 13 ++-
 arch/powerpc/include/asm/kasan.h | 22 +
 arch/powerpc/kernel/Makefile | 11 +++
 arch/powerpc/kernel/process.c| 16 ++--
 arch/powerpc/kvm/Makefile|  5 ++
 arch/powerpc/mm/book3s64/Makefile|  9 ++
 arch/powerpc/mm/kasan/Makefile   |  1 +
 arch/powerpc/mm/kasan/init_book3s_64.c   | 95 
 arch/powerpc/mm/ptdump/ptdump.c  | 20 -
 arch/powerpc/platforms/Kconfig.cputype   |  1 +
 arch/powerpc/platforms/powernv/Makefile  |  6 ++
 arch/powerpc/platforms/pseries/Makefile  |  3 +
 18 files changed, 257 insertions(+), 19 deletions(-)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 05d2d428a332..f8d6048db1bb 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory 
allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures. It is also supported on 32-bit powerpc kernels.
-Tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on powerpc for 32-bit kernels and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
+supported only for arm64.
 
 Usage
 -
@@ -344,10 +345,10 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently, this is supported on x86,

[PATCH v12 5/6] powerpc/mm/kasan: rename kasan_init_32.c to init_32.c

2021-06-14 Thread Daniel Axtens
kasan is already implied by the directory name, we don't need to
repeat it.

Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 arch/powerpc/mm/kasan/Makefile   | 2 +-
 arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)

diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
 
 KASAN_SANITIZE := n
 
-obj-$(CONFIG_PPC32)   += kasan_init_32.o
+obj-$(CONFIG_PPC32)   += init_32.o
 obj-$(CONFIG_PPC_8xx)  += 8xx.o
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
-- 
2.27.0



[PATCH v12 4/6] kasan: Document support on 32-bit powerpc

2021-06-14 Thread Daniel Axtens
KASAN is supported on 32-bit powerpc and the docs should reflect this.

Suggested-by: Christophe Leroy 
Reviewed-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 
---
 Documentation/dev-tools/kasan.rst |  8 ++--
 Documentation/powerpc/kasan.txt   | 12 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index 83ec4a556c19..05d2d428a332 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,7 +36,8 @@ Both software KASAN modes work with SLUB and SLAB memory 
allocators,
 while the hardware tag-based KASAN currently only supports SLUB.
 
 Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures, and tag-based KASAN modes are supported only for 
arm64.
+and riscv architectures. It is also supported on 32-bit powerpc kernels.
+Tag-based KASAN modes are supported only for arm64.
 
 Usage
 -
@@ -343,7 +344,10 @@ CONFIG_KASAN_VMALLOC
 
 With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
 cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and powerpc.
+riscv, s390, and 32-bit powerpc.
+
+It is optional, except on 32-bit powerpc kernels with module support,
+where it is required.
 
 This works by hooking into vmalloc and vmap and dynamically
 allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index ..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
-- 
2.27.0



[PATCH v12 3/6] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-14 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer constants, and therefore
breaks the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.27.0



[PATCH v12 2/6] kasan: allow architectures to provide an outline readiness check

2021-06-14 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 4 
 mm/kasan/shadow.c  | 4 
 4 files changed, 15 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..19323a3d5975 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void 
*address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..74134b657d7d 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
-- 
2.27.0



[PATCH v12 1/6] kasan: allow an architecture to disable inline instrumentation

2021-06-14 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.
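For reference, an architecture opts out by selecting the new symbol from its own Kconfig. A hedged sketch of what such a platform entry might look like — the exact symbol placement and any conditions for ppc64 land in a later patch in the series, so treat the names below as illustrative:

```kconfig
config PPC_BOOK3S_64
	# ... existing selects ...
	# Hypothetical placement: radix turns on virtual memory late in
	# boot, so disable inline/stack instrumentation and rely on the
	# outline readiness hook instead.
	select ARCH_DISABLE_KASAN_INLINE
```

With this selected, KASAN_INLINE and KASAN_STACK become unavailable in menuconfig, leaving only KASAN_OUTLINE.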

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..935814f332a7 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+# Sometimes an architecture might not be able to support inline instrumentation
+# but might be able to support outline instrumentation. This option allows an 
+# arch to prevent inline and stack instrumentation from being enabled.
+# ppc64 turns on virtual memory late in boot, after calling into generic code
+# like the device-tree parser, so it uses this in conjunction with a hook in
+# outline mode to avoid invalid access early in boot.
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a known problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.27.0



[PATCH v12 0/6] KASAN core changes for ppc64 radix KASAN

2021-06-14 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.

I've been trying this for a while, but we keep having collisions
between the kasan code in the mm tree and the code I want to put in to
the ppc tree. So my aim here is for patches 1 through 4 or 1 through 5
to go in via the mm tree. I will then propose the powerpc changes for
a later cycle. (I have attached them to this series as an RFC, and
there are still outstanding review comments I need to attend to.)

v12 applies to next-20210611. There should be no noticeable changes to
other platforms.

Kind regards,
Daniel

Daniel Axtens (6):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables
  kasan: Document support on 32-bit powerpc
  powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
  [RFC] powerpc: Book3S 64-bit outline-only KASAN support

 Documentation/dev-tools/kasan.rst |  7 +-
 Documentation/powerpc/kasan.txt   | 58 +++
 arch/powerpc/Kconfig  |  4 +-
 arch/powerpc/Kconfig.debug|  3 +-
 arch/powerpc/include/asm/book3s/64/hash.h |  4 +
 arch/powerpc/include/asm/book3s/64/pgtable.h  |  4 +
 arch/powerpc/include/asm/book3s/64/radix.h| 13 ++-
 arch/powerpc/include/asm/kasan.h  | 22 +
 arch/powerpc/kernel/Makefile  | 11 +++
 arch/powerpc/kernel/process.c | 16 ++--
 arch/powerpc/kvm/Makefile |  5 +
 arch/powerpc/mm/book3s64/Makefile |  9 ++
 arch/powerpc/mm/kasan/Makefile|  3 +-
 .../mm/kasan/{kasan_init_32.c => init_32.c}   |  0
 arch/powerpc/mm/kasan/init_book3s_64.c| 95 +++
 arch/powerpc/mm/ptdump/ptdump.c   | 20 +++-
 arch/powerpc/platforms/Kconfig.cputype|  1 +
 arch/powerpc/platforms/powernv/Makefile   |  6 ++
 arch/powerpc/platforms/pseries/Makefile   |  3 +
 include/linux/kasan.h | 18 +++-
 lib/Kconfig.kasan | 14 +++
 mm/kasan/common.c |  4 +
 mm/kasan/generic.c|  3 +
 mm/kasan/init.c   |  6 +-
 mm/kasan/kasan.h  |  4 +
 mm/kasan/shadow.c |  4 +
 26 files changed, 316 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/powerpc/kasan.txt
 rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
 create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c

-- 
2.27.0



Re: [RFC PATCH 0/8] Add support for FORM2 associativity

2021-06-14 Thread Daniel Henrique Barboza




On 6/14/21 1:39 PM, Aneesh Kumar K.V wrote:

Form2 associativity adds a much more flexible NUMA topology layout
than what Form1 provides. It also allows the PAPR SCM device
to use better associativity when the device is used as a DAX KMEM
device. More details can be found in patch x

$ ndctl list -N -v
[
   {
 "dev":"namespace0.0",
 "mode":"devdax",
 "map":"dev",
 "size":1071644672,
 "uuid":"37dea198-ddb5-4e42-915a-99a915e24188",
 "raw_uuid":"148deeaa-4a2f-41d1-8d74-fd9a942d26ba",
 "daxregion":{
   "id":0,
   "size":1071644672,
   "devices":[
 {
   "chardev":"dax0.0",
   "size":1071644672,
   "target_node":4,
   "mode":"devdax"
 }
   ]
 },
 "align":2097152,
 "numa_node":1
   }
]

$ numactl -H
...
node distances:
node   0   1   2   3
   0:  10  11  22  33
   1:  44  10  55  66
   2:  77  88  10  99
   3:  101  121  132  10
$

After DAX KMEM
# numactl -H
available: 5 nodes (0-4)
...
node distances:
node   0   1   2   3   4
   0:  10  11  22  33  255
   1:  44  10  55  66  255
   2:  77  88  10  99  255
   3:  101  121  132  10  255
   4:  255  255  255  255  10
#

The above output is with a Qemu command line



For reference, this QEMU:


https://github.com/danielhb/qemu/tree/form2_affinity_v1

https://lists.gnu.org/archive/html/qemu-devel/2021-06/msg03617.html


but ...



-numa node,nodeid=4 \
-numa dist,src=0,dst=1,val=11 -numa dist,src=0,dst=2,val=22 -numa 
dist,src=0,dst=3,val=33 -numa dist,src=0,dst=4,val=255 \
-numa dist,src=1,dst=0,val=44 -numa dist,src=1,dst=2,val=55 -numa 
dist,src=1,dst=3,val=66 -numa dist,src=1,dst=4,val=255 \
-numa dist,src=2,dst=0,val=77 -numa dist,src=2,dst=1,val=88 -numa 
dist,src=2,dst=3,val=99 -numa dist,src=2,dst=4,val=255 \
-numa dist,src=3,dst=0,val=101 -numa dist,src=3,dst=1,val=121 -numa 
dist,src=3,dst=2,val=132 -numa dist,src=3,dst=4,val=255 \
-numa dist,src=4,dst=0,val=255 -numa dist,src=4,dst=1,val=255 -numa 
dist,src=4,dst=2,val=255 -numa dist,src=4,dst=3,val=255 \
-object 
memory-backend-file,id=memnvdimm1,prealloc=yes,mem-path=$PMEM_DISK,share=yes,size=${PMEM_SIZE}
  \
-device 
nvdimm,label-size=128K,memdev=memnvdimm1,id=nvdimm1,slot=4,uuid=72511b67-0b3b-42fd-8d1d-5be3cae8bcaa,node=4,persistent-nodeid=1



with 'device-node=1' instead of 'persistent-nodeid=1' in the nvdimm parameter
up here.






Aneesh Kumar K.V (8):
   powerpc/pseries: rename min_common_depth to primary_domain_index
   powerpc/pseries: rename distance_ref_points_depth to max_domain_index
   powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
   powerpc/pseries: Consolidate DLPAR NUMA distance update
   powerpc/pseries: Consolidate NUMA distance update during boot
   powerpc/pseries: Add a helper for form1 cpu distance
   powerpc/pseries: Add support for FORM2 associativity
   powerpc/papr_scm: Use FORM2 associativity details



Series:


Tested-by: Daniel Henrique Barboza 





  Documentation/powerpc/associativity.rst   | 139 ++
  arch/powerpc/include/asm/firmware.h   |   7 +-
  arch/powerpc/include/asm/prom.h   |   3 +-
  arch/powerpc/kernel/prom_init.c   |   3 +-
  arch/powerpc/mm/numa.c| 436 ++
  arch/powerpc/platforms/pseries/firmware.c |   3 +-
  arch/powerpc/platforms/pseries/hotplug-cpu.c  |   2 +
  .../platforms/pseries/hotplug-memory.c|   2 +
  arch/powerpc/platforms/pseries/papr_scm.c |  26 +-
  arch/powerpc/platforms/pseries/pseries.h  |   2 +
  10 files changed, 522 insertions(+), 101 deletions(-)
  create mode 100644 Documentation/powerpc/associativity.rst



Re: [PATCH 07/11] powerpc: Add support for microwatt's hardware random number generator

2021-06-14 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 15, 2021 9:02 am:
> This is accessed using the DARN instruction and should probably be
> done more generically.
> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/include/asm/archrandom.h | 12 +-
>  arch/powerpc/platforms/microwatt/Kconfig  |  1 +
>  arch/powerpc/platforms/microwatt/Makefile |  2 +-
>  arch/powerpc/platforms/microwatt/rng.c| 48 +++
>  4 files changed, 61 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/microwatt/rng.c
> 
> diff --git a/arch/powerpc/include/asm/archrandom.h 
> b/arch/powerpc/include/asm/archrandom.h
> index 9a53e29680f4..e8ae0f7740f9 100644
> --- a/arch/powerpc/include/asm/archrandom.h
> +++ b/arch/powerpc/include/asm/archrandom.h
> @@ -8,12 +8,22 @@
>  
>  static inline bool __must_check arch_get_random_long(unsigned long *v)
>  {
> + if (ppc_md.get_random_seed)
> + return ppc_md.get_random_seed(v);
> +
>   return false;
>  }
>  
>  static inline bool __must_check arch_get_random_int(unsigned int *v)
>  {
> - return false;
> + unsigned long val;
> + bool rc;
> +
> + rc = arch_get_random_long(&val);
> + if (rc)
> + *v = val;
> +
> + return rc;
>  }
>  

I would be happier if you didn't change this (or at least put it in its 
own patch explaining why it's not going to slow down other platforms).

I'm assuming the main problem you have is seeding the rngs at boot? It
should be enough to have ppc_md.get_random_seed for that.

(BTW I wonder should lib/random32.c be changed to call 
arch_get_random_seed_long() for seeding)


>  static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
> diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
> b/arch/powerpc/platforms/microwatt/Kconfig
> index 50ed0cedb5f1..8f6a81978461 100644
> --- a/arch/powerpc/platforms/microwatt/Kconfig
> +++ b/arch/powerpc/platforms/microwatt/Kconfig
> @@ -7,6 +7,7 @@ config PPC_MICROWATT
>   select PPC_ICP_NATIVE
>   select PPC_NATIVE
>   select PPC_UDBG_16550
> + select ARCH_RANDOM
>   help
>This option enables support for FPGA-based Microwatt 
> implementations.
>  
> diff --git a/arch/powerpc/platforms/microwatt/Makefile 
> b/arch/powerpc/platforms/microwatt/Makefile
> index e6885b3b2ee7..116d6d3ad3f0 100644
> --- a/arch/powerpc/platforms/microwatt/Makefile
> +++ b/arch/powerpc/platforms/microwatt/Makefile
> @@ -1 +1 @@
> -obj-y+= setup.o
> +obj-y+= setup.o rng.o
> diff --git a/arch/powerpc/platforms/microwatt/rng.c 
> b/arch/powerpc/platforms/microwatt/rng.c
> new file mode 100644
> index ..3d8ee6eb7dad
> --- /dev/null
> +++ b/arch/powerpc/platforms/microwatt/rng.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Derived from arch/powerpc/platforms/powernv/rng.c, which is:
> + * Copyright 2013, Michael Ellerman, IBM Corporation.
> + */
> +
> +#define pr_fmt(fmt)  "microwatt-rng: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DARN_ERR 0xFFFFFFFFFFFFFFFFul
> +
> +int microwatt_get_random_darn(unsigned long *v)
> +{
> + unsigned long val;
> +
> + /* Using DARN with L=1 - 64-bit conditioned random number */
> + asm volatile(PPC_DARN(%0, 1) : "=r"(val));
> +
> + if (val == DARN_ERR)
> + return 0;
> +
> + *v = val;
> +
> + return 1;
> +}
> +
> +static __init int rng_init(void)
> +{
> + unsigned long val;
> + int i;
> +
> + for (i = 0; i < 10; i++) {
> + if (microwatt_get_random_darn()) {
> + ppc_md.get_random_seed = microwatt_get_random_darn;
> + return 0;
> + }
> + }
> +
> + pr_warn("Unable to use DARN for get_random_seed()\n");
> +
> + return -EIO;
> +}
> +machine_subsys_initcall(microwatt, rng_init);
> -- 
> 2.31.1
> 
> 


[PATCH -next v2 9/9] ASoC: fsl_xcvr: check return value after calling platform_get_resource_byname()

2021-06-14 Thread Yang Yingliang
It will cause null-ptr-deref if platform_get_resource_byname() returns NULL,
we need check the return value.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_xcvr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index df7c189d97dd..1330e190e1ff 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -1202,6 +1202,10 @@ static int fsl_xcvr_probe(struct platform_device *pdev)
 
rx_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "rxfifo");
tx_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "txfifo");
+   if (!rx_res || !tx_res) {
+   dev_err(dev, "could not find rxfifo or txfifo resource\n");
+   return -EINVAL;
+   }
xcvr->dma_prms_rx.chan_name = "rx";
xcvr->dma_prms_tx.chan_name = "tx";
xcvr->dma_prms_rx.addr = rx_res->start;
-- 
2.25.1



[PATCH -next v2 6/9] ASoC: fsl_sai: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_sai.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 407a45e48eee..223fcd15bfcc 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -1017,8 +1017,7 @@ static int fsl_sai_probe(struct platform_device *pdev)
 
sai->is_lsb_first = of_property_read_bool(np, "lsb-first");
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   base = devm_ioremap_resource(&pdev->dev, res);
+   base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(base))
return PTR_ERR(base);
 
-- 
2.25.1



[PATCH -next v2 2/9] ASoC: fsl_aud2htx: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_aud2htx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_aud2htx.c b/sound/soc/fsl/fsl_aud2htx.c
index a328697511f7..99ab7f0241cf 100644
--- a/sound/soc/fsl/fsl_aud2htx.c
+++ b/sound/soc/fsl/fsl_aud2htx.c
@@ -196,8 +196,7 @@ static int fsl_aud2htx_probe(struct platform_device *pdev)
 
aud2htx->pdev = pdev;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 3/9] ASoC: fsl_easrc: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_easrc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index b1765c7d3bcd..19c3c3b5939e 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -1887,8 +1887,7 @@ static int fsl_easrc_probe(struct platform_device *pdev)
easrc->private = easrc_priv;
np = dev->of_node;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 0/9] ASoC: fsl: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
patch #1 ~ #8:
  Use devm_platform_get_and_ioremap_resource()

patch #9
  check return value of platform_get_resource_byname()

v2:
  change error message in patch #9

Yang Yingliang (9):
  ASoC: fsl_asrc: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_aud2htx: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_easrc: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_esai: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_micfil: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_sai: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_spdif: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_ssi: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_xcvr: check return value after calling
platform_get_resource_byname()

 sound/soc/fsl/fsl_asrc.c| 3 +--
 sound/soc/fsl/fsl_aud2htx.c | 3 +--
 sound/soc/fsl/fsl_easrc.c   | 3 +--
 sound/soc/fsl/fsl_esai.c| 3 +--
 sound/soc/fsl/fsl_micfil.c  | 3 +--
 sound/soc/fsl/fsl_sai.c | 3 +--
 sound/soc/fsl/fsl_spdif.c   | 3 +--
 sound/soc/fsl/fsl_ssi.c | 3 +--
 sound/soc/fsl/fsl_xcvr.c| 4 
 9 files changed, 12 insertions(+), 16 deletions(-)

-- 
2.25.1



[PATCH -next v2 4/9] ASoC: fsl_esai: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_esai.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index f356ae5925af..a961f837cd09 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -969,8 +969,7 @@ static int fsl_esai_probe(struct platform_device *pdev)
esai_priv->soc = of_device_get_match_data(>dev);
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 5/9] ASoC: fsl_micfil: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_micfil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c
index 3cf789ed6cbe..8c0c75ce9490 100644
--- a/sound/soc/fsl/fsl_micfil.c
+++ b/sound/soc/fsl/fsl_micfil.c
@@ -669,8 +669,7 @@ static int fsl_micfil_probe(struct platform_device *pdev)
}
 
/* init regmap */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 7/9] ASoC: fsl_spdif: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_spdif.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
index 2a76714eb8e6..d812a3ff5845 100644
--- a/sound/soc/fsl/fsl_spdif.c
+++ b/sound/soc/fsl/fsl_spdif.c
@@ -1355,8 +1355,7 @@ static int fsl_spdif_probe(struct platform_device *pdev)
spdif_priv->soc->tx_formats;
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 1/9] ASoC: fsl_asrc: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_asrc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 0e1ad8efebd3..24b41881a68f 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -1035,8 +1035,7 @@ static int fsl_asrc_probe(struct platform_device *pdev)
asrc->private = asrc_priv;
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next v2 8/9] ASoC: fsl_ssi: Use devm_platform_get_and_ioremap_resource()

2021-06-14 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_ssi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 2b57b60431bb..ecbc1c365d5b 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1503,8 +1503,7 @@ static int fsl_ssi_probe(struct platform_device *pdev)
}
ssi->cpu_dai_drv.name = dev_name(dev);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   iomem = devm_ioremap_resource(dev, res);
+   iomem = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(iomem))
return PTR_ERR(iomem);
ssi->ssi_phys = res->start;
-- 
2.25.1



Re: [PATCH 11/11] powerpc/microwatt: Disable interrupts in boot wrapper main program

2021-06-14 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 15, 2021 9:05 am:
> This ensures that we don't get a decrementer interrupt arriving before
> we have set up a handler for it.

Would this be better off merged in the previous patch (maybe with 
comment)? Why don't other platform_init()s seem to require this?

Thanks,
Nick

> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/boot/microwatt.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/powerpc/boot/microwatt.c b/arch/powerpc/boot/microwatt.c
> index ac922dd0aa4d..86a07bceaadf 100644
> --- a/arch/powerpc/boot/microwatt.c
> +++ b/arch/powerpc/boot/microwatt.c
> @@ -12,6 +12,7 @@ void platform_init(unsigned long r3, unsigned long r4, 
> unsigned long r5)
>  {
>   unsigned long heapsize = 16*1024*1024 - (unsigned long)_end;
>  
> + __asm__ volatile("mtmsrd %0,1" : : "r" (0));
>   simple_alloc_init(_end, heapsize, 32, 64);
>   fdt_init(_dtb_start);
>   serial_console_init();
> -- 
> 2.31.1
> 
> 


Re: [PATCH] powerpc: Fix initrd corruption with relative jump labels

2021-06-14 Thread Daniel Axtens
Hi Michael,

> The fix is simply to make the key value relative to the jump_entry
> struct in the ARCH_STATIC_BRANCH macro.

This fixes the boot issues I observed. Thank you very much!!

Tested-by: Daniel Axtens 

Kind regards,
Daniel


Re: [PATCH 03/11] powerpc/radix: Add support for microwatt's PRTBL SPR

2021-06-14 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 15, 2021 8:59 am:
> Microwatt currently doesn't implement hypervisor mode and therefore
> doesn't implement the partition table.  It does implement the process
> table and radix page table walks.
> 
> This adds code to write the base address of the process table to the
> PRTBL SPR,

Is there a particular reason you haven't called it PRTCR or similar to 
match PTCR?

> which has been assigned SPR 720 for now, as that is in the
> range of SPR numbers assigned for experimental use.  PRTBL is only
> written when we have neither the FW_FEATURE_LPAR feature nor the
> CPU_FTR_HVMODE feature.

Seems like reasonable architecture for a non-HV platform.

Could it have a comment to say it's not architected, and a microwatt
ifdef until that changes?

The patch also does avoid touching LPCR or initing amor...

Thanks,
Nick

> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/include/asm/reg.h   |  1 +
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 13 +
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index da103e92c112..3200a2522d6c 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -729,6 +729,7 @@
>  #endif
>  #define SPRN_TIR 0x1BE   /* Thread Identification Register */
>  #define SPRN_PTCR0x1D0   /* Partition table control Register */
> +#define SPRN_PRTBL   0x2D0   /* Process table pointer */
>  #define SPRN_PSPB0x09F   /* Problem State Priority Boost reg */
>  #define SPRN_PTEHI   0x3D5   /* 981 7450 PTE HI word (S/W TLB load) */
>  #define SPRN_PTELO   0x3D6   /* 982 7450 PTE LO word (S/W TLB load) */
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 98f0b243c1ab..6595859173a7 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -646,10 +646,15 @@ void __init radix__early_init_mmu(void)
>   radix_init_pgtable();
>  
>   if (!firmware_has_feature(FW_FEATURE_LPAR)) {
> - lpcr = mfspr(SPRN_LPCR);
> - mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
> - radix_init_partition_table();
> - radix_init_amor();
> + if (cpu_has_feature(CPU_FTR_HVMODE)) {
> + lpcr = mfspr(SPRN_LPCR);
> + mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
> + radix_init_partition_table();
> + radix_init_amor();
> + } else {
> + mtspr(SPRN_PRTBL, (__pa(process_tb) |
> +(PRTB_SIZE_SHIFT - 12)));
> + }
>   } else {
>   radix_init_pseries();
>   }
> -- 
> 2.31.1
> 
> 


Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable

2021-06-14 Thread Nicholas Piggin
Excerpts from Andy Lutomirski's message of June 15, 2021 2:20 am:
> Replying to several emails at once...
> 
> 
> On 6/13/21 10:21 PM, Nicholas Piggin wrote:
>> Excerpts from Nicholas Piggin's message of June 14, 2021 2:47 pm:
>>> Excerpts from Nicholas Piggin's message of June 14, 2021 2:14 pm:
 Excerpts from Andy Lutomirski's message of June 14, 2021 1:52 pm:
> On 6/13/21 5:45 PM, Nicholas Piggin wrote:
>> Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am:
>>> On 6/4/21 6:42 PM, Nicholas Piggin wrote:
 Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb 
 mm
 when it is context switched. This can be disabled by architectures that
 don't require this refcounting if they clean up lazy tlb mms when the
 last refcount is dropped. Currently this is always enabled, which is
 what existing code does, so the patch is effectively a no-op.

 Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
>>>
>>> I am in favor of this approach, but I would be a lot more comfortable
>>> with the resulting code if task->active_mm were at least better
>>> documented and possibly even guarded by ifdefs.
>>
>> active_mm is fairly well documented in Documentation/active_mm.rst IMO.
>> I don't think anything has changed in 20 years, I don't know what more
>> is needed, but if you can add to documentation that would be nice. Maybe
>> moving a bit of that into .c and .h files?
>>
>
> Quoting from that file:
>
>   - however, we obviously need to keep track of which address space we
> "stole" for such an anonymous user. For that, we have 
> "tsk->active_mm",
> which shows what the currently active address space is.
>
> This isn't even true right now on x86.

 From the perspective of core code, it is. x86 might do something crazy 
 with it, but it has to make it appear this way to non-arch code that
 uses active_mm.

 Is x86's scheme documented?
> 
> arch/x86/include/asm/tlbflush.h documents it a bit:
> 
> /*
>  * cpu_tlbstate.loaded_mm should match CR3 whenever interrupts
>  * are on.  This means that it may not match current->active_mm,
>  * which will contain the previous user mm when we're in lazy TLB
>  * mode even if we've already switched back to swapper_pg_dir.
>  *
>  * During switch_mm_irqs_off(), loaded_mm will be set to
>  * LOADED_MM_SWITCHING during the brief interrupts-off window
>  * when CR3 and loaded_mm would otherwise be inconsistent.  This
>  * is for nmi_uaccess_okay()'s benefit.
>  */

So the only documentation relating to the current active_mm value or 
refcounting is that it may not match what the x86 specific code is 
doing?

All this complexity you accuse me of adding is entirely in x86 code.
On other architectures, it's very simple and understandable, and 
documented. I don't know how else to explain this.


> With your patch applied:
>
>  To support all that, the "struct mm_struct" now has two counters: a
>  "mm_users" counter that is how many "real address space users" there are,
>  and a "mm_count" counter that is the number of "lazy" users (ie anonymous
>  users) plus one if there are any real users.
>
> isn't even true any more.

 Well yeah but the active_mm concept hasn't changed. The refcounting 
 change is hopefully reasonably documented?
> 
> active_mm is *only* refcounting in the core code.  See below.

It's just not. It's passed in to switch_mm. Most architectures except 
for x86 require this.

>
> I looked through all active_mm references in core code.  We have:
>
> kernel/sched/core.c: it's all refcounting, although it's a bit tangled
> with membarrier.
>
> kernel/kthread.c: same.  refcounting and membarrier stuff.
>
> kernel/exit.c: exit_mm() a BUG_ON().
>
> kernel/fork.c: initialization code and a warning.
>
> kernel/cpu.c: cpu offline stuff.  wouldn't be needed if active_mm went 
> away.
>
> fs/exec.c: nothing of interest

 I might not have been clear. Core code doesn't need active_mm if 
 active_mm somehow goes away. I'm saying active_mm can't go away because
 it's needed to support (most) archs that do lazy tlb mm switching.

 The part I don't understand is when you say it can just go away. How? 
> 
> #ifdef CONFIG_MMU_TLB_REFCOUNT
>   struct mm_struct *active_mm;
> #endif

Thanks for returning the snark.


> I didn't go through drivers, but I maintain my point.  active_mm is
> there for refcounting.  So please don't just make it even more confusing
> -- do your performance improvement, but improve the code at the same
> time: get rid of active_mm, at least on architectures that opt out of
> the refcounting.


Re: [PATCH v5 16/17] crypto/nx: Get NX capabilities for GZIP coprocessor type

2021-06-14 Thread Haren Myneni
On Mon, 2021-06-14 at 13:39 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of June 13, 2021 9:04 pm:
> > The hypervisor provides different capabilities that it supports
> > to define the user space NX request. These capabilities are
> > recommended minimum compression / decompression lengths and the
> > maximum request buffer size in bytes.
> > 
> > Changes to get NX overall capabilities which points to the
> > specific features that the hypervisor supports. Then retrieve
> > the capabilities for the specific feature (available only
> > for NXGZIP).
> 
> So what does this give you which you didn't have before? Should
> this go before the previous patch that enables the interface for
> guests,
> or is there some functional-yet-degraded mode that is available
> without
> this patch?
> 
> I would suggest even if this is the case to switch ordering of the 
> patches so as to reduce the matrix of functionality that userspace
> sees 
> when bisecting. Unless you specifically want this kind of
> bisectability,
> in which case make that explicit in the changelog.

Thanks for your suggestions. I will incorporate them and post next
revision. 

The user space request buffer length should not be more than
req_max_processed_len (available through sysfs). Otherwise NX will
return the request with RMA_Reject. Whereas min_compress_len and
min_decompress_len are recommended values. 
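In other words, a user-space submitter would validate lengths roughly as below. This is a hedged sketch with made-up capability values; the real numbers come from the hypervisor and are exported through sysfs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability values for illustration only. */
#define REQ_MAX_PROCESSED_LEN (1024u * 1024u) /* hard limit: RMA_Reject above this */
#define MIN_COMPRESS_LEN      (32u * 1024u)   /* recommended minimum, not enforced */

/* Return 0 if NX would accept the request, -1 if it would be rejected.
 * *worth_offloading reflects the recommended-minimum heuristic. */
static int nx_check_request_len(uint64_t len, int *worth_offloading)
{
    if (len > REQ_MAX_PROCESSED_LEN)
        return -1; /* NX returns the request with RMA_Reject */
    *worth_offloading = (len >= MIN_COMPRESS_LEN);
    return 0;
}
```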

We can add this patch and the last one (crypto/nx: Add sysfs interface
to export NX capabilities) before the actual enablement patch
("crypto/nx: Register and unregister VAS interface on PowerVM").

Thanks
Haren

> 
> Thanks,
> Nick
> 
> > Signed-off-by: Haren Myneni 
> > Acked-by: Herbert Xu 
> > ---
> >  drivers/crypto/nx/nx-common-pseries.c | 86
> > +++
> >  1 file changed, 86 insertions(+)
> > 
> > diff --git a/drivers/crypto/nx/nx-common-pseries.c
> > b/drivers/crypto/nx/nx-common-pseries.c
> > index 9a40fca8a9e6..60b5049ec9f7 100644
> > --- a/drivers/crypto/nx/nx-common-pseries.c
> > +++ b/drivers/crypto/nx/nx-common-pseries.c
> > @@ -9,6 +9,7 @@
> >   */
> >  
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include "nx-842.h"
> > @@ -20,6 +21,29 @@ MODULE_DESCRIPTION("842 H/W Compression driver
> > for IBM Power processors");
> >  MODULE_ALIAS_CRYPTO("842");
> >  MODULE_ALIAS_CRYPTO("842-nx");
> >  
> > +/*
> > + * Coprocessor type specific capabilities from the hypervisor.
> > + */
> > +struct hv_nx_ct_caps {
> > +   __be64  descriptor;
> > +   __be64  req_max_processed_len;  /* Max bytes in one GZIP
> > request */
> > +   __be64  min_compress_len;   /* Min compression size in
> > bytes */
> > +   __be64  min_decompress_len; /* Min decompression size
> > in bytes */
> > +} __packed __aligned(0x1000);
> > +
> > +/*
> > + * Coprocessor type specific capabilities.
> > + */
> > +struct nx_ct_caps {
> > +   u64 descriptor;
> > +   u64 req_max_processed_len;  /* Max bytes in one GZIP request */
> > +   u64 min_compress_len;   /* Min compression in bytes */
> > +   u64 min_decompress_len; /* Min decompression in bytes */
> > +};
> > +
> > +static u64 caps_feat;
> > +static struct nx_ct_caps nx_ct_caps;
> > +
> >  static struct nx842_constraints nx842_pseries_constraints = {
> > .alignment =DDE_BUFFER_ALIGN,
> > .multiple = DDE_BUFFER_LAST_MULT,
> > @@ -1066,6 +1090,64 @@ static void nx842_remove(struct vio_dev
> > *viodev)
> > kfree(old_devdata);
> >  }
> >  
> > +/*
> > + * Get NX capabilities from the hypervisor.
> > + * Only NXGZIP capabilities are provided by the hypervisor right
> > + * now and these values are available to user space with sysfs.
> > + */
> > +static void __init nxct_get_capabilities(void)
> > +{
> > +   struct hv_vas_all_caps *hv_caps;
> > +   struct hv_nx_ct_caps *hv_nxc;
> > +   int rc;
> > +
> > +   hv_caps = kmalloc(sizeof(*hv_caps), GFP_KERNEL);
> > +   if (!hv_caps)
> > +   return;
> > +   /*
> > +* Get NX overall capabilities with feature type=0
> > +*/
> > +   rc = h_query_vas_capabilities(H_QUERY_NX_CAPABILITIES, 0,
> > + (u64)virt_to_phys(hv_caps));
> > +   if (rc)
> > +   goto out;
> > +
> > +   caps_feat = be64_to_cpu(hv_caps->feat_type);
> > +   /*
> > +* NX-GZIP feature available
> > +*/
> > +   if (caps_feat & VAS_NX_GZIP_FEAT_BIT) {
> > +   hv_nxc = kmalloc(sizeof(*hv_nxc), GFP_KERNEL);
> > +   if (!hv_nxc)
> > +   goto out;
> > +   /*
> > +* Get capabilities for NX-GZIP feature
> > +*/
> > +   rc = h_query_vas_capabilities(H_QUERY_NX_CAPABILITIES,
> > + VAS_NX_GZIP_FEAT,
> > + (u64)virt_to_phys(hv_
> > nxc));
> > +   } else {
> > +   pr_err("NX-GZIP feature is not available\n");
> > +   rc = -EINVAL;
> > +   }
> > +
> > +   if (!rc) {
> > +   

[PATCH 11/11] powerpc/microwatt: Disable interrupts in boot wrapper main program

2021-06-14 Thread Paul Mackerras
This ensures that we don't get a decrementer interrupt arriving before
we have set up a handler for it.
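For reference, mtmsrd with L=1 replaces only MSR[EE] and MSR[RI] from the source register, so moving zero disables external/decrementer interrupts without disturbing the rest of the MSR. A small sketch of that update rule (bit positions as in the kernel's reg.h):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_EE (1ULL << 15)  /* external/decrementer interrupt enable */
#define MSR_RI (1ULL << 1)   /* recoverable interrupt */

/* mtmsrd rS,1 replaces only MSR[EE] and MSR[RI] with the corresponding
 * bits of rS; with rS = 0 both are cleared and all other bits survive. */
static uint64_t mtmsrd_l1(uint64_t msr, uint64_t rs)
{
    uint64_t mask = MSR_EE | MSR_RI;
    return (msr & ~mask) | (rs & mask);
}
```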

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/boot/microwatt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/boot/microwatt.c b/arch/powerpc/boot/microwatt.c
index ac922dd0aa4d..86a07bceaadf 100644
--- a/arch/powerpc/boot/microwatt.c
+++ b/arch/powerpc/boot/microwatt.c
@@ -12,6 +12,7 @@ void platform_init(unsigned long r3, unsigned long r4, 
unsigned long r5)
 {
unsigned long heapsize = 16*1024*1024 - (unsigned long)_end;
 
+   __asm__ volatile("mtmsrd %0,1" : : "r" (0));
simple_alloc_init(_end, heapsize, 32, 64);
fdt_init(_dtb_start);
serial_console_init();
-- 
2.31.1



[PATCH 05/11] powerpc/xics: Add a native ICS backend for microwatt

2021-06-14 Thread Paul Mackerras
From: Benjamin Herrenschmidt 

This is a simple native ICS backend that matches the layout of
the Microwatt implementation of ICS.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/boot/dts/microwatt.dts  |  18 ++
 arch/powerpc/platforms/microwatt/Kconfig |   2 +
 arch/powerpc/platforms/microwatt/setup.c |   8 +
 arch/powerpc/sysdev/xics/Kconfig |   3 +
 arch/powerpc/sysdev/xics/Makefile|   1 +
 arch/powerpc/sysdev/xics/ics-native.c| 257 +++
 arch/powerpc/sysdev/xics/xics-common.c   |   2 +
 7 files changed, 291 insertions(+)
 create mode 100644 arch/powerpc/sysdev/xics/ics-native.c

diff --git a/arch/powerpc/boot/dts/microwatt.dts 
b/arch/powerpc/boot/dts/microwatt.dts
index a72177e5041d..2e75600320e8 100644
--- a/arch/powerpc/boot/dts/microwatt.dts
+++ b/arch/powerpc/boot/dts/microwatt.dts
@@ -106,7 +106,25 @@ soc@c000 {
compatible = "simple-bus";
#address-cells = <1>;
#size-cells = <1>;
+   interrupt-parent = <&ICS>;
 
ranges = <0 0 0xc000 0x4000>;
+
+   interrupt-controller@4000 {
+   compatible = "openpower,xics-presentation", 
"ibm,ppc-xicp";
+   ibm,interrupt-server-ranges = <0x0 0x1>;
+   reg = <0x4000 0x100>;
+   };
+
+   ICS: interrupt-controller@5000 {
+   compatible = "openpower,xics-sources";
+   interrupt-controller;
+   interrupt-ranges = <0x10 0x10>;
+   reg = <0x5000 0x100>;
+   #address-cells = <0>;
+   #size-cells = <0>;
+   #interrupt-cells = <2>;
+   };
+
};
 };
diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
b/arch/powerpc/platforms/microwatt/Kconfig
index 3be01e78ce57..b52c869c0eb8 100644
--- a/arch/powerpc/platforms/microwatt/Kconfig
+++ b/arch/powerpc/platforms/microwatt/Kconfig
@@ -3,6 +3,8 @@ config PPC_MICROWATT
depends on PPC_BOOK3S_64 && !SMP
bool "Microwatt SoC platform"
select PPC_XICS
+   select PPC_ICS_NATIVE
+   select PPC_ICP_NATIVE
select PPC_NATIVE
help
   This option enables support for FPGA-based Microwatt implementations.
diff --git a/arch/powerpc/platforms/microwatt/setup.c 
b/arch/powerpc/platforms/microwatt/setup.c
index 5af4adf881bc..1c1b7791fa57 100644
--- a/arch/powerpc/platforms/microwatt/setup.c
+++ b/arch/powerpc/platforms/microwatt/setup.c
@@ -10,8 +10,15 @@
 #include 
 #include 
 #include 
+
 #include 
 #include 
+#include 
+
+static void __init microwatt_init_IRQ(void)
+{
+   xics_init();
+}
 
 static int __init microwatt_probe(void)
 {
@@ -27,5 +34,6 @@ machine_arch_initcall(microwatt, microwatt_populate);
 define_machine(microwatt) {
.name   = "microwatt",
.probe  = microwatt_probe,
+   .init_IRQ   = microwatt_init_IRQ,
.calibrate_decr = generic_calibrate_decr,
 };
diff --git a/arch/powerpc/sysdev/xics/Kconfig b/arch/powerpc/sysdev/xics/Kconfig
index 304614c920aa..063d9195891f 100644
--- a/arch/powerpc/sysdev/xics/Kconfig
+++ b/arch/powerpc/sysdev/xics/Kconfig
@@ -12,3 +12,6 @@ config PPC_ICP_HV
 
 config PPC_ICS_RTAS
def_bool n
+
+config PPC_ICS_NATIVE
+   def_bool n
diff --git a/arch/powerpc/sysdev/xics/Makefile 
b/arch/powerpc/sysdev/xics/Makefile
index ba1e3117b1c0..747063927c6c 100644
--- a/arch/powerpc/sysdev/xics/Makefile
+++ b/arch/powerpc/sysdev/xics/Makefile
@@ -4,4 +4,5 @@ obj-y   += xics-common.o
 obj-$(CONFIG_PPC_ICP_NATIVE)   += icp-native.o
 obj-$(CONFIG_PPC_ICP_HV)   += icp-hv.o
 obj-$(CONFIG_PPC_ICS_RTAS) += ics-rtas.o
+obj-$(CONFIG_PPC_ICS_NATIVE)   += ics-native.o
 obj-$(CONFIG_PPC_POWERNV)  += ics-opal.o icp-opal.o
diff --git a/arch/powerpc/sysdev/xics/ics-native.c 
b/arch/powerpc/sysdev/xics/ics-native.c
new file mode 100644
index ..d450502f4053
--- /dev/null
+++ b/arch/powerpc/sysdev/xics/ics-native.c
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Native ICS backend for the Microwatt interrupt controller.
+ *
+ * Copyright 2011 IBM Corp.
+ */
+
+//#define DEBUG
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct ics_native {
+   struct ics  ics;
+   struct device_node  *node;
+   void __iomem*base;
+   u32 ibase;
+   u32 icount;
+};
+#define to_ics_native(_ics) container_of(_ics, struct ics_native, ics)
+
+static void __iomem *ics_native_xive(struct ics_native *in, unsigned int vec)
+{
+   return in->base + 0x800 + ((vec - in->ibase) << 2);
+}
+
+static void 

[PATCH 03/11] powerpc/radix: Add support for microwatt's PRTBL SPR

2021-06-14 Thread Paul Mackerras
Microwatt currently doesn't implement hypervisor mode and therefore
doesn't implement the partition table.  It does implement the process
table and radix page table walks.

This adds code to write the base address of the process table to the
PRTBL SPR, which has been assigned SPR 720 for now, as that is in the
range of SPR numbers assigned for experimental use.  PRTBL is only
written when we have neither the FW_FEATURE_LPAR feature nor the
CPU_FTR_HVMODE feature.
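The value written is simply the physical base of the process table OR'd with an encoded size, mirroring the expression in the hunk below. A sketch of the encoding; the shift used here is an arbitrary illustrative value, not Microwatt's actual configuration:

```c
#include <assert.h>
#include <stdint.h>

/* Example only: PRTB_SIZE_SHIFT depends on the kernel configuration;
 * 16 here is an arbitrary illustrative value (a 64KB process table). */
#define PRTB_SIZE_SHIFT 16

/* Mirrors mtspr(SPRN_PRTBL, __pa(process_tb) | (PRTB_SIZE_SHIFT - 12)):
 * the low bits of the SPR value encode log2(table size) - 12. */
static uint64_t prtbl_value(uint64_t process_tb_pa)
{
    return process_tb_pa | (uint64_t)(PRTB_SIZE_SHIFT - 12);
}
```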

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 13 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index da103e92c112..3200a2522d6c 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -729,6 +729,7 @@
 #endif
 #define SPRN_TIR   0x1BE   /* Thread Identification Register */
 #define SPRN_PTCR  0x1D0   /* Partition table control Register */
+#define SPRN_PRTBL 0x2D0   /* Process table pointer */
 #define SPRN_PSPB  0x09F   /* Problem State Priority Boost reg */
 #define SPRN_PTEHI 0x3D5   /* 981 7450 PTE HI word (S/W TLB load) */
 #define SPRN_PTELO 0x3D6   /* 982 7450 PTE LO word (S/W TLB load) */
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 98f0b243c1ab..6595859173a7 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -646,10 +646,15 @@ void __init radix__early_init_mmu(void)
radix_init_pgtable();
 
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
-   lpcr = mfspr(SPRN_LPCR);
-   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
-   radix_init_partition_table();
-   radix_init_amor();
+   if (cpu_has_feature(CPU_FTR_HVMODE)) {
+   lpcr = mfspr(SPRN_LPCR);
+   mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
+   radix_init_partition_table();
+   radix_init_amor();
+   } else {
+   mtspr(SPRN_PRTBL, (__pa(process_tb) |
+  (PRTB_SIZE_SHIFT - 12)));
+   }
} else {
radix_init_pseries();
}
-- 
2.31.1



[PATCH 04/11] powerpc/microwatt: Populate platform bus from device-tree

2021-06-14 Thread Paul Mackerras
From: Benjamin Herrenschmidt 

Just like any other embedded platform.

Add an empty soc node.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/boot/dts/microwatt.dts  | 7 +++
 arch/powerpc/platforms/microwatt/setup.c | 8 
 2 files changed, 15 insertions(+)

diff --git a/arch/powerpc/boot/dts/microwatt.dts 
b/arch/powerpc/boot/dts/microwatt.dts
index 9b2e64da9432..a72177e5041d 100644
--- a/arch/powerpc/boot/dts/microwatt.dts
+++ b/arch/powerpc/boot/dts/microwatt.dts
@@ -102,4 +102,11 @@ chosen {
ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 40 00 40];
};
 
+   soc@c000 {
+   compatible = "simple-bus";
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   ranges = <0 0 0xc000 0x4000>;
+   };
 };
diff --git a/arch/powerpc/platforms/microwatt/setup.c 
b/arch/powerpc/platforms/microwatt/setup.c
index d80d52612672..5af4adf881bc 100644
--- a/arch/powerpc/platforms/microwatt/setup.c
+++ b/arch/powerpc/platforms/microwatt/setup.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -16,6 +18,12 @@ static int __init microwatt_probe(void)
return of_machine_is_compatible("microwatt-soc");
 }
 
+static int __init microwatt_populate(void)
+{
+   return of_platform_default_populate(NULL, NULL, NULL);
+}
+machine_arch_initcall(microwatt, microwatt_populate);
+
 define_machine(microwatt) {
.name   = "microwatt",
.probe  = microwatt_probe,
-- 
2.31.1



[PATCH 00/11] powerpc: Add support for Microwatt soft-core

2021-06-14 Thread Paul Mackerras
This series of patches adds support for the Microwatt soft-core.
Microwatt is an open-source 64-bit Power ISA processor written in VHDL
which targets medium-sized FPGAs such as the Xilinx Artix-7 or the
Lattice ECP5.  Microwatt currently implements the scalar fixed plus
floating-point subset of Power ISA v3.0B plus the radix MMU, but not
logical partitioning (i.e. it does not have hypervisor mode, the
partition table or nested radix translation).

Paul.

 arch/powerpc/Kconfig  |   2 +-
 arch/powerpc/boot/Makefile|   4 +
 arch/powerpc/boot/devtree.c   |  59 ---
 arch/powerpc/boot/dts/microwatt.dts   | 145 +
 arch/powerpc/boot/microwatt.c |  19 +++
 arch/powerpc/boot/ns16550.c   |   9 +-
 arch/powerpc/boot/wrapper |   5 +
 arch/powerpc/configs/microwatt_defconfig  |  98 
 arch/powerpc/include/asm/archrandom.h |  12 +-
 arch/powerpc/include/asm/reg.h|   1 +
 arch/powerpc/kernel/udbg_16550.c  |  39 +
 arch/powerpc/mm/book3s64/radix_pgtable.c  |  13 +-
 arch/powerpc/platforms/Kconfig|   1 +
 arch/powerpc/platforms/Makefile   |   1 +
 arch/powerpc/platforms/microwatt/Kconfig  |  13 ++
 arch/powerpc/platforms/microwatt/Makefile |   1 +
 arch/powerpc/platforms/microwatt/rng.c|  48 ++
 arch/powerpc/platforms/microwatt/setup.c  |  41 +
 arch/powerpc/sysdev/xics/Kconfig  |   3 +
 arch/powerpc/sysdev/xics/Makefile |   1 +
 arch/powerpc/sysdev/xics/ics-native.c | 257 ++
 arch/powerpc/sysdev/xics/xics-common.c|   2 +
 22 files changed, 741 insertions(+), 33 deletions(-)


[PATCH 07/11] powerpc: Add support for microwatt's hardware random number generator

2021-06-14 Thread Paul Mackerras
This is accessed using the DARN instruction and should probably be
done more generically.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/include/asm/archrandom.h | 12 +-
 arch/powerpc/platforms/microwatt/Kconfig  |  1 +
 arch/powerpc/platforms/microwatt/Makefile |  2 +-
 arch/powerpc/platforms/microwatt/rng.c| 48 +++
 4 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/platforms/microwatt/rng.c

diff --git a/arch/powerpc/include/asm/archrandom.h 
b/arch/powerpc/include/asm/archrandom.h
index 9a53e29680f4..e8ae0f7740f9 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -8,12 +8,22 @@
 
 static inline bool __must_check arch_get_random_long(unsigned long *v)
 {
+   if (ppc_md.get_random_seed)
+   return ppc_md.get_random_seed(v);
+
return false;
 }
 
 static inline bool __must_check arch_get_random_int(unsigned int *v)
 {
-   return false;
+   unsigned long val;
+   bool rc;
+
+   rc = arch_get_random_long(&val);
+   if (rc)
+   *v = val;
+
+   return rc;
 }
 
 static inline bool __must_check arch_get_random_seed_long(unsigned long *v)
diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
b/arch/powerpc/platforms/microwatt/Kconfig
index 50ed0cedb5f1..8f6a81978461 100644
--- a/arch/powerpc/platforms/microwatt/Kconfig
+++ b/arch/powerpc/platforms/microwatt/Kconfig
@@ -7,6 +7,7 @@ config PPC_MICROWATT
select PPC_ICP_NATIVE
select PPC_NATIVE
select PPC_UDBG_16550
+   select ARCH_RANDOM
help
   This option enables support for FPGA-based Microwatt implementations.
 
diff --git a/arch/powerpc/platforms/microwatt/Makefile 
b/arch/powerpc/platforms/microwatt/Makefile
index e6885b3b2ee7..116d6d3ad3f0 100644
--- a/arch/powerpc/platforms/microwatt/Makefile
+++ b/arch/powerpc/platforms/microwatt/Makefile
@@ -1 +1 @@
-obj-y  += setup.o
+obj-y  += setup.o rng.o
diff --git a/arch/powerpc/platforms/microwatt/rng.c 
b/arch/powerpc/platforms/microwatt/rng.c
new file mode 100644
index ..3d8ee6eb7dad
--- /dev/null
+++ b/arch/powerpc/platforms/microwatt/rng.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Derived from arch/powerpc/platforms/powernv/rng.c, which is:
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ */
+
+#define pr_fmt(fmt)"microwatt-rng: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DARN_ERR 0xFFFFFFFFFFFFFFFFul
+
+int microwatt_get_random_darn(unsigned long *v)
+{
+   unsigned long val;
+
+   /* Using DARN with L=1 - 64-bit conditioned random number */
+   asm volatile(PPC_DARN(%0, 1) : "=r"(val));
+
+   if (val == DARN_ERR)
+   return 0;
+
+   *v = val;
+
+   return 1;
+}
+
+static __init int rng_init(void)
+{
+   unsigned long val;
+   int i;
+
+   for (i = 0; i < 10; i++) {
+   if (microwatt_get_random_darn(&val)) {
+   ppc_md.get_random_seed = microwatt_get_random_darn;
+   return 0;
+   }
+   }
+
+   pr_warn("Unable to use DARN for get_random_seed()\n");
+
+   return -EIO;
+}
+machine_subsys_initcall(microwatt, rng_init);
-- 
2.31.1



[PATCH 06/11] powerpc: microwatt: Use standard 16550 UART for console

2021-06-14 Thread Paul Mackerras
From: Benjamin Herrenschmidt 

This adds support to the Microwatt platform to use the standard
1655-style UART which available in the standalone Microwatt FPGA.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/boot/dts/microwatt.dts  | 25 ---
 arch/powerpc/kernel/udbg_16550.c | 39 
 arch/powerpc/platforms/microwatt/Kconfig |  1 +
 arch/powerpc/platforms/microwatt/setup.c |  2 ++
 4 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/boot/dts/microwatt.dts 
b/arch/powerpc/boot/dts/microwatt.dts
index 2e75600320e8..dbde200d4692 100644
--- a/arch/powerpc/boot/dts/microwatt.dts
+++ b/arch/powerpc/boot/dts/microwatt.dts
@@ -6,6 +6,10 @@ / {
model-name = "microwatt";
compatible = "microwatt-soc";
 
+   aliases {
+   serial0 = &UART0;
+   };
+
reserved-memory {
#size-cells = <0x02>;
#address-cells = <0x02>;
@@ -97,11 +101,6 @@ PowerPC,Microwatt@0 {
};
};
 
-   chosen {
-   bootargs = "";
-   ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 40 00 40];
-   };
-
soc@c000 {
compatible = "simple-bus";
#address-cells = <1>;
@@ -126,5 +125,21 @@ ICS: interrupt-controller@5000 {
#interrupt-cells = <2>;
};
 
+   UART0: serial@2000 {
+   device_type = "serial";
+   compatible = "ns16550";
+   reg = <0x2000 0x8>;
+   clock-frequency = <100000000>;
+   current-speed = <115200>;
+   reg-shift = <2>;
+   fifo-size = <16>;
+   interrupts = <0x10 0x1>;
+   };
+   };
+
+   chosen {
+   bootargs = "";
+   ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 40 00 40];
+   stdout-path = &UART0;
};
 };
diff --git a/arch/powerpc/kernel/udbg_16550.c b/arch/powerpc/kernel/udbg_16550.c
index 9356b60d6030..8513aa49614e 100644
--- a/arch/powerpc/kernel/udbg_16550.c
+++ b/arch/powerpc/kernel/udbg_16550.c
@@ -296,3 +296,42 @@ void __init udbg_init_40x_realmode(void)
 }
 
 #endif /* CONFIG_PPC_EARLY_DEBUG_40x */
+
+#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
+
+#define UDBG_UART_MW_ADDR  ((void __iomem *)0xc0002000)
+
+static u8 udbg_uart_in_isa300_rm(unsigned int reg)
+{
+   uint64_t msr = mfmsr();
+   uint8_t  c;
+
+   mtmsr(msr & ~(MSR_EE|MSR_DR));
+   isync();
+   eieio();
+   c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
+   mtmsr(msr);
+   isync();
+   return c;
+}
+
+static void udbg_uart_out_isa300_rm(unsigned int reg, u8 val)
+{
+   uint64_t msr = mfmsr();
+
+   mtmsr(msr & ~(MSR_EE|MSR_DR));
+   isync();
+   eieio();
+   __raw_rm_writeb(val, UDBG_UART_MW_ADDR + (reg << 2));
+   mtmsr(msr);
+   isync();
+}
+
+void __init udbg_init_debug_microwatt(void)
+{
+   udbg_uart_in = udbg_uart_in_isa300_rm;
+   udbg_uart_out = udbg_uart_out_isa300_rm;
+   udbg_use_uart();
+}
+
+#endif /* CONFIG_PPC_EARLY_DEBUG_MICROWATT */
diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
b/arch/powerpc/platforms/microwatt/Kconfig
index b52c869c0eb8..50ed0cedb5f1 100644
--- a/arch/powerpc/platforms/microwatt/Kconfig
+++ b/arch/powerpc/platforms/microwatt/Kconfig
@@ -6,6 +6,7 @@ config PPC_MICROWATT
select PPC_ICS_NATIVE
select PPC_ICP_NATIVE
select PPC_NATIVE
+   select PPC_UDBG_16550
help
   This option enables support for FPGA-based Microwatt implementations.
 
diff --git a/arch/powerpc/platforms/microwatt/setup.c 
b/arch/powerpc/platforms/microwatt/setup.c
index 1c1b7791fa57..0b02603bdb74 100644
--- a/arch/powerpc/platforms/microwatt/setup.c
+++ b/arch/powerpc/platforms/microwatt/setup.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static void __init microwatt_init_IRQ(void)
 {
@@ -35,5 +36,6 @@ define_machine(microwatt) {
.name   = "microwatt",
.probe  = microwatt_probe,
.init_IRQ   = microwatt_init_IRQ,
+   .progress   = udbg_progress,
.calibrate_decr = generic_calibrate_decr,
 };
-- 
2.31.1



[PATCH 09/11] powerpc: boot: Fixup device-tree on little endian

2021-06-14 Thread Paul Mackerras
From: Benjamin Herrenschmidt 

This fixes the core devtree.c functions and the ns16550 UART backend.
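The underlying issue is that flattened device tree cells are always stored big-endian, so on a little-endian CPU every cell must be swapped before arithmetic and swapped back on write. A minimal sketch of the 32-bit swap that the be32_to_cpu()/cpu_to_be32() helpers used in this patch perform:

```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap one 32-bit FDT cell; the operation is its own inverse. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}
```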

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/boot/devtree.c | 59 +
 arch/powerpc/boot/ns16550.c |  9 --
 2 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/boot/devtree.c b/arch/powerpc/boot/devtree.c
index 5d91036ad626..58fbcfcc98c9 100644
--- a/arch/powerpc/boot/devtree.c
+++ b/arch/powerpc/boot/devtree.c
@@ -13,6 +13,7 @@
 #include "string.h"
 #include "stdio.h"
 #include "ops.h"
+#include "of.h"
 
 void dt_fixup_memory(u64 start, u64 size)
 {
@@ -23,21 +24,25 @@ void dt_fixup_memory(u64 start, u64 size)
root = finddevice("/");
if (getprop(root, "#address-cells", , sizeof(naddr)) < 0)
naddr = 2;
+   else
+   naddr = be32_to_cpu(naddr);
if (naddr < 1 || naddr > 2)
fatal("Can't cope with #address-cells == %d in /\n\r", naddr);
 
if (getprop(root, "#size-cells", , sizeof(nsize)) < 0)
nsize = 1;
+   else
+   nsize = be32_to_cpu(nsize);
if (nsize < 1 || nsize > 2)
fatal("Can't cope with #size-cells == %d in /\n\r", nsize);
 
i = 0;
if (naddr == 2)
-   memreg[i++] = start >> 32;
-   memreg[i++] = start & 0xffffffff;
+   memreg[i++] = cpu_to_be32(start >> 32);
+   memreg[i++] = cpu_to_be32(start & 0xffffffff);
if (nsize == 2)
-   memreg[i++] = size >> 32;
-   memreg[i++] = size & 0xffffffff;
+   memreg[i++] = cpu_to_be32(size >> 32);
+   memreg[i++] = cpu_to_be32(size & 0xffffffff);
 
memory = finddevice("/memory");
if (! memory) {
@@ -45,9 +50,9 @@ void dt_fixup_memory(u64 start, u64 size)
setprop_str(memory, "device_type", "memory");
}
 
-   printf("Memory <- <0x%x", memreg[0]);
+   printf("Memory <- <0x%x", be32_to_cpu(memreg[0]));
for (i = 1; i < (naddr + nsize); i++)
-   printf(" 0x%x", memreg[i]);
+   printf(" 0x%x", be32_to_cpu(memreg[i]));
printf("> (%ldMB)\n\r", (unsigned long)(size >> 20));
 
setprop(memory, "reg", memreg, (naddr + nsize)*sizeof(u32));
@@ -65,10 +70,10 @@ void dt_fixup_cpu_clocks(u32 cpu, u32 tb, u32 bus)
printf("CPU bus-frequency <- 0x%x (%dMHz)\n\r", bus, MHZ(bus));
 
while ((devp = find_node_by_devtype(devp, "cpu"))) {
-   setprop_val(devp, "clock-frequency", cpu);
-   setprop_val(devp, "timebase-frequency", tb);
+   setprop_val(devp, "clock-frequency", cpu_to_be32(cpu));
+   setprop_val(devp, "timebase-frequency", cpu_to_be32(tb));
if (bus > 0)
-   setprop_val(devp, "bus-frequency", bus);
+   setprop_val(devp, "bus-frequency", cpu_to_be32(bus));
}
 
timebase_period_ns = 1000000000 / tb;
@@ -80,7 +85,7 @@ void dt_fixup_clock(const char *path, u32 freq)
 
if (devp) {
printf("%s: clock-frequency <- %x (%dMHz)\n\r", path, freq, 
MHZ(freq));
-   setprop_val(devp, "clock-frequency", freq);
+   setprop_val(devp, "clock-frequency", cpu_to_be32(freq));
}
 }
 
@@ -133,8 +138,12 @@ void dt_get_reg_format(void *node, u32 *naddr, u32 *nsize)
 {
if (getprop(node, "#address-cells", naddr, 4) != 4)
*naddr = 2;
+   else
+   *naddr = be32_to_cpu(*naddr);
if (getprop(node, "#size-cells", nsize, 4) != 4)
*nsize = 1;
+   else
+   *nsize = be32_to_cpu(*nsize);
 }
 
 static void copy_val(u32 *dest, u32 *src, int naddr)
@@ -163,9 +172,9 @@ static int add_reg(u32 *reg, u32 *add, int naddr)
int i, carry = 0;
 
for (i = MAX_ADDR_CELLS - 1; i >= MAX_ADDR_CELLS - naddr; i--) {
-   u64 tmp = (u64)reg[i] + add[i] + carry;
+   u64 tmp = (u64)be32_to_cpu(reg[i]) + be32_to_cpu(add[i]) + 
carry;
carry = tmp >> 32;
-   reg[i] = (u32)tmp;
+   reg[i] = cpu_to_be32((u32)tmp);
}
 
return !carry;
@@ -180,18 +189,18 @@ static int compare_reg(u32 *reg, u32 *range, u32 
*rangesize)
u32 end;
 
for (i = 0; i < MAX_ADDR_CELLS; i++) {
-   if (reg[i] < range[i])
+   if (be32_to_cpu(reg[i]) < be32_to_cpu(range[i]))
return 0;
-   if (reg[i] > range[i])
+   if (be32_to_cpu(reg[i]) > be32_to_cpu(range[i]))
break;
}
 
for (i = 0; i < MAX_ADDR_CELLS; i++) {
-   end = range[i] + rangesize[i];
+   end = be32_to_cpu(range[i]) + be32_to_cpu(rangesize[i]);
 
-   if (reg[i] < end)
+   if (be32_to_cpu(reg[i]) < end)
break;
-   if (reg[i] > end)
+   if 

[PATCH 01/11] powerpc: Add Microwatt platform

2021-06-14 Thread Paul Mackerras
Microwatt is a FPGA-based implementation of the Power ISA.  It
currently only implements little-endian 64-bit mode, and does
not (yet) support SMP, VMX, VSX or transactional memory.

This adds a new machine type to support FPGA-based SoCs with a
Microwatt core.

Signed-off-by: Paul Mackerras 
---
 arch/powerpc/Kconfig  |  2 +-
 arch/powerpc/platforms/Kconfig|  1 +
 arch/powerpc/platforms/Makefile   |  1 +
 arch/powerpc/platforms/microwatt/Kconfig  |  9 +
 arch/powerpc/platforms/microwatt/Makefile |  1 +
 arch/powerpc/platforms/microwatt/setup.c  | 23 +++
 6 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/microwatt/Kconfig
 create mode 100644 arch/powerpc/platforms/microwatt/Makefile
 create mode 100644 arch/powerpc/platforms/microwatt/setup.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 386ae12d8523..5ce51c38a346 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -422,7 +422,7 @@ config HUGETLB_PAGE_SIZE_VARIABLE
 
 config MATH_EMULATION
bool "Math emulation"
-   depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
+   depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
select PPC_FPU_REGS
help
  Some PowerPC chips designed for embedded applications do not have
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 7a5e8f4541e3..74be4d06afbf 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -20,6 +20,7 @@ source "arch/powerpc/platforms/embedded6xx/Kconfig"
 source "arch/powerpc/platforms/44x/Kconfig"
 source "arch/powerpc/platforms/40x/Kconfig"
 source "arch/powerpc/platforms/amigaone/Kconfig"
+source "arch/powerpc/platforms/microwatt/Kconfig"
 
 config KVM_GUEST
bool "KVM Guest support"
diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile
index 143d4417f6cc..edcb54cdb1a8 100644
--- a/arch/powerpc/platforms/Makefile
+++ b/arch/powerpc/platforms/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_PPC_CELL)+= cell/
 obj-$(CONFIG_PPC_PS3)  += ps3/
 obj-$(CONFIG_EMBEDDED6xx)  += embedded6xx/
 obj-$(CONFIG_AMIGAONE) += amigaone/
+obj-$(CONFIG_PPC_MICROWATT)+= microwatt/
diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
b/arch/powerpc/platforms/microwatt/Kconfig
new file mode 100644
index ..3be01e78ce57
--- /dev/null
+++ b/arch/powerpc/platforms/microwatt/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0
+config PPC_MICROWATT
+   depends on PPC_BOOK3S_64 && !SMP
+   bool "Microwatt SoC platform"
+   select PPC_XICS
+   select PPC_NATIVE
+   help
+  This option enables support for FPGA-based Microwatt implementations.
+
diff --git a/arch/powerpc/platforms/microwatt/Makefile 
b/arch/powerpc/platforms/microwatt/Makefile
new file mode 100644
index ..e6885b3b2ee7
--- /dev/null
+++ b/arch/powerpc/platforms/microwatt/Makefile
@@ -0,0 +1 @@
+obj-y  += setup.o
diff --git a/arch/powerpc/platforms/microwatt/setup.c 
b/arch/powerpc/platforms/microwatt/setup.c
new file mode 100644
index ..d80d52612672
--- /dev/null
+++ b/arch/powerpc/platforms/microwatt/setup.c
@@ -0,0 +1,23 @@
+/*
+ * Microwatt FPGA-based SoC platform setup code.
+ *
+ * Copyright 2020 Paul Mackerras (pau...@ozlabs.org), IBM Corp.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int __init microwatt_probe(void)
+{
+   return of_machine_is_compatible("microwatt-soc");
+}
+
+define_machine(microwatt) {
+   .name   = "microwatt",
+   .probe  = microwatt_probe,
+   .calibrate_decr = generic_calibrate_decr,
+};
-- 
2.31.1



[PATCH 10/11] powerpc/microwatt: Add a boot wrapper for Microwatt

2021-06-14 Thread Paul Mackerras
From: Joel Stanley 

This allows microwatt's kernel to be built with an embedded device tree.

Load arch/powerpc/boot/dtbImage.microwatt to 0x500000:

 mw_debug -b fpga stop load arch/powerpc/boot/dtbImage.microwatt 500000 start

Signed-off-by: Joel Stanley 
---
 arch/powerpc/boot/Makefile|  4 
 arch/powerpc/boot/microwatt.c | 18 ++
 arch/powerpc/boot/wrapper |  5 +
 3 files changed, 27 insertions(+)
 create mode 100644 arch/powerpc/boot/microwatt.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 2b8da923ceca..dfaa4094fcae 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -163,6 +163,8 @@ src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
 src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
 src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
 
+src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c
+
 src-wlib := $(sort $(src-wlib-y))
 src-plat := $(sort $(src-plat-y))
 src-boot := $(src-wlib) $(src-plat) empty.c
@@ -355,6 +357,8 @@ image-$(CONFIG_MVME5100)+= dtbImage.mvme5100
 # Board port in arch/powerpc/platform/amigaone/Kconfig
 image-$(CONFIG_AMIGAONE)   += cuImage.amigaone
 
+image-$(CONFIG_PPC_MICROWATT)  += dtbImage.microwatt
+
 # For 32-bit powermacs, build the COFF and miboot images
 # as well as the ELF images.
 ifdef CONFIG_PPC32
diff --git a/arch/powerpc/boot/microwatt.c b/arch/powerpc/boot/microwatt.c
new file mode 100644
index ..ac922dd0aa4d
--- /dev/null
+++ b/arch/powerpc/boot/microwatt.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include 
+#include "stdio.h"
+#include "types.h"
+#include "io.h"
+#include "ops.h"
+
+BSS_STACK(8192);
+
+void platform_init(unsigned long r3, unsigned long r4, unsigned long r5)
+{
+   unsigned long heapsize = 16*1024*1024 - (unsigned long)_end;
+
+   simple_alloc_init(_end, heapsize, 32, 64);
+   fdt_init(_dtb_start);
+   serial_console_init();
+}
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 41fa0a8715e3..ae48fffa1e13 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -342,6 +342,11 @@ gamecube|wii)
link_address='0x600000'
 platformo="$object/$platform-head.o $object/$platform.o"
 ;;
+microwatt)
+link_address='0x500000'
+platformo="$object/fixed-head.o $object/$platform.o"
+binary=y
+;;
 treeboot-currituck)
 link_address='0x100'
 ;;
-- 
2.31.1



[PATCH 08/11] powerpc/microwatt: Add microwatt_defconfig

2021-06-14 Thread Paul Mackerras
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/configs/microwatt_defconfig | 98 
 1 file changed, 98 insertions(+)
 create mode 100644 arch/powerpc/configs/microwatt_defconfig

diff --git a/arch/powerpc/configs/microwatt_defconfig 
b/arch/powerpc/configs/microwatt_defconfig
new file mode 100644
index ..a08b739123da
--- /dev/null
+++ b/arch/powerpc/configs/microwatt_defconfig
@@ -0,0 +1,98 @@
+# CONFIG_SWAP is not set
+# CONFIG_CROSS_MEMORY_ATTACH is not set
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_PREEMPT_VOLUNTARY=y
+CONFIG_TICK_CPU_ACCOUNTING=y
+CONFIG_LOG_BUF_SHIFT=16
+CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_VM_EVENT_COUNTERS is not set
+# CONFIG_SLUB_DEBUG is not set
+# CONFIG_COMPAT_BRK is not set
+# CONFIG_SLAB_MERGE_DEFAULT is not set
+CONFIG_PPC64=y
+# CONFIG_PPC_KUEP is not set
+# CONFIG_PPC_KUAP is not set
+CONFIG_CPU_LITTLE_ENDIAN=y
+CONFIG_NR_IRQS=64
+CONFIG_PANIC_TIMEOUT=10
+# CONFIG_PPC_POWERNV is not set
+# CONFIG_PPC_PSERIES is not set
+CONFIG_PPC_MICROWATT=y
+# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set
+CONFIG_CPU_FREQ=y
+CONFIG_HZ_100=y
+# CONFIG_PPC_MEM_KEYS is not set
+# CONFIG_SECCOMP is not set
+# CONFIG_MQ_IOSCHED_KYBER is not set
+# CONFIG_COREDUMP is not set
+# CONFIG_COMPACTION is not set
+# CONFIG_MIGRATION is not set
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_PACKET_DIAG=y
+CONFIG_UNIX=y
+CONFIG_UNIX_DIAG=y
+CONFIG_INET=y
+CONFIG_INET_UDP_DIAG=y
+CONFIG_INET_RAW_DIAG=y
+# CONFIG_WIRELESS is not set
+CONFIG_DEVTMPFS=y
+CONFIG_DEVTMPFS_MOUNT=y
+# CONFIG_STANDALONE is not set
+# CONFIG_PREVENT_FIRMWARE_BUILD is not set
+# CONFIG_FW_LOADER is not set
+# CONFIG_ALLOW_DEV_COREDUMP is not set
+CONFIG_MTD=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_PARTITIONED_MASTER=y
+CONFIG_MTD_SPI_NOR=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_NETDEVICES=y
+# CONFIG_WLAN is not set
+# CONFIG_INPUT is not set
+# CONFIG_SERIO is not set
+# CONFIG_VT is not set
+CONFIG_SERIAL_8250=y
+# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_SERIAL_OF_PLATFORM=y
+CONFIG_SERIAL_NONSTANDARD=y
+# CONFIG_NVRAM is not set
+CONFIG_RANDOM_TRUST_CPU=y
+CONFIG_SPI=y
+CONFIG_SPI_DEBUG=y
+CONFIG_SPI_BITBANG=y
+CONFIG_SPI_SPIDEV=y
+# CONFIG_HWMON is not set
+# CONFIG_USB_SUPPORT is not set
+# CONFIG_VIRTIO_MENU is not set
+# CONFIG_IOMMU_SUPPORT is not set
+# CONFIG_NVMEM is not set
+CONFIG_EXT4_FS=y
+# CONFIG_FILE_LOCKING is not set
+# CONFIG_DNOTIFY is not set
+# CONFIG_INOTIFY_USER is not set
+# CONFIG_MISC_FILESYSTEMS is not set
+# CONFIG_CRYPTO_HW is not set
+# CONFIG_XZ_DEC_X86 is not set
+# CONFIG_XZ_DEC_IA64 is not set
+# CONFIG_XZ_DEC_ARM is not set
+# CONFIG_XZ_DEC_ARMTHUMB is not set
+# CONFIG_XZ_DEC_SPARC is not set
+CONFIG_PRINTK_TIME=y
+# CONFIG_SYMBOLIC_ERRNAME is not set
+# CONFIG_DEBUG_BUGVERBOSE is not set
+# CONFIG_DEBUG_MISC is not set
+# CONFIG_SCHED_DEBUG is not set
+# CONFIG_FTRACE is not set
+# CONFIG_STRICT_DEVMEM is not set
+CONFIG_PPC_DISABLE_WERROR=y
+CONFIG_XMON=y
+CONFIG_XMON_DEFAULT=y
+# CONFIG_XMON_DEFAULT_RO_MODE is not set
+# CONFIG_RUNTIME_TESTING_MENU is not set
-- 
2.31.1



[PATCH 02/11] powerpc: Add Microwatt device tree

2021-06-14 Thread Paul Mackerras
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/boot/dts/microwatt.dts | 105 
 1 file changed, 105 insertions(+)
 create mode 100644 arch/powerpc/boot/dts/microwatt.dts

diff --git a/arch/powerpc/boot/dts/microwatt.dts 
b/arch/powerpc/boot/dts/microwatt.dts
new file mode 100644
index ..9b2e64da9432
--- /dev/null
+++ b/arch/powerpc/boot/dts/microwatt.dts
@@ -0,0 +1,105 @@
+/dts-v1/;
+
+/ {
+   #size-cells = <0x02>;
+   #address-cells = <0x02>;
+   model-name = "microwatt";
+   compatible = "microwatt-soc";
+
+   reserved-memory {
+   #size-cells = <0x02>;
+   #address-cells = <0x02>;
+   ranges;
+   };
+
+   memory@0 {
+   device_type = "memory";
+   reg = <0x 0x 0x 0x1000>;
+   };
+
+   cpus {
+   #size-cells = <0x00>;
+   #address-cells = <0x01>;
+
+   ibm,powerpc-cpu-features {
+   display-name = "Microwatt";
+   isa = <3000>;
+   device_type = "cpu-features";
+   compatible = "ibm,powerpc-cpu-features";
+
+   mmu-radix {
+   isa = <3000>;
+   usable-privilege = <2>;
+   os-support = <0x00>;
+   };
+
+   little-endian {
+   isa = <0>;
+   usable-privilege = <3>;
+   os-support = <0x00>;
+   };
+
+   cache-inhibited-large-page {
+   isa = <0x00>;
+   usable-privilege = <2>;
+   os-support = <0x00>;
+   };
+
+   fixed-point-v3 {
+   isa = <3000>;
+   usable-privilege = <3>;
+   };
+
+   no-execute {
+   isa = <0x00>;
+   usable-privilege = <2>;
+   os-support = <0x00>;
+   };
+
+   floating-point {
+   hfscr-bit-nr = <0x00>;
+   hwcap-bit-nr = <0x1b>;
+   isa = <0x00>;
+   usable-privilege = <0x07>;
+   hv-support = <0x00>;
+   os-support = <0x00>;
+   };
+   };
+
+   PowerPC,Microwatt@0 {
+   i-cache-sets = <2>;
+   ibm,dec-bits = <64>;
+   reservation-granule-size = <64>;
+   clock-frequency = <1>;
+   timebase-frequency = <1>;
+   i-tlb-sets = <1>;
+   ibm,ppc-interrupt-server#s = <0>;
+   i-cache-block-size = <64>;
+   d-cache-block-size = <64>;
+   ibm,pa-features = [40 00 c2 27 00 00 00 80 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00 80 00 80 00 
00 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 
80 00];
+   d-cache-sets = <2>;
+   ibm,pir = <0x3c>;
+   i-tlb-size = <64>;
+   cpu-version = <0x99>;
+   status = "okay";
+   i-cache-size = <0x1000>;
+   ibm,processor-radix-AP-encodings = <0x0c 0xa010 
0x2015 0x401e>;
+   tlb-size = <0>;
+   tlb-sets = <0>;
+   device_type = "cpu";
+   d-tlb-size = <128>;
+   d-tlb-sets = <2>;
+   reg = <0>;
+   general-purpose;
+   64-bit;
+   d-cache-size = <0x1000>;
+   ibm,chip-id = <0x00>;
+   };
+   };
+
+   chosen {
+   bootargs = "";
+   ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 40 00 40];
+   };
+
+};
-- 
2.31.1



Re: [PATCH 0/5] cpufreq: cppc: Fix suspend/resume specific races with FIE code

2021-06-14 Thread Qian Cai



On 6/10/2021 4:23 AM, Viresh Kumar wrote:
> Hi Qian,
> 
> It would be helpful if you can test this patchset and confirm if the races you
> mentioned went away or not and that the FIE code works as we wanted it to.
> 
> I don't have a real setup and so it won't be easy for me to test this out.
> 
> I have already sent a temporary fix for 5.13 and this patchset is targeted for
> 5.14 and is based over that.

Unfortunately, this series looks like it needs more work.

[  487.773586][T0] CPU17: Booted secondary processor 0x000801 
[0x503f0002]
[  487.976495][  T670] list_del corruption. next->prev should be 
009b66e9ec70, but was 009b66dfec70
[  487.987037][  T670] [ cut here ]
[  487.992351][  T670] kernel BUG at lib/list_debug.c:54!
[  487.997810][  T670] Internal error: Oops - BUG: 0 [#1] SMP
[  488.003295][  T670] Modules linked in: cpufreq_userspace xfs loop 
cppc_cpufreq processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb 
i2c_algo_bit nvme mlx5_core i2c_core nvme_core firmware_class
[  488.021759][  T670] CPU: 1 PID: 670 Comm: cppc_fie Not tainted 
5.13.0-rc5-next-20210611+ #46
[  488.030190][  T670] Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, 
BIOS 1.6 06/28/2020
[  488.038705][  T670] pstate: 60c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
[  488.045398][  T670] pc : __list_del_entry_valid+0x154/0x158
[  488.050969][  T670] lr : __list_del_entry_valid+0x154/0x158
[  488.056534][  T670] sp : 8000229afd70
[  488.060534][  T670] x29: 8000229afd70 x28: 0008c8f4f340 x27: 
dfff8000
[  488.068361][  T670] x26: 009b66e9ec70 x25: 800011c8b4d0 x24: 
0008d4bfe488
[  488.076188][  T670] x23: 0008c8f4f340 x22: 0008c8f4f340 x21: 
009b6789ec70
[  488.084015][  T670] x20: 0008d4bfe4c8 x19: 009b66e9ec70 x18: 
0008c8f4fd70
[  488.091842][  T670] x17: 20747562202c3037 x16: 6365396536366239 x15: 
0028
[  488.099669][  T670] x14:  x13: 0001 x12: 
60136cdd3447
[  488.107495][  T670] x11: 1fffe0136cdd3446 x10: 60136cdd3446 x9 : 
8000103ee444
[  488.115322][  T670] x8 : 009b66e9a237 x7 : 0001 x6 : 
009b66e9a230
[  488.123149][  T670] x5 : 9fec9322cbba x4 : 60136cdd3447 x3 : 
1fffe001191e9e69
[  488.130975][  T670] x2 :  x1 :  x0 : 
0054
[  488.138803][  T670] Call trace:
[  488.141935][  T670]  __list_del_entry_valid+0x154/0x158
[  488.147153][  T670]  kthread_worker_fn+0x15c/0xda0
[  488.151939][  T670]  kthread+0x3ac/0x460
[  488.155854][  T670]  ret_from_fork+0x10/0x18
[  488.160120][  T670] Code: 911e8000 aa1303e1 910a 941b595b (d421)
[  488.166901][  T670] ---[ end trace e637e2d38b2cc087 ]---
[  488.172206][  T670] Kernel panic - not syncing: Oops - BUG: Fatal exception
[  488.179182][  T670] SMP: stopping secondary CPUs
[  489.209347][  T670] SMP: failed to stop secondary CPUs 0-1,10-11,16-17,31
[  489.216128][  T][  T670] Memoryn ]---

> 
> -8<-
> 
> The CPPC driver currently stops the frequency invariance related
> kthread_work and irq_work from cppc_freq_invariance_exit() which is only
> called during driver's removal.
> 
> This is not sufficient as the CPUs can get hot-plugged out while the
> driver is in use, the same also happens during system suspend/resume.
> 
> In such cases we can reach a state where the CPU is removed by the
> kernel but its kthread_work or irq_work aren't stopped.
> 
> Fix this by implementing the start_cpu() and stop_cpu() callbacks in the
> cpufreq core, which will be called for each CPU's addition/removal.
> 
> A similar call was already available in the cpufreq core, which isn't required
> anymore and so its users are migrated to use exit() callback instead.
> 
> This is targeted for v5.14-rc1.
> 
> --
> Viresh
> 
> Viresh Kumar (5):
>   cpufreq: cppc: Migrate to ->exit() callback instead of ->stop_cpu()
>   cpufreq: intel_pstate: Migrate to ->exit() callback instead of
> ->stop_cpu()
>   cpufreq: powerenv: Migrate to ->exit() callback instead of
> ->stop_cpu()
>   cpufreq: Add start_cpu() and stop_cpu() callbacks
>   cpufreq: cppc: Fix suspend/resume specific races with the FIE code
> 
>  Documentation/cpu-freq/cpu-drivers.rst |   7 +-
>  drivers/cpufreq/Kconfig.arm|   1 -
>  drivers/cpufreq/cppc_cpufreq.c | 163 ++---
>  drivers/cpufreq/cpufreq.c  |  11 +-
>  drivers/cpufreq/intel_pstate.c |   9 +-
>  drivers/cpufreq/powernv-cpufreq.c  |  23 ++--
>  include/linux/cpufreq.h|   5 +-
>  7 files changed, 119 insertions(+), 100 deletions(-)
> 


Re: [PATCH] ASoC:fsl_easrc:Remove superfluous error message around platform_get_irq()

2021-06-14 Thread Mark Brown
On Thu, 10 Jun 2021 20:50:52 +0800, Zhongjun Tan wrote:
> Clean up the check for irq. dev_err() is superfluous as platform_get_irq()
> already prints an error. Remove curly braces to conform to styling
> requirements.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC:fsl_easrc:Remove superfluous error message around platform_get_irq()
  commit: 4d5f3a096f3d9e7067c7c2e730d989668e06d552

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


Re: [PATCH] ASoC:fsl_spdif:Remove superfluous error message around platform_get_irq()

2021-06-14 Thread Mark Brown
On Thu, 10 Jun 2021 12:00:37 +0800, Zhongjun Tan wrote:
> The platform_get_irq() prints an error message telling that the interrupt is
> missing, hence there is no need to duplicate that message.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC:fsl_spdif:Remove superfluous error message around platform_get_irq()
  commit: 2e8a8adb96a335a04f1697dd4314f5569521328f

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark


[RFC PATCH 1/8] powerpc/pseries: rename min_common_depth to primary_domain_index

2021-06-14 Thread Aneesh Kumar K.V
No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..8365b298ec48 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -51,7 +51,7 @@ EXPORT_SYMBOL(numa_cpu_lookup_table);
 EXPORT_SYMBOL(node_to_cpumask_map);
 EXPORT_SYMBOL(node_data);
 
-static int min_common_depth;
+static int primary_domain_index;
 static int n_mem_addr_cells, n_mem_size_cells;
 static int form1_affinity;
 
@@ -232,8 +232,8 @@ static int associativity_to_nid(const __be32 *associativity)
if (!numa_enabled)
goto out;
 
-   if (of_read_number(associativity, 1) >= min_common_depth)
-   nid = of_read_number([min_common_depth], 1);
+   if (of_read_number(associativity, 1) >= primary_domain_index)
+   nid = of_read_number([primary_domain_index], 1);
 
/* POWER4 LPAR uses 0x as invalid node */
if (nid == 0x || nid >= nr_node_ids)
@@ -284,9 +284,9 @@ int of_node_to_nid(struct device_node *device)
 }
 EXPORT_SYMBOL(of_node_to_nid);
 
-static int __init find_min_common_depth(void)
+static int __init find_primary_domain_index(void)
 {
-   int depth;
+   int index;
struct device_node *root;
 
if (firmware_has_feature(FW_FEATURE_OPAL))
@@ -326,7 +326,7 @@ static int __init find_min_common_depth(void)
}
 
if (form1_affinity) {
-   depth = of_read_number(distance_ref_points, 1);
+   index = of_read_number(distance_ref_points, 1);
} else {
if (distance_ref_points_depth < 2) {
printk(KERN_WARNING "NUMA: "
@@ -334,7 +334,7 @@ static int __init find_min_common_depth(void)
goto err;
}
 
-   depth = of_read_number(_ref_points[1], 1);
+   index = of_read_number(_ref_points[1], 1);
}
 
/*
@@ -348,7 +348,7 @@ static int __init find_min_common_depth(void)
}
 
of_node_put(root);
-   return depth;
+   return index;
 
 err:
of_node_put(root);
@@ -437,16 +437,16 @@ int of_drconf_to_nid_single(struct drmem_lmb *lmb)
int nid = default_nid;
int rc, index;
 
-   if ((min_common_depth < 0) || !numa_enabled)
+   if ((primary_domain_index < 0) || !numa_enabled)
return default_nid;
 
rc = of_get_assoc_arrays();
if (rc)
return default_nid;
 
-   if (min_common_depth <= aa.array_sz &&
+   if (primary_domain_index <= aa.array_sz &&
!(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < 
aa.n_arrays) {
-   index = lmb->aa_index * aa.array_sz + min_common_depth - 1;
+   index = lmb->aa_index * aa.array_sz + primary_domain_index - 1;
nid = of_read_number([index], 1);
 
if (nid == 0x || nid >= nr_node_ids)
@@ -708,18 +708,18 @@ static int __init parse_numa_properties(void)
return -1;
}
 
-   min_common_depth = find_min_common_depth();
+   primary_domain_index = find_primary_domain_index();
 
-   if (min_common_depth < 0) {
+   if (primary_domain_index < 0) {
/*
-* if we fail to parse min_common_depth from device tree
+* if we fail to parse primary_domain_index from device tree
 * mark the numa disabled, boot with numa disabled.
 */
numa_enabled = false;
-   return min_common_depth;
+   return primary_domain_index;
}
 
-   dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth);
+   dbg("NUMA associativity depth for CPU/Memory: %d\n", 
primary_domain_index);
 
/*
 * Even though we connect cpus to numa domains later in SMP
@@ -919,14 +919,14 @@ static void __init find_possible_nodes(void)
goto out;
}
 
-   max_nodes = of_read_number([min_common_depth], 1);
+   max_nodes = of_read_number([primary_domain_index], 1);
for (i = 0; i < max_nodes; i++) {
if (!node_possible(i))
node_set(i, node_possible_map);
}
 
prop_length /= sizeof(int);
-   if (prop_length > min_common_depth + 2)
+   if (prop_length > primary_domain_index + 2)
coregroup_enabled = 1;
 
 out:
@@ -1259,7 +1259,7 @@ int cpu_to_coregroup_id(int cpu)
goto out;
 
index = of_read_number(associativity, 1);
-   if (index > min_common_depth + 1)
+   if (index > primary_domain_index + 1)
return of_read_number([index - 1], 1);
 
 out:
-- 
2.31.1



[RFC PATCH 3/8] powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY

2021-06-14 Thread Aneesh Kumar K.V
Also make related code cleanups that will allow adding FORM2_AFFINITY in
later patches. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/firmware.h   |  4 +--
 arch/powerpc/include/asm/prom.h   |  2 +-
 arch/powerpc/kernel/prom_init.c   |  2 +-
 arch/powerpc/mm/numa.c| 35 ++-
 arch/powerpc/platforms/pseries/firmware.c |  2 +-
 5 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/firmware.h 
b/arch/powerpc/include/asm/firmware.h
index 7604673787d6..60b631161360 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -44,7 +44,7 @@
 #define FW_FEATURE_OPALASM_CONST(0x1000)
 #define FW_FEATURE_SET_MODEASM_CONST(0x4000)
 #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x8000)
-#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0001)
+#define FW_FEATURE_FORM1_AFFINITY ASM_CONST(0x0001)
 #define FW_FEATURE_PRRNASM_CONST(0x0002)
 #define FW_FEATURE_DRMEM_V2ASM_CONST(0x0004)
 #define FW_FEATURE_DRC_INFOASM_CONST(0x0008)
@@ -69,7 +69,7 @@ enum {
FW_FEATURE_SPLPAR | FW_FEATURE_LPAR |
FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO |
FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY |
-   FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN |
+   FW_FEATURE_FORM1_AFFINITY | FW_FEATURE_PRRN |
FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 |
FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE |
FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
index 324a13351749..df9fec9d232c 100644
--- a/arch/powerpc/include/asm/prom.h
+++ b/arch/powerpc/include/asm/prom.h
@@ -147,7 +147,7 @@ extern int of_read_drc_info_cell(struct property **prop,
 #define OV5_MSI0x0201  /* PCIe/MSI support */
 #define OV5_CMO0x0480  /* Cooperative Memory 
Overcommitment */
 #define OV5_XCMO   0x0440  /* Page Coalescing */
-#define OV5_TYPE1_AFFINITY 0x0580  /* Type 1 NUMA affinity */
+#define OV5_FORM1_AFFINITY 0x0580  /* FORM1 NUMA affinity */
 #define OV5_PRRN   0x0540  /* Platform Resource Reassignment */
 #define OV5_HP_EVT 0x0604  /* Hot Plug Event support */
 #define OV5_RESIZE_HPT 0x0601  /* Hash Page Table resizing */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 41ed7e33d897..64b9593038a7 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1070,7 +1070,7 @@ static const struct ibm_arch_vec 
ibm_architecture_vec_template __initconst = {
 #else
0,
 #endif
-   .associativity = OV5_FEAT(OV5_TYPE1_AFFINITY) | 
OV5_FEAT(OV5_PRRN),
+   .associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | 
OV5_FEAT(OV5_PRRN),
.bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT),
.micro_checkpoint = 0,
.reserved0 = 0,
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 5941da201fa3..192067991f8a 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -53,7 +53,10 @@ EXPORT_SYMBOL(node_data);
 
 static int primary_domain_index;
 static int n_mem_addr_cells, n_mem_size_cells;
-static int form1_affinity;
+
+#define FORM0_AFFINITY 0
+#define FORM1_AFFINITY 1
+static int affinity_form;
 
 #define MAX_DISTANCE_REF_POINTS 4
 static int max_domain_index;
@@ -190,7 +193,7 @@ int __node_distance(int a, int b)
int i;
int distance = LOCAL_DISTANCE;
 
-   if (!form1_affinity)
+   if (affinity_form == FORM0_AFFINITY)
return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
 
for (i = 0; i < max_domain_index; i++) {
@@ -210,7 +213,7 @@ static void initialize_distance_lookup_table(int nid,
 {
int i;
 
-   if (!form1_affinity)
+   if (affinity_form != FORM1_AFFINITY)
return;
 
for (i = 0; i < max_domain_index; i++) {
@@ -289,6 +292,17 @@ static int __init find_primary_domain_index(void)
int index;
struct device_node *root;
 
+   /*
+* Check for which form of affinity.
+*/
+   if (firmware_has_feature(FW_FEATURE_OPAL)) {
+   affinity_form = FORM1_AFFINITY;
+   } else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) {
+   dbg("Using form 1 affinity\n");
+   affinity_form = FORM1_AFFINITY;
+   } else
+   affinity_form = FORM0_AFFINITY;
+
if (firmware_has_feature(FW_FEATURE_OPAL))
root = of_find_node_by_path("/ibm,opal");
else
@@ -318,23 +332,16 @@ static int __init find_primary_domain_index(void)
}
 

[RFC PATCH 8/8] powerpc/papr_scm: Use FORM2 associativity details

2021-06-14 Thread Aneesh Kumar K.V
FORM2 introduces a concept of secondary domain which is identical to the
concept of FORM1 primary domain. Use the secondary domain as the numa node
when using a persistent memory device. For DAX kmem, use the logical domain
id introduced in FORM2. This new numa node

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c| 28 +++
 arch/powerpc/platforms/pseries/papr_scm.c | 26 +
 arch/powerpc/platforms/pseries/pseries.h  |  1 +
 3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 86cd2af014f7..b9ac6d02e944 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -265,6 +265,34 @@ static int associativity_to_nid(const __be32 
*associativity)
return nid;
 }
 
+int get_primary_and_secondary_domain(struct device_node *node, int *primary, 
int *secondary)
+{
+   int secondary_index;
+   const __be32 *associativity;
+
+   if (!numa_enabled) {
+   *primary = NUMA_NO_NODE;
+   *secondary = NUMA_NO_NODE;
+   return 0;
+   }
+
+   associativity = of_get_associativity(node);
+   if (!associativity)
+   return -ENODEV;
+
+   if (of_read_number(associativity, 1) >= primary_domain_index) {
+   *primary = of_read_number([primary_domain_index], 
1);
+   secondary_index = of_read_number(_ref_points[1], 1);
+   *secondary = of_read_number([secondary_index], 1);
+   }
+   if (*primary == 0x || *primary >= nr_node_ids)
+   *primary = NUMA_NO_NODE;
+
+   if (*secondary == 0x || *secondary >= nr_node_ids)
+   *secondary = NUMA_NO_NODE;
+   return 0;
+}
+
 /* Returns the nid associated with the given device tree node,
  * or -1 if not found.
  */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ef26fe40efb0..9bf2f1f3ddc5 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include "pseries.h"
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -88,6 +89,8 @@ struct papr_scm_perf_stats {
 struct papr_scm_priv {
struct platform_device *pdev;
struct device_node *dn;
+   int numa_node;
+   int target_node;
uint32_t drc_index;
uint64_t blocks;
uint64_t block_size;
@@ -923,7 +926,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
struct nd_mapping_desc mapping;
struct nd_region_desc ndr_desc;
unsigned long dimm_flags;
-   int target_nid, online_nid;
ssize_t stat_size;
 
p->bus_desc.ndctl = papr_scm_ndctl;
@@ -974,10 +976,8 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
mapping.size = p->blocks * p->block_size; // XXX: potential overflow?
 
memset(_desc, 0, sizeof(ndr_desc));
-   target_nid = dev_to_node(>pdev->dev);
-   online_nid = numa_map_to_online_node(target_nid);
-   ndr_desc.numa_node = online_nid;
-   ndr_desc.target_node = target_nid;
+   ndr_desc.numa_node = p->numa_node;
+   ndr_desc.target_node = p->target_node;
ndr_desc.res = >res;
ndr_desc.of_node = p->dn;
ndr_desc.provider_data = p;
@@ -1001,9 +1001,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.res, p->dn);
goto err;
}
-   if (target_nid != online_nid)
-   dev_info(dev, "Region registered with target node %d and online 
node %d",
-target_nid, online_nid);
 
mutex_lock(_ndr_lock);
list_add_tail(>region_list, _nd_regions);
@@ -1096,7 +1093,7 @@ static int papr_scm_probe(struct platform_device *pdev)
struct papr_scm_priv *p;
const char *uuid_str;
u64 uuid[2];
-   int rc;
+   int rc, numa_node;
 
/* check we have all the required DT properties */
if (of_property_read_u32(dn, "ibm,my-drc-index", _index)) {
@@ -1119,11 +1116,20 @@ static int papr_scm_probe(struct platform_device *pdev)
return -ENODEV;
}
 
-
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p)
return -ENOMEM;
 
+   if (get_primary_and_secondary_domain(dn, >target_node, _node)) {
+   dev_err(>dev, "%pOF: missing NUMA attributes!\n", dn);
+   rc = -ENODEV;
+   goto err;
+   }
+   p->numa_node = numa_map_to_online_node(numa_node);
+   if (numa_node != p->numa_node)
+   dev_info(>dev, "Region registered with online node %d and 
device tree node %d",
+p->numa_node, numa_node);
+
/* Initialize the dimm mutex */
mutex_init(>health_mutex);
 
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 663a0859cf13..9c2a1fc9ded1 

[RFC PATCH 7/8] powerpc/pseries: Add support for FORM2 associativity

2021-06-14 Thread Aneesh Kumar K.V
Signed-off-by: Daniel Henrique Barboza 
Signed-off-by: Aneesh Kumar K.V 
---
 Documentation/powerpc/associativity.rst   | 139 
 arch/powerpc/include/asm/firmware.h   |   3 +-
 arch/powerpc/include/asm/prom.h   |   1 +
 arch/powerpc/kernel/prom_init.c   |   3 +-
 arch/powerpc/mm/numa.c| 149 +-
 arch/powerpc/platforms/pseries/firmware.c |   1 +
 6 files changed, 290 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/powerpc/associativity.rst

diff --git a/Documentation/powerpc/associativity.rst 
b/Documentation/powerpc/associativity.rst
new file mode 100644
index ..58abedea81d7
--- /dev/null
+++ b/Documentation/powerpc/associativity.rst
@@ -0,0 +1,139 @@
+
+NUMA resource associativity
+===========================
+
+Associativity represents the groupings of the various platform resources into
+domains of substantially similar mean performance relative to resources outside
+of that domain. Resource subsets of a given domain that exhibit better
+performance relative to each other than relative to other resource subsets
+are represented as being members of a sub-grouping domain. This performance
+characteristic is presented in terms of NUMA node distance within the Linux 
kernel.
+From the platform view, these groups are also referred to as domains.
+
+The PAPR interface currently supports two different ways of communicating these resource
+grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping.
+Form 0 is the older format and is now considered deprecated.
+
+The hypervisor indicates the type/form of associativity used via the
+"ibm,architecture-vec-5" property.
+Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of 
Form 0 or Form 1.
+A value of 1 indicates the usage of Form 1 associativity.
+
+Form 0
+------
+Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
+
+Form 1
+------
+With Form 1, a combination of the ibm,associativity-reference-points and ibm,associativity
+device tree properties is used to determine the NUMA distance between resource groups/domains.
+
+The "ibm,associativity" property contains one or more lists of numbers (domainID)
+representing the resource's platform grouping domains.
+
+The "ibm,associativity-reference-points" property contains one or more lists of numbers
+(domain index) that represent the 1-based ordinal in the associativity lists of the most
+significant boundary, with subsequent entries indicating progressively less significant boundaries.
+
+The Linux kernel uses the domain id of the most significant boundary (aka the primary domain)
+as the NUMA node id. The kernel computes the NUMA distance between two domains by
+recursively comparing whether they belong to the same higher-level domains. For a mismatch
+at each higher level of the resource group, the kernel doubles the NUMA distance between
+the comparing domains.
+
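The doubling rule above can be sketched as a small user-space model (the function name and flat array layout are illustrative, not the kernel's actual data structures, which walk big-endian device tree cells):

```c
#include <assert.h>

#define LOCAL_DISTANCE 10

/*
 * Toy model of the Form 1 rule: compare the domain IDs of two
 * resources level by level, from most to least significant, and
 * double the distance at every level where they differ.
 */
static int form1_distance(const int *a, const int *b, int nlevels)
{
	int distance = LOCAL_DISTANCE;
	int i;

	for (i = 0; i < nlevels; i++) {
		if (a[i] == b[i])
			break;		/* first shared domain: stop doubling */
		distance *= 2;
	}
	return distance;
}
```

With three reference points, resources sharing every domain sit at distance 10, while resources differing at every level end up at 80, mirroring the 10/20/40/80 progression such systems report.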
+Form 2
+------
+The Form 2 associativity format adds separate device tree properties representing NUMA node
+distance, thereby making the node distance computation flexible. Form 2 also allows flexible
+primary domain numbering. With NUMA distance computation now detached from the index value of
+the "ibm,associativity" property, Form 2 allows a large number of primary domain ids at the
+same domain index, representing resource groups of different performance/latency characteristics.
+
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in 
the
+"ibm,architecture-vec-5" property.
+
+The "ibm,numa-lookup-index-table" property contains a list of numbers representing
+the domainIDs present in the system. The offset of a domainID in this property is
+considered its domainID index.
+
+prop-encoded-array: The number N of domainIDs encoded as with encode-int, followed by
+the N domainIDs, each encoded as with encode-int.
+
+For example:
+ibm,numa-lookup-index-table = {4, 0, 8, 250, 252}; the domainID index for
+domainID 8 is 1.
+
+The "ibm,numa-distance-table" property contains a list of numbers representing the NUMA
+distances between the resource groups/domains present in the system.
+
+prop-encoded-array: The number N of distance values encoded as with encode-int, followed by
+the N distance values encoded as with encode-bytes. The maximum distance value that can be
+encoded is 255.
+
+For example:
+ibm,numa-lookup-index-table = {3, 0, 8, 40}
+ibm,numa-distance-table = {9, 1, 2, 8, 2, 1, 16, 8, 16, 1}
+
+  |  0    8   40
+--|--------------
+0 | 10   20   80
+  |
+8 | 20   10  160
+  |
+40| 80  160   10
+
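The lookup implied by the two properties can be modeled as follows (hypothetical helper names; the kernel's parsing of the encoded big-endian cells is more involved). The index table maps a domainID to its row/column in the distance matrix:

```c
#include <assert.h>

/* Find a domainID's index in "ibm,numa-lookup-index-table". */
static int domain_index(const int *lookup, int n, int domain_id)
{
	for (int i = 0; i < n; i++)
		if (lookup[i] == domain_id)
			return i;
	return -1;	/* unknown domainID */
}

/* Read the distance between two domainIDs from a row-major N x N matrix. */
static int form2_distance(const int *lookup, int n,
			  const int *dist, int a, int b)
{
	int ia = domain_index(lookup, n, a);
	int ib = domain_index(lookup, n, b);

	if (ia < 0 || ib < 0)
		return -1;
	return dist[ia * n + ib];
}
```

With lookup = {0, 8, 40} and the matrix above stored row-major as {10, 20, 80, 20, 10, 160, 80, 160, 10}, form2_distance(lookup, 3, dist, 0, 40) yields 80.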
+With Form 2, the "ibm,associativity" property for resources is listed as below:
+
+"ibm,associativity" property for resources in nodes 0, 8 and 40:
+{ 4, 6, 7, 0, 0}
+{ 4, 6, 9, 8, 8}
+{ 4, 6, 7, 0, 40}
+
+With "ibm,associativity-reference-points" { 0x4, 0x3, 0x2 }:
+
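The relationship between a reference point and an associativity list can be sketched as follows (a simplified model; `domain_id` is a hypothetical helper, and the real kernel reads big-endian cells via of_read_number()):

```c
#include <assert.h>

/*
 * assoc[0] holds the number of entries that follow, and a reference
 * point is a 1-based ordinal into those entries, so it can be used
 * as a direct index into assoc[].
 */
static int domain_id(const int *assoc, int ref_point)
{
	if (ref_point < 1 || assoc[0] < ref_point)
		return -1;	/* reference point beyond the list */
	return assoc[ref_point];
}
```

For the persistent memory list {4, 6, 7, 0, 40}, the primary reference point 0x4 selects the logical domainID 40, while the secondary reference point 0x3 selects 0, which patch 8 of this series maps to the device's online NUMA node.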
+With Form2 the primary domainID and secondary domainID are used to identify 
the NUMA nodes

[RFC PATCH 6/8] powerpc/pseries: Add a helper for form1 cpu distance

2021-06-14 Thread Aneesh Kumar K.V
This helper is only used with the dispatch trace log collection.
A later patch will add Form2 affinity support and this change helps
in keeping that simpler. Also add a comment explaining that we don't expect
the code to be called with FORM0.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 64caaf07cf82..696e5bfe1414 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -166,7 +166,7 @@ static void unmap_cpu_from_node(unsigned long cpu)
 }
 #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
 
-int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
+static int __cpu_form1_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 {
int dist = 0;
 
@@ -182,6 +182,14 @@ int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
return dist;
 }
 
+int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
+{
+   /* We should not get called with FORM0 */
+   VM_WARN_ON(affinity_form == FORM0_AFFINITY);
+
+   return __cpu_form1_distance(cpu1_assoc, cpu2_assoc);
+}
+
 /* must hold reference to node during call */
 static const __be32 *of_get_associativity(struct device_node *dev)
 {
-- 
2.31.1
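For context, the walk that __cpu_form1_distance() performs can be sketched as a standalone C function. The names and the flat associativity layout here are assumptions for illustration, not the kernel's exact types: each reference point is an index into the associativity array, and the helper counts how many levels the two CPUs differ at before reaching a common domain.

```c
#include <assert.h>

/* Sketch of the Form1 distance walk: stop at the first reference-point
 * level where the two associativity arrays agree; every level checked
 * before that adds one to the distance. */
int form1_cpu_distance(const int *ref_points, int nr_ref_points,
		       const int *cpu1_assoc, const int *cpu2_assoc)
{
	int dist = 0;

	for (int i = 0; i < nr_ref_points; i++) {
		int index = ref_points[i];

		if (cpu1_assoc[index] == cpu2_assoc[index])
			break;	/* first common domain: stop counting */
		dist++;
	}
	return dist;
}
```

Two CPUs sharing the first reference-point domain get distance 0; CPUs differing at every level get a distance equal to the number of reference points.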



[RFC PATCH 5/8] powerpc/pseries: Consolidate NUMA distance update during boot

2021-06-14 Thread Aneesh Kumar K.V
Instead of updating the NUMA distance every time we look up a node id
from the associativity property, add helpers that can be used during
boot to do this only once. Also remove the distance update from the
node id lookup helpers.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c | 133 +++--
 1 file changed, 87 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index fec47981c1ef..64caaf07cf82 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -208,22 +208,6 @@ int __node_distance(int a, int b)
 }
 EXPORT_SYMBOL(__node_distance);
 
-static void initialize_distance_lookup_table(int nid,
-   const __be32 *associativity)
-{
-   int i;
-
-   if (affinity_form != FORM1_AFFINITY)
-   return;
-
-   for (i = 0; i < max_domain_index; i++) {
-   const __be32 *entry;
-
-   entry = &associativity[be32_to_cpu(distance_ref_points[i]) - 1];
-   distance_lookup_table[nid][i] = of_read_number(entry, 1);
-   }
-}
-
 /*
  * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
  * info is found.
@@ -241,15 +225,6 @@ static int associativity_to_nid(const __be32 
*associativity)
	/* POWER4 LPAR uses 0xffff as invalid node */
	if (nid == 0xffff || nid >= nr_node_ids)
nid = NUMA_NO_NODE;
-
-   if (nid > 0 &&
-   of_read_number(associativity, 1) >= max_domain_index) {
-   /*
-* Skip the length field and send start of associativity array
-*/
-   initialize_distance_lookup_table(nid, associativity + 1);
-   }
-
 out:
return nid;
 }
@@ -291,6 +266,9 @@ static void __initialize_form1_numa_distance(const __be32 
*associativity)
 {
int i, nid;
 
+   if (affinity_form != FORM1_AFFINITY)
+   return;
+
if (of_read_number(associativity, 1) >= primary_domain_index) {
		nid = of_read_number(&associativity[primary_domain_index], 1);
 
@@ -474,6 +452,48 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
return 0;
 }
 
+static int get_nid_and_numa_distance(struct drmem_lmb *lmb)
+{
+   struct assoc_arrays aa = { .arrays = NULL };
+   int default_nid = NUMA_NO_NODE;
+   int nid = default_nid;
+   int rc, index;
+
+   if ((primary_domain_index < 0) || !numa_enabled)
+   return default_nid;
+
+   rc = of_get_assoc_arrays(&aa);
+   if (rc)
+   return default_nid;
+
+   if (primary_domain_index <= aa.array_sz &&
+   !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) {
+   index = lmb->aa_index * aa.array_sz + primary_domain_index - 1;
+   nid = of_read_number([index], 1);
+
+   if (nid == 0xffff || nid >= nr_node_ids)
+   nid = default_nid;
+   if (nid > 0 && affinity_form == FORM1_AFFINITY) {
+   int i;
+   const __be32 *associativity;
+
+   index = lmb->aa_index * aa.array_sz;
+   associativity = &aa.arrays[index];
+   /*
+* lookup array associativity entries have a different format.
+* There is no length of the array as the first element.
+*/
+   for (i = 0; i < max_domain_index; i++) {
+   const __be32 *entry;
+
+   entry = &associativity[be32_to_cpu(distance_ref_points[i]) - 1];
+   distance_lookup_table[nid][i] = of_read_number(entry, 1);
+   }
+   }
+   }
+   return nid;
+}
+
 /*
  * This is like of_node_to_nid_single() for memory represented in the
  * ibm,dynamic-reconfiguration-memory node.
@@ -499,21 +519,14 @@ int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 
	if (nid == 0xffff || nid >= nr_node_ids)
nid = default_nid;
-
-   if (nid > 0) {
-   index = lmb->aa_index * aa.array_sz;
-   initialize_distance_lookup_table(nid,
-   &aa.arrays[index]);
-   }
}
-
return nid;
 }
 
 #ifdef CONFIG_PPC_SPLPAR
-static int vphn_get_nid(long lcpu)
+
+static int __vphn_get_associativity(long lcpu, __be32 *associativity)
 {
-   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
long rc, hwid;
 
/*
@@ -533,10 +546,22 @@ static int vphn_get_nid(long lcpu)
 
rc = hcall_vphn(hwid, VPHN_FLAG_VCPU, associativity);
if (rc == H_SUCCESS)
-   return associativity_to_nid(associativity);
+   return 0;
}
 
+   return -1;
+}
+
+static int vphn_get_nid(long lcpu)
+{
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+
+
+   if 

[RFC PATCH 4/8] powerpc/pseries: Consolidate DLPAR NUMA distance update

2021-06-14 Thread Aneesh Kumar K.V
The associativity details of the newly added resources are collected from
the hypervisor via the "ibm,configure-connector" RTAS call. Update the NUMA
distance details of the newly added NUMA node after the above call. In a
later patch we will remove updating the NUMA distance when we are looking
up the node id from the associativity array.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c| 41 +++
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |  2 +
 .../platforms/pseries/hotplug-memory.c|  2 +
 arch/powerpc/platforms/pseries/pseries.h  |  1 +
 4 files changed, 46 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 192067991f8a..fec47981c1ef 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -287,6 +287,47 @@ int of_node_to_nid(struct device_node *device)
 }
 EXPORT_SYMBOL(of_node_to_nid);
 
+static void __initialize_form1_numa_distance(const __be32 *associativity)
+{
+   int i, nid;
+
+   if (of_read_number(associativity, 1) >= primary_domain_index) {
+   nid = of_read_number(&associativity[primary_domain_index], 1);
+
+   for (i = 0; i < max_domain_index; i++) {
+   const __be32 *entry;
+
+   entry = &associativity[be32_to_cpu(distance_ref_points[i])];
+   distance_lookup_table[nid][i] = of_read_number(entry, 1);
+   }
+   }
+}
+
+static void initialize_form1_numa_distance(struct device_node *node)
+{
+   const __be32 *associativity;
+
+   associativity = of_get_associativity(node);
+   if (!associativity)
+   return;
+
+   __initialize_form1_numa_distance(associativity);
+   return;
+}
+
+/*
+ * Used to update distance information w.r.t newly added node.
+ */
+void update_numa_distance(struct device_node *node)
+{
+   if (affinity_form == FORM0_AFFINITY)
+   return;
+   else if (affinity_form == FORM1_AFFINITY) {
+   initialize_form1_numa_distance(node);
+   return;
+   }
+}
+
 static int __init find_primary_domain_index(void)
 {
int index;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 7e970f81d8ff..778b6ab35f0d 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -498,6 +498,8 @@ static ssize_t dlpar_cpu_add(u32 drc_index)
return saved_rc;
}
 
+   update_numa_distance(dn);
+
rc = dlpar_online_cpu(dn);
if (rc) {
saved_rc = rc;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 8377f1f7c78e..0e602c3b01ea 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -180,6 +180,8 @@ static int update_lmb_associativity_index(struct drmem_lmb 
*lmb)
return -ENODEV;
}
 
+   update_numa_distance(lmb_node);
+
dr_node = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
if (!dr_node) {
dlpar_free_cc_nodes(lmb_node);
diff --git a/arch/powerpc/platforms/pseries/pseries.h 
b/arch/powerpc/platforms/pseries/pseries.h
index 1f051a786fb3..663a0859cf13 100644
--- a/arch/powerpc/platforms/pseries/pseries.h
+++ b/arch/powerpc/platforms/pseries/pseries.h
@@ -113,4 +113,5 @@ extern u32 pseries_security_flavor;
 void pseries_setup_security_mitigations(void);
 void pseries_lpar_read_hblkrm_characteristics(void);
 
+void update_numa_distance(struct device_node *node);
 #endif /* _PSERIES_PSERIES_H */
-- 
2.31.1



[RFC PATCH 2/8] powerpc/pseries: rename distance_ref_points_depth to max_domain_index

2021-06-14 Thread Aneesh Kumar K.V
No functional change in this patch

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/numa.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8365b298ec48..5941da201fa3 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -56,7 +56,7 @@ static int n_mem_addr_cells, n_mem_size_cells;
 static int form1_affinity;
 
 #define MAX_DISTANCE_REF_POINTS 4
-static int distance_ref_points_depth;
+static int max_domain_index;
 static const __be32 *distance_ref_points;
 static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
 
@@ -169,7 +169,7 @@ int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 
int i, index;
 
-   for (i = 0; i < distance_ref_points_depth; i++) {
+   for (i = 0; i < max_domain_index; i++) {
index = be32_to_cpu(distance_ref_points[i]);
if (cpu1_assoc[index] == cpu2_assoc[index])
break;
@@ -193,7 +193,7 @@ int __node_distance(int a, int b)
if (!form1_affinity)
return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
 
-   for (i = 0; i < distance_ref_points_depth; i++) {
+   for (i = 0; i < max_domain_index; i++) {
if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
break;
 
@@ -213,7 +213,7 @@ static void initialize_distance_lookup_table(int nid,
if (!form1_affinity)
return;
 
-   for (i = 0; i < distance_ref_points_depth; i++) {
+   for (i = 0; i < max_domain_index; i++) {
const __be32 *entry;
 
		entry = &associativity[be32_to_cpu(distance_ref_points[i]) - 1];
@@ -240,7 +240,7 @@ static int associativity_to_nid(const __be32 *associativity)
nid = NUMA_NO_NODE;
 
if (nid > 0 &&
-   of_read_number(associativity, 1) >= distance_ref_points_depth) {
+   of_read_number(associativity, 1) >= max_domain_index) {
/*
 * Skip the length field and send start of associativity array
 */
@@ -310,14 +310,14 @@ static int __init find_primary_domain_index(void)
 */
distance_ref_points = of_get_property(root,
"ibm,associativity-reference-points",
-   &distance_ref_points_depth);
+   &max_domain_index);
 
if (!distance_ref_points) {
dbg("NUMA: ibm,associativity-reference-points not found.\n");
goto err;
}
 
-   distance_ref_points_depth /= sizeof(int);
+   max_domain_index /= sizeof(int);
 
if (firmware_has_feature(FW_FEATURE_OPAL) ||
firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY)) {
@@ -328,7 +328,7 @@ static int __init find_primary_domain_index(void)
if (form1_affinity) {
index = of_read_number(distance_ref_points, 1);
} else {
-   if (distance_ref_points_depth < 2) {
+   if (max_domain_index < 2) {
printk(KERN_WARNING "NUMA: "
"short ibm,associativity-reference-points\n");
goto err;
@@ -341,10 +341,10 @@ static int __init find_primary_domain_index(void)
 * Warn and cap if the hardware supports more than
 * MAX_DISTANCE_REF_POINTS domains.
 */
-   if (distance_ref_points_depth > MAX_DISTANCE_REF_POINTS) {
+   if (max_domain_index > MAX_DISTANCE_REF_POINTS) {
printk(KERN_WARNING "NUMA: distance array capped at "
"%d entries\n", MAX_DISTANCE_REF_POINTS);
-   distance_ref_points_depth = MAX_DISTANCE_REF_POINTS;
+   max_domain_index = MAX_DISTANCE_REF_POINTS;
}
 
of_node_put(root);
-- 
2.31.1



[RFC PATCH 0/8] Add support for FORM2 associativity

2021-06-14 Thread Aneesh Kumar K.V
Form2 associativity adds a much more flexible NUMA topology layout
than what is provided by Form1. This also allows a PAPR SCM device
to use better associativity when the device is used as a DAX KMEM
device. More details can be found in patch x

$ ndctl list -N -v
[
  {
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":1071644672,
"uuid":"37dea198-ddb5-4e42-915a-99a915e24188",
"raw_uuid":"148deeaa-4a2f-41d1-8d74-fd9a942d26ba",
"daxregion":{
  "id":0,
  "size":1071644672,
  "devices":[
{
  "chardev":"dax0.0",
  "size":1071644672,
  "target_node":4,
  "mode":"devdax"
}
  ]
},
"align":2097152,
"numa_node":1
  }
]

$ numactl -H
...
node distances:
node   0   1   2   3 
  0:  10  11  22  33 
  1:  44  10  55  66 
  2:  77  88  10  99 
  3:  101  121  132  10 
$

After DAX KMEM
# numactl -H
available: 5 nodes (0-4)
...
node distances:
node   0   1   2   3   4 
  0:  10  11  22  33  255 
  1:  44  10  55  66  255 
  2:  77  88  10  99  255 
  3:  101  121  132  10  255 
  4:  255  255  255  255  10 
# 

The above output is with a Qemu command line

-numa node,nodeid=4 \
-numa dist,src=0,dst=1,val=11 -numa dist,src=0,dst=2,val=22 -numa 
dist,src=0,dst=3,val=33 -numa dist,src=0,dst=4,val=255 \
-numa dist,src=1,dst=0,val=44 -numa dist,src=1,dst=2,val=55 -numa 
dist,src=1,dst=3,val=66 -numa dist,src=1,dst=4,val=255 \
-numa dist,src=2,dst=0,val=77 -numa dist,src=2,dst=1,val=88 -numa 
dist,src=2,dst=3,val=99 -numa dist,src=2,dst=4,val=255 \
-numa dist,src=3,dst=0,val=101 -numa dist,src=3,dst=1,val=121 -numa 
dist,src=3,dst=2,val=132 -numa dist,src=3,dst=4,val=255 \
-numa dist,src=4,dst=0,val=255 -numa dist,src=4,dst=1,val=255 -numa 
dist,src=4,dst=2,val=255 -numa dist,src=4,dst=3,val=255 \
-object 
memory-backend-file,id=memnvdimm1,prealloc=yes,mem-path=$PMEM_DISK,share=yes,size=${PMEM_SIZE}
  \
-device 
nvdimm,label-size=128K,memdev=memnvdimm1,id=nvdimm1,slot=4,uuid=72511b67-0b3b-42fd-8d1d-5be3cae8bcaa,node=4,persistent-nodeid=1



Aneesh Kumar K.V (8):
  powerpc/pseries: rename min_common_depth to primary_domain_index
  powerpc/pseries: rename distance_ref_points_depth to max_domain_index
  powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
  powerpc/pseries: Consolidate DLPAR NUMA distance update
  powerpc/pseries: Consolidate NUMA distance update during boot
  powerpc/pseries: Add a helper for form1 cpu distance
  powerpc/pseries: Add support for FORM2 associativity
  powerpc/papr_scm: Use FORM2 associativity details

 Documentation/powerpc/associativity.rst   | 139 ++
 arch/powerpc/include/asm/firmware.h   |   7 +-
 arch/powerpc/include/asm/prom.h   |   3 +-
 arch/powerpc/kernel/prom_init.c   |   3 +-
 arch/powerpc/mm/numa.c| 436 ++
 arch/powerpc/platforms/pseries/firmware.c |   3 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c  |   2 +
 .../platforms/pseries/hotplug-memory.c|   2 +
 arch/powerpc/platforms/pseries/papr_scm.c |  26 +-
 arch/powerpc/platforms/pseries/pseries.h  |   2 +
 10 files changed, 522 insertions(+), 101 deletions(-)
 create mode 100644 Documentation/powerpc/associativity.rst

-- 
2.31.1



Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable

2021-06-14 Thread Andy Lutomirski
Replying to several emails at once...


On 6/13/21 10:21 PM, Nicholas Piggin wrote:
> Excerpts from Nicholas Piggin's message of June 14, 2021 2:47 pm:
>> Excerpts from Nicholas Piggin's message of June 14, 2021 2:14 pm:
>>> Excerpts from Andy Lutomirski's message of June 14, 2021 1:52 pm:
 On 6/13/21 5:45 PM, Nicholas Piggin wrote:
> Excerpts from Andy Lutomirski's message of June 9, 2021 2:20 am:
>> On 6/4/21 6:42 PM, Nicholas Piggin wrote:
>>> Add CONFIG_MMU_TLB_REFCOUNT which enables refcounting of the lazy tlb mm
>>> when it is context switched. This can be disabled by architectures that
>>> don't require this refcounting if they clean up lazy tlb mms when the
>>> last refcount is dropped. Currently this is always enabled, which is
>>> what existing code does, so the patch is effectively a no-op.
>>>
>>> Rename rq->prev_mm to rq->prev_lazy_mm, because that's what it is.
>>
>> I am in favor of this approach, but I would be a lot more comfortable
>> with the resulting code if task->active_mm were at least better
>> documented and possibly even guarded by ifdefs.
>
> active_mm is fairly well documented in Documentation/active_mm.rst IMO.
> I don't think anything has changed in 20 years, I don't know what more
> is needed, but if you can add to documentation that would be nice. Maybe
> moving a bit of that into .c and .h files?
>

 Quoting from that file:

   - however, we obviously need to keep track of which address space we
 "stole" for such an anonymous user. For that, we have "tsk->active_mm",
 which shows what the currently active address space is.

 This isn't even true right now on x86.
>>>
>>> From the perspective of core code, it is. x86 might do something crazy 
>>> with it, but it has to make it appear this way to non-arch code that
>>> uses active_mm.
>>>
>>> Is x86's scheme documented?

arch/x86/include/asm/tlbflush.h documents it a bit:

/*
 * cpu_tlbstate.loaded_mm should match CR3 whenever interrupts
 * are on.  This means that it may not match current->active_mm,
 * which will contain the previous user mm when we're in lazy TLB
 * mode even if we've already switched back to swapper_pg_dir.
 *
 * During switch_mm_irqs_off(), loaded_mm will be set to
 * LOADED_MM_SWITCHING during the brief interrupts-off window
 * when CR3 and loaded_mm would otherwise be inconsistent.  This
 * is for nmi_uaccess_okay()'s benefit.
 */



>>>
 With your patch applied:

  To support all that, the "struct mm_struct" now has two counters: a
  "mm_users" counter that is how many "real address space users" there are,
  and a "mm_count" counter that is the number of "lazy" users (ie anonymous
  users) plus one if there are any real users.

 isn't even true any more.
>>>
>>> Well yeah but the active_mm concept hasn't changed. The refcounting 
>>> change is hopefully reasonably documented?

active_mm is *only* refcounting in the core code.  See below.


 I looked through all active_mm references in core code.  We have:

 kernel/sched/core.c: it's all refcounting, although it's a bit tangled
 with membarrier.

 kernel/kthread.c: same.  refcounting and membarrier stuff.

 kernel/exit.c: exit_mm() a BUG_ON().

 kernel/fork.c: initialization code and a warning.

 kernel/cpu.c: cpu offline stuff.  wouldn't be needed if active_mm went 
 away.

 fs/exec.c: nothing of interest
>>>
>>> I might not have been clear. Core code doesn't need active_mm if 
>>> active_mm somehow goes away. I'm saying active_mm can't go away because
>>> it's needed to support (most) archs that do lazy tlb mm switching.
>>>
>>> The part I don't understand is when you say it can just go away. How? 

#ifdef CONFIG_MMU_TLB_REFCOUNT
struct mm_struct *active_mm;
#endif

>>>
 I didn't go through drivers, but I maintain my point.  active_mm is
 there for refcounting.  So please don't just make it even more confusing
 -- do your performance improvement, but improve the code at the same
 time: get rid of active_mm, at least on architectures that opt out of
 the refcounting.
>>>
>>> powerpc opts out of the refcounting and can not "get rid of active_mm".
>>> Not even in theory.
>>
>> That is to say, it does do a type of reference management that requires 
>> active_mm so you can argue it has not entirely opted out of refcounting.
>> But we're not just doing refcounting for the sake of refcounting! That
>> would make no sense.
>>
>> active_mm is required because that's the mm that we have switched to 
>> (from core code's perspective), and it is integral to know when to 
>> switch to a different mm. See how active_mm is a fundamental concept
>> in core code? It's part of the contract between core code and the
>> arch mm context 

Re: [mm/mremap] ecf8443e51: vm-scalability.throughput -29.4% regression

2021-06-14 Thread Aneesh Kumar K.V

On 6/14/21 8:25 PM, kernel test robot wrote:



Greeting,

FYI, we noticed a -29.4% regression of vm-scalability.throughput due to commit:


commit: ecf8443e51a862b261313c2319ab4e4aed9e6b7e ("[PATCH v7 02/11] mm/mremap: Fix 
race between MOVE_PUD mremap and pageout")
url: 
https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Speedup-mremap-on-ppc64/20210607-135424
base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git next




We dropped that approach and are now using 
https://lore.kernel.org/linux-mm/20210610083549.386085-1-aneesh.ku...@linux.ibm.com 



Instead of pud lock we are now using rmap lock with mremap.

Can you check with that series?

-aneesh


Re: [PATCH] powerpc: Fix initrd corruption with relative jump labels

2021-06-14 Thread Greg Kurz
On Mon, 14 Jun 2021 23:14:40 +1000
Michael Ellerman  wrote:

> Commit b0b3b2c78ec0 ("powerpc: Switch to relative jump labels") switched
> us to using relative jump labels. That involves changing the code,
> target and key members in struct jump_entry to be relative to the
> address of the jump_entry, rather than absolute addresses.
> 
> We have two static inlines that create a struct jump_entry,
> arch_static_branch() and arch_static_branch_jump(), as well as an asm
> macro ARCH_STATIC_BRANCH, which is used by the pseries-only hypervisor
> tracing code.
> 
> Unfortunately we missed updating the key to be a relative reference in
> ARCH_STATIC_BRANCH.
> 
> That causes a pseries kernel to have a handful of jump_entry structs
> with bad key values. Instead of being a relative reference they instead
> hold the full address of the key.
> 
> However the code doesn't expect that, it still adds the key value to the
> address of the jump_entry (see jump_entry_key()) expecting to get a
> pointer to a key somewhere in kernel data.
> 
> The table of jump_entry structs sits in rodata, which comes after the
> kernel text. In a typical build this will be somewhere around 15MB. The
> address of the key will be somewhere in data, typically around 20MB.
> Adding the two values together gets us a pointer somewhere around 45MB.
> 
> We then call static_key_set_entries() with that bad pointer and modify
> some members of the struct static_key we think we are pointing at.
> 
> A pseries kernel is typically ~30MB in size, so writing to ~45MB won't
> corrupt the kernel itself. However if we're booting with an initrd,
> depending on the size and exact location of the initrd, we can corrupt
> the initrd. Depending on how exactly we corrupt the initrd it can either
> cause the system to not boot, or just corrupt one of the files in the
> initrd.
> 
> The fix is simply to make the key value relative to the jump_entry
> struct in the ARCH_STATIC_BRANCH macro.
> 
> Fixes: b0b3b2c78ec0 ("powerpc: Switch to relative jump labels")
> Reported-by: Anastasia Kovaleva 
> Reported-by: Roman Bolshakov 
> Reported-by: Greg Kurz 
> Reported-by: Daniel Axtens 
> Signed-off-by: Michael Ellerman 
> ---

Great thanks for debugging this issue ! I'll try it out tomorrow morning.

Cheers,

--
Greg

>  arch/powerpc/include/asm/jump_label.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/jump_label.h 
> b/arch/powerpc/include/asm/jump_label.h
> index 2d5c6bec2b4f..93ce3ec25387 100644
> --- a/arch/powerpc/include/asm/jump_label.h
> +++ b/arch/powerpc/include/asm/jump_label.h
> @@ -50,7 +50,7 @@ static __always_inline bool arch_static_branch_jump(struct 
> static_key *key, bool
>  1098:nop;\
>   .pushsection __jump_table, "aw";\
>   .long 1098b - ., LABEL - .; \
> - FTR_ENTRY_LONG KEY; \
> + FTR_ENTRY_LONG KEY - .; \
>   .popsection
>  #endif
>  



Re: [mm/mremap] ecf8443e51: vm-scalability.throughput -29.4% regression

2021-06-14 Thread Linus Torvalds
On Mon, Jun 14, 2021 at 7:39 AM kernel test robot  wrote:
>

> FYI, we noticed a -29.4% regression of vm-scalability.throughput due to 
> commit:
> ecf8443e51a8 ("[PATCH v7 02/11] mm/mremap: Fix race between MOVE_PUD mremap 
> and pageout")

Ouch.

I guess it's not a huge surprise, but that's a fairly large regression.

Probably because the pud lock is just one single lock ("No scalability
reason to split PUD locks yet").

What happens if pud_lockptr() were to do the same thing that pmd_lockptr() does?

Linus


[PATCH] powerpc: Fix initrd corruption with relative jump labels

2021-06-14 Thread Michael Ellerman
Commit b0b3b2c78ec0 ("powerpc: Switch to relative jump labels") switched
us to using relative jump labels. That involves changing the code,
target and key members in struct jump_entry to be relative to the
address of the jump_entry, rather than absolute addresses.

We have two static inlines that create a struct jump_entry,
arch_static_branch() and arch_static_branch_jump(), as well as an asm
macro ARCH_STATIC_BRANCH, which is used by the pseries-only hypervisor
tracing code.

Unfortunately we missed updating the key to be a relative reference in
ARCH_STATIC_BRANCH.

That causes a pseries kernel to have a handful of jump_entry structs
with bad key values. Instead of being a relative reference they instead
hold the full address of the key.

However the code doesn't expect that, it still adds the key value to the
address of the jump_entry (see jump_entry_key()) expecting to get a
pointer to a key somewhere in kernel data.

The table of jump_entry structs sits in rodata, which comes after the
kernel text. In a typical build this will be somewhere around 15MB. The
address of the key will be somewhere in data, typically around 20MB.
Adding the two values together gets us a pointer somewhere around 45MB.

We then call static_key_set_entries() with that bad pointer and modify
some members of the struct static_key we think we are pointing at.

A pseries kernel is typically ~30MB in size, so writing to ~45MB won't
corrupt the kernel itself. However if we're booting with an initrd,
depending on the size and exact location of the initrd, we can corrupt
the initrd. Depending on how exactly we corrupt the initrd it can either
cause the system to not boot, or just corrupt one of the files in the
initrd.

The fix is simply to make the key value relative to the jump_entry
struct in the ARCH_STATIC_BRANCH macro.

Fixes: b0b3b2c78ec0 ("powerpc: Switch to relative jump labels")
Reported-by: Anastasia Kovaleva 
Reported-by: Roman Bolshakov 
Reported-by: Greg Kurz 
Reported-by: Daniel Axtens 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/include/asm/jump_label.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/jump_label.h 
b/arch/powerpc/include/asm/jump_label.h
index 2d5c6bec2b4f..93ce3ec25387 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -50,7 +50,7 @@ static __always_inline bool arch_static_branch_jump(struct 
static_key *key, bool
 1098:  nop;\
.pushsection __jump_table, "aw";\
.long 1098b - ., LABEL - .; \
-   FTR_ENTRY_LONG KEY; \
+   FTR_ENTRY_LONG KEY - .; \
.popsection
 #endif
 
-- 
2.25.1
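The bad pointer arithmetic described in the commit message can be modeled numerically. This is an illustrative sketch with made-up addresses, not kernel code: jump_entry_key() adds the stored key value to the jump_entry's address, so a relative value round-trips back to the key while an absolute value lands far past it.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024UL * 1024UL)

/* Model of jump_entry_key(): resolve the key from a jump_entry's address
 * plus the value stored in the entry's key member. */
uintptr_t jump_entry_key_model(uintptr_t entry_addr, intptr_t stored_key)
{
	return entry_addr + (uintptr_t)stored_key;
}
```

With the entry in rodata and the key in data, storing `key - entry` resolves correctly; storing the absolute key address yields `entry + key`, a pointer well beyond the kernel image, which is exactly the corruption scenario described above.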



[PATCH v2] powerpc: make stack walking KASAN-safe

2021-06-14 Thread Daniel Axtens
Make our stack-walking code KASAN-safe by using __no_sanitize_address.
Generic code, arm64, s390 and x86 all make accesses unchecked for similar
sorts of reasons: when unwinding a stack, we might touch memory that KASAN
has marked as being out-of-bounds. In ppc64 KASAN development, I hit this
sometimes when checking for an exception frame - because we're checking
an arbitrary offset into the stack frame.

See commit 20955746320e ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c8366416
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer").

Cc: Naveen N. Rao 
Signed-off-by: Daniel Axtens 

---

v2: Use __no_sanitize_address, thanks Naveen
---
 arch/powerpc/kernel/process.c| 5 +++--
 arch/powerpc/kernel/stacktrace.c | 8 
 arch/powerpc/perf/callchain.c| 2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 89e34aa273e2..3464064a0b8b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2121,8 +2121,9 @@ unsigned long get_wchan(struct task_struct *p)
 
 static int kstack_depth_to_print = CONFIG_PRINT_STACK_DEPTH;
 
-void show_stack(struct task_struct *tsk, unsigned long *stack,
-   const char *loglvl)
+void __no_sanitize_address show_stack(struct task_struct *tsk,
+ unsigned long *stack,
+ const char *loglvl)
 {
unsigned long sp, ip, lr, newsp;
int count = 0;
diff --git a/arch/powerpc/kernel/stacktrace.c b/arch/powerpc/kernel/stacktrace.c
index 1deb1bf331dd..1961e6d5e33b 100644
--- a/arch/powerpc/kernel/stacktrace.c
+++ b/arch/powerpc/kernel/stacktrace.c
@@ -23,8 +23,8 @@
 
 #include 
 
-void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
-struct task_struct *task, struct pt_regs *regs)
+void __no_sanitize_address arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
+  struct task_struct *task, struct pt_regs *regs)
 {
unsigned long sp;
 
@@ -61,8 +61,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, 
void *cookie,
  *
  * If the task is not 'current', the caller *must* ensure the task is inactive.
  */
-int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
-void *cookie, struct task_struct *task)
+int __no_sanitize_address arch_stack_walk_reliable(stack_trace_consume_fn consume_entry,
+  void *cookie, struct task_struct *task)
 {
unsigned long sp;
unsigned long newsp;
diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6c028ee513c0..082f6d0308a4 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -40,7 +40,7 @@ static int valid_next_sp(unsigned long sp, unsigned long 
prev_sp)
return 0;
 }
 
-void
+void __no_sanitize_address
 perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
 {
unsigned long sp, next_sp;
-- 
2.27.0



Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-14 Thread Jessica Yu

+++ Nicholas Piggin [11/06/21 19:39 +1000]:

The elf_check_arch() function is used to test usermode binaries, but
kernel modules may have more specific requirements. powerpc would like
to test for ABI version compatibility.

Add an arch-overridable function elf_check_module_arch() that defaults
to elf_check_arch() and use it in elf_validity_check().

Signed-off-by: Michael Ellerman 
[np: split patch, added changelog]
Signed-off-by: Nicholas Piggin 
---
include/linux/moduleloader.h | 5 +
kernel/module.c  | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..fdc042a84562 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,11 @@
 * must be implemented by each architecture.
 */

+// Allow arch to optionally do additional checking of module ELF header
+#ifndef elf_check_module_arch
+#define elf_check_module_arch elf_check_arch
+#endif


Hi Nicholas,

Why not make elf_check_module_arch() consistent with the other
arch-specific functions? Please see module_frob_arch_sections(),
module_{init,exit}_section(), etc in moduleloader.h. That is, they are
all __weak functions that are overridable by arches. We can maybe make
elf_check_module_arch() a weak symbol, available for arches to
override if they want to perform additional elf checks. Then we don't
have to have this one-off #define.
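The weak-symbol pattern being suggested looks roughly like this. It is a hedged sketch with a stand-in header type (the real code operates on Elf_Ehdr, and the kernel spells the attribute `__weak`):

```c
#include <assert.h>
#include <stdbool.h>

struct elf_header { int machine; };	/* stand-in for Elf_Ehdr */

static bool elf_check_arch(const struct elf_header *hdr)
{
	return hdr->machine != 0;	/* stand-in for the usual arch check */
}

/* Weak default: modules get the same ELF check as user binaries.
 * An arch needing extra checks (e.g. ABI version) provides a strong
 * definition of the same symbol, which the linker prefers over this one. */
bool __attribute__((weak)) elf_check_module_arch(const struct elf_header *hdr)
{
	return elf_check_arch(hdr);
}
```

This mirrors how module_frob_arch_sections() and friends are overridden per-arch, avoiding the one-off #define.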

Thanks,

Jessica


+
/* Adjust arch-specific sections.  Return 0 on success.  */
int module_frob_arch_sections(Elf_Ehdr *hdr,
  Elf_Shdr *sechdrs,
diff --git a/kernel/module.c b/kernel/module.c
index 7e78dfabca97..7c3f9b7478dc 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2946,7 +2946,7 @@ static int elf_validity_check(struct load_info *info)

if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
|| info->hdr->e_type != ET_REL
-   || !elf_check_arch(info->hdr)
+   || !elf_check_module_arch(info->hdr)
|| info->hdr->e_shentsize != sizeof(Elf_Shdr))
return -ENOEXEC;

--
2.23.0



Re: [PATCH] powerpc/signal64: Don't read sigaction arguments back from user memory

2021-06-14 Thread Christophe Leroy




Le 14/06/2021 à 07:49, Nicholas Piggin a écrit :

Excerpts from Christophe Leroy's message of June 14, 2021 3:30 pm:



Le 14/06/2021 à 03:32, Nicholas Piggin a écrit :

Excerpts from Michael Ellerman's message of June 10, 2021 5:29 pm:

When delivering a signal to a sigaction style handler (SA_SIGINFO), we
pass pointers to the siginfo and ucontext via r4 and r5.

Currently we populate the values in those registers by reading the
pointers out of the sigframe in user memory, even though the values in
user memory were written by the kernel just prior:

unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
unsafe_put_user(&frame->uc, &frame->puc, badframe_block);
...
if (ksig->ka.sa.sa_flags & SA_SIGINFO) {
err |= get_user(regs->gpr[4], (unsigned long __user *)&frame->pinfo);
err |= get_user(regs->gpr[5], (unsigned long __user *)&frame->puc);

ie. we write &frame->info into frame->pinfo, and then read frame->pinfo
back into r4, and similarly for &frame->uc.

The code has always been like this, since linux-fullhistory commit
d4f2d95eca2c ("Forward port of 2.4 ppc64 signal changes.").

There's no reason for us to read the values back from user memory,
rather than just setting the value in the gpr[4/5] directly. In fact
reading the value back from user memory opens up the possibility of
another user thread changing the values before we read them back.
Although any process doing that would be racing against the kernel
delivering the signal, and would risk corrupting the stack, so that
would be a userspace bug.

Note that this is 64-bit only code, so there's no subtlety with the size
of pointers differing between kernel and user. Also the frame variable
is not modified to point elsewhere during the function.

In the past reading the values back from user memory was not costly, but
now that we have KUAP on some CPUs it is, so we'd rather avoid it for
that reason too.

So change the code to just set the values directly, using the same
values we have written to the sigframe previously in the function.

Note also that this matches what our 32-bit signal code does.

Using a version of will-it-scale's signal1_threads that sets SA_SIGINFO,
this results in a ~4% increase in signals per second on a Power9, from
229,777 to 239,766.


Good find, nice improvement. Will make it possible to make the error
handling much nicer too I think.

Reviewed-by: Nicholas Piggin 

You've moved copy_siginfo_to_user right up to the user access unlock,
could save 2 more KUAP lock/unlocks if we had an unsafe_clear_user. If
we can move the other user access stuff up as well, the stack frame
put_user could use unsafe_put_user as well, saving 1 more. Another few
percent?


I'm looking at making an 'unsafe' version of copy_siginfo_to_user().
That's straight forward for 'native' signals, but for compat signals that's 
more tricky.


Ah nice. Native is most important at the moment.



In the end it is not so easy. We have a quite efficient clear_user() which uses 'dcbz'. When replacing that 
by a simplistic unsafe_clear_user() on the same model as unsafe_copy_to_user(), performance is 
degraded on 32s. It needs to be implemented more efficiently.
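
For reference, the "simplistic model" Christophe describes would amount to something like the word-at-a-time loop below. This is a hypothetical user-space sketch with made-up names: the real kernel helper would additionally handle KUAP and exception fixups, and the ppc32 clear_user() instead uses 'dcbz' to zero a whole cache line per instruction, which is why a naive store loop like this regresses there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a simplistic unsafe_clear_user(): one store per
 * long, mirroring unsafe_copy_to_user()'s word loop, plus a byte tail.
 * A dcbz-based clear_user() zeroes 32 bytes per instruction instead. */
static void naive_clear_user(void *to, size_t n)
{
	unsigned long *p = to;
	unsigned char *tail;

	while (n >= sizeof(*p)) {	/* word-at-a-time stores */
		*p++ = 0;
		n -= sizeof(*p);
	}
	tail = (unsigned char *)p;	/* clear the remaining bytes */
	while (n--)
		*tail++ = 0;
}
```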


Christophe


Re: [RFC PATCH 0/7] Memory hotplug/hotremove at subsection size

2021-06-14 Thread David Hildenbrand

On 02.06.21 17:56, Zi Yan wrote:

On 10 May 2021, at 10:36, Zi Yan wrote:


On 7 May 2021, at 10:00, David Hildenbrand wrote:


On 07.05.21 13:55, Michal Hocko wrote:

[I haven't read through respective patches due to lack of time but let
   me comment on the general idea and the underlying justification]

On Thu 06-05-21 17:31:09, David Hildenbrand wrote:

On 06.05.21 17:26, Zi Yan wrote:

From: Zi Yan 

Hi all,

This patchset tries to remove the restriction on memory hotplug/hotremove
granularity, which is always greater or equal to memory section size[1].
With the patchset, kernel is able to online/offline memory at a size independent
of memory section size, as small as 2MB (the subsection size).


... which doesn't make any sense as we can only online/offline whole memory
block devices.


Agreed. The subsection thingy is just a hack to work around pmem
alignment problems. For the real memory hotplug it is quite hard to
argue for reasonable hotplug scenarios for very small physical memory
ranges w.r.t. the existing sparsemem memory model.


The motivation is to increase MAX_ORDER of the buddy allocator and pageblock
size without increasing memory hotplug/hotremove granularity at the same time,


Gah, no. Please no. No.


Agreed. Those are completely independent concepts. MAX_ORDER can be
really arbitrary irrespective of the section size with the vmemmap sparse
model. The existing restriction is due to the old sparse model not being
able to do page pointer arithmetic across memory sections. Is there any
reason to stick with that memory model for an advanced feature you are
working on?


No. I just want to increase MAX_ORDER. If the existing restriction can
be removed, that will be great.



I gave it some more thought yesterday. I guess the first thing we should look 
into is increasing MAX_ORDER and leaving pageblock_order and section size as is 
-- finding out what we have to tweak to get that up and running. Once we have 
that in place, we can actually look into better fragmentation avoidance etc. 
One step at a time.


It makes sense to me.



Because that change itself might require some thought. Requiring that bigger 
MAX_ORDER depends on SPARSE_VMEMMAP is something reasonable to do.


OK, if with SPARSE_VMEMMAP MAX_ORDER can be set to be bigger than
SECTION_SIZE, it is perfectly OK to me. Since 1GB THP support, which I
want to add ultimately, will require SPARSE_VMEMMAP too (otherwise,
all page++ will need to be changed to nth_page(page,1)).



As stated somewhere here already, we'll have to look into making 
alloc_contig_range() (and main users CMA and virtio-mem) independent of 
MAX_ORDER and mainly rely on pageblock_order. The current handling in 
alloc_contig_range() is far from optimal as we have to isolate a whole 
MAX_ORDER - 1 page -- and on ZONE_NORMAL we'll fail easily if any part contains 
something unmovable although we don't even want to allocate that part. I 
actually have that on my list (to be able to fully support pageblock_order 
instead of MAX_ORDER -1 chunks in virtio-mem), however didn't have time to look 
into it.


So in your mind, for gigantic page allocation (> MAX_ORDER), 
alloc_contig_range()
should be used instead of the buddy allocator, while pageblock_order is kept at 
a small
granularity like 2MB. Is that the case? Isn't it going to have a high failure 
rate
when any of the pageblocks within a gigantic page range (like 1GB) becomes 
unmovable?
Are you thinking of an additional mechanism/policy to prevent such a thing 
happening as
an additional step for gigantic page allocation? Like your ZONE_PREFER_MOVABLE 
idea?



Further, page onlining / offlining code and early init code most probably also 
needs care if MAX_ORDER - 1 crosses sections. Memory holes we might suddenly 
have in MAX_ORDER - 1 pages might become a problem and will have to be handled. 
Not sure which other code has to be tweaked (compaction? page isolation?).


Can you elaborate a little more? From what I understand, memory holes mean 
valid
PFNs are not contiguous before and after a hole, so pfn++ will not work, but
struct pages are still virtually contiguous assuming SPARSE_VMEMMAP, meaning 
page++
would still work. So when MAX_ORDER - 1 crosses sections, additional code would 
be
needed instead of a simple pfn++. Is there anything I am missing?

BTW, to test a system with memory holes, do you know an easy way of adding
random memory holes to an x86_64 VM, which could help reveal potential missing 
pieces
in the code? Changing the BIOS-e820 table might be one way, but I have no idea
how to do it on QEMU.



Figuring out what needs care itself might take quite some effort.

One thing I was thinking about as well: The bigger our MAX_ORDER, the slower it 
could be to allocate smaller pages. If we have 1G pages, splitting them down to 
4k then takes 8 additional steps if I'm not wrong. Of course, that's the worst 
case. Would be interesting to evaluate.
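
The "8 additional steps" figure checks out: each buddy split halves a block, so splitting down to a single 4k page takes one step per order. With 4k base pages, 1G is order 18, while today's largest buddy block (MAX_ORDER - 1 = 10) is 4M, giving 18 - 10 = 8 extra splits in the worst case. A small user-space sketch of the arithmetic (names are ours, not kernel code):

```c
#include <assert.h>

/* Buddy order for a power-of-two block size given in KB, i.e. the
 * order such that (4 KB << order) == size_kb. */
static int page_order(unsigned long size_kb)
{
	int order = 0;

	while ((4UL << order) < size_kb)
		order++;
	return order;
}
```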


Sure. I am planning to check it too. As a simple start, I am going 

Re: [PATCH 00/21] Rid W=1 warnings from IDE

2021-06-14 Thread Lee Jones
On Mon, 07 Jun 2021, Christoph Hellwig wrote:

> Please don't touch this code as it is about to be removed entirely.

Do you have an ETA for this work?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH] powerpc/signal64: Copy siginfo before changing regs->nip

2021-06-14 Thread Christophe Leroy




Le 14/06/2021 à 07:55, Nicholas Piggin a écrit :

Excerpts from Christophe Leroy's message of June 14, 2021 3:31 pm:



Le 14/06/2021 à 03:29, Nicholas Piggin a écrit :

Excerpts from Nicholas Piggin's message of June 14, 2021 10:47 am:

Excerpts from Michael Ellerman's message of June 8, 2021 11:46 pm:

In commit 96d7a4e06fab ("powerpc/signal64: Rewrite handle_rt_signal64()
to minimise uaccess switches") the 64-bit signal code was rearranged to
use user_write_access_begin/end().

As part of that change the call to copy_siginfo_to_user() was moved
later in the function, so that it could be done after the
user_write_access_end().

In particular it was moved after we modify regs->nip to point to the
signal trampoline. That means if copy_siginfo_to_user() fails we exit
handle_rt_signal64() with an error but with regs->nip modified, whereas
previously we would not modify regs->nip until the copy succeeded.

Returning an error from signal delivery but with regs->nip updated
leaves the process in a sort of half-delivered state. We do immediately
force a SEGV in signal_setup_done(), called from do_signal(), so the
process should never run in the half-delivered state.

However that SEGV is not delivered until we've gone around to
do_notify_resume() again, so it's possible some tracing could observe
the half-delivered state.

There are other cases where we fail signal delivery with regs partly
updated, eg. the write to newsp and SA_SIGINFO, but the latter at least
is very unlikely to fail as it reads back from the frame we just wrote
to.

Looking at other arches they seem to be more careful about leaving regs
unchanged until the copy operations have succeeded, and in general that
seems like good hygiene.

So although the current behaviour is not clearly buggy, it's also not
clearly correct. So move the call to copy_siginfo_to_user() up prior to
the modification of regs->nip, which is closer to the old behaviour, and
easier to reason about.


Good catch, should it still have a Fixes: tag though? Even if it's not
clearly buggy we want it to be patched.


Also...



Signed-off-by: Michael Ellerman 
---
   arch/powerpc/kernel/signal_64.c | 9 -
   1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index dca66481d0c2..f9e1f5428b9e 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -902,6 +902,10 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), 
badframe_block);
user_write_access_end();
   
+	/* Save the siginfo outside of the unsafe block. */

+   if (copy_siginfo_to_user(&frame->info, &ksig->info))
+   goto badframe;
+
/* Make sure signal handler doesn't get spurious FP exceptions */
tsk->thread.fp_state.fpscr = 0;
   
@@ -915,11 +919,6 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,

regs->nip = (unsigned long) &frame->tramp[0];
}
   
-

-   /* Save the siginfo outside of the unsafe block. */
-   if (copy_siginfo_to_user(&frame->info, &ksig->info))
-   goto badframe;
-
/* Allocate a dummy caller frame for the signal handler. */
newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);


Does the same reasoning apply to this one and the ELF V1 function
descriptor thing? It seems like you could move all of that block
up instead. With your other SA_SIGINFO get_user patch, there would
then be no possibility of error after you start modifying regs.



To move the above into the user access block, we need to open a larger window. At 
the time being the
window opened only contains the 'frame'. 'newsp' points before the 'frame'.



Only by 64/128 bytes though. Is that a problem? Not for 64s. Could it
cause more overhead than it saves on other platforms?


No, it is not a problem at all, it just must not be forgotten. On ppc64 it may go unnoticed; on 32s 
it will blow up if we forget to enlarge the access window and the access involves a different 256M 
segment (very unlikely for sure, but ...)




For protection, it looks like all the important control data is in the
signal frame anyway, this frame is just for stack unwinding?


That's my understanding as well.

Christophe


Re: [PATCH v9 11/14] swiotlb: Add restricted DMA alloc/free support.

2021-06-14 Thread Christoph Hellwig
On Mon, Jun 14, 2021 at 08:28:01AM +0200, Christoph Hellwig wrote:
> I think merging this with the next two patches would be a little more
> clear.

Sorry, I mean the next patch and the previous one.


Re: [PATCH v9 11/14] swiotlb: Add restricted DMA alloc/free support.

2021-06-14 Thread Christoph Hellwig
I think merging this with the next two patches would be a little more
clear.


Re: [PATCH v9 09/14] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:54PM +0800, Claire Chang wrote:
> Add a new function, release_slots, to make the code reusable for supporting
> different bounce buffer pools, e.g. restricted DMA pool.
> 
> Signed-off-by: Claire Chang 
> ---
>  kernel/dma/swiotlb.c | 35 ---
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 364c6c822063..a6562573f090 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -554,27 +554,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
>   return tlb_addr;
>  }
>  
> -/*
> - * tlb_addr is the physical address of the bounce buffer to unmap.
> - */
> -void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
> -   size_t mapping_size, enum dma_data_direction dir,
> -   unsigned long attrs)
> +static void release_slots(struct device *dev, phys_addr_t tlb_addr)
>  {
> - struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned long flags;
> - unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
> + unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
>   int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
>   int nslots = nr_slots(mem->slots[index].alloc_size + offset);
>   int count, i;
>  
> - /*
> -  * First, sync the memory before unmapping the entry
> -  */
> - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> - (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> - swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
> -
>   /*
>* Return the buffer to the free list by setting the corresponding
>* entries to indicate the number of contiguous entries available.
> @@ -609,6 +597,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   spin_unlock_irqrestore(&mem->lock, flags);
>  }
>  
> +/*
> + * tlb_addr is the physical address of the bounce buffer to unmap.
> + */
> +void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
> +   size_t mapping_size, enum dma_data_direction dir,
> +   unsigned long attrs)
> +{
> + /*
> +  * First, sync the memory before unmapping the entry
> +  */
> + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> + (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> + swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
> +
> + release_slots(dev, tlb_addr);

Can you give this a swiotlb_ prefix?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 08/14] swiotlb: Move alloc_size to find_slots

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:53PM +0800, Claire Chang wrote:
> Move the maintenance of alloc_size to find_slots for better code
> reusability later.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 07/14] swiotlb: Bounce data from/to restricted DMA pool if available

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:52PM +0800, Claire Chang wrote:
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
> 
> The restricted DMA pools provide a basic level of protection against the
> DMA overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system
> needs to provide a way to lock down the memory access, e.g., MPU.
> 
> Note that is_dev_swiotlb_force doesn't check if
> swiotlb_force == SWIOTLB_FORCE. Otherwise the memory allocation behavior
> with default swiotlb will be changed by the following patch
> ("dma-direct: Allocate memory from restricted DMA pool if available").
> 
> Signed-off-by: Claire Chang 
> ---
>  include/linux/swiotlb.h | 10 +-
>  kernel/dma/direct.c |  3 ++-
>  kernel/dma/direct.h |  3 ++-
>  kernel/dma/swiotlb.c|  1 +
>  4 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 06cf17a80f5c..8200c100fe10 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
>   *   unmap calls.
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
> + * @force_swiotlb: %true if swiotlb is forced
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -95,6 +96,7 @@ struct io_tlb_mem {
>   spinlock_t lock;
>   struct dentry *debugfs;
>   bool late_alloc;
> + bool force_swiotlb;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -115,6 +117,11 @@ static inline void swiotlb_set_io_tlb_default_mem(struct 
> device *dev)
>   dev->dma_io_tlb_mem = io_tlb_default_mem;
>  }
>  
> +static inline bool is_dev_swiotlb_force(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->force_swiotlb;
> +}
> +
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -126,8 +133,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  {
>   return false;
>  }
> -static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
> +static inline bool is_dev_swiotlb_force(struct device *dev)
>  {
> + return false;
>  }
>  static inline void swiotlb_exit(void)
>  {
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 7a88c34d0867..078f7087e466 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -496,7 +496,8 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
>   if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> + (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
> +  is_dev_swiotlb_force(dev)))

I think we can remove the extra swiotlb_force check here if the
swiotlb_force setting is propagated into io_tlb_default_mem->force
when that is initialized. This avoids an extra check in the fast path.

> - if (unlikely(swiotlb_force == SWIOTLB_FORCE))
> + if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
> + is_dev_swiotlb_force(dev))

Same here.


Re: [PATCH v9 06/14] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-14 Thread Christoph Hellwig
>  kernel/dma/direct.c  | 2 +-
>  kernel/dma/swiotlb.c | 4 ++--
>  6 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> index ce6b664b10aa..89a894354263 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
> @@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
> drm_i915_gem_object *obj)
>  
>   max_order = MAX_ORDER;
>  #ifdef CONFIG_SWIOTLB
> - if (is_swiotlb_active()) {
> + if (is_swiotlb_active(obj->base.dev->dev)) {

This is the same device used for DMA mapping in
i915_gem_gtt_prepare_pages, so this looks good.

> index f4c2e46b6fe1..2ca9d9a9e5d5 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
> @@ -276,7 +276,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
>   }
>  
>  #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
> - need_swiotlb = is_swiotlb_active();
> + need_swiotlb = is_swiotlb_active(dev->dev);
>  #endif

This looks good, too.

> diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
> index b7a8f3a1921f..0d56985bfe81 100644
> --- a/drivers/pci/xen-pcifront.c
> +++ b/drivers/pci/xen-pcifront.c
> @@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
> pcifront_device *pdev)
>  
>   spin_unlock(_dev_lock);
>  
> - if (!err && !is_swiotlb_active()) {
> - if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {

This looks good as well.

So I think the devices are all good.

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 05/14] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-14 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 04/14] swiotlb: Add restricted DMA pool initialization

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:49PM +0800, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes.

Bisection hazard:  we should only add the new config option when the
code is actually ready to be used.  So this patch should move to the end
of the series.


Re: [PATCH v9 03/14] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:33:15PM +0800, Claire Chang wrote:
> I'm not sure if this would break arch/x86/pci/sta2x11-fixup.c
> swiotlb_late_init_with_default_size is called here
> https://elixir.bootlin.com/linux/v5.13-rc5/source/arch/x86/pci/sta2x11-fixup.c#L60

It will.  It will also break all non-OF devices.  I think you need to
initialize the initial pool in device_initialize, which covers all devices.


Re: [PATCH v9 02/14] swiotlb: Refactor swiotlb_create_debugfs

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:47PM +0800, Claire Chang wrote:
> Split the debugfs creation to make the code reusable for supporting
> different bounce buffer pools, e.g. restricted DMA pool.
> 
> Signed-off-by: Claire Chang 
> ---
>  kernel/dma/swiotlb.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 1a1208c81e85..8a3e2b3b246d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -64,6 +64,9 @@
>  enum swiotlb_force swiotlb_force;
>  
>  struct io_tlb_mem *io_tlb_default_mem;
> +#ifdef CONFIG_DEBUG_FS
> +static struct dentry *debugfs_dir;
> +#endif

What about moving this declaration into the main CONFIG_DEBUG_FS block
near the functions using it?

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 01/14] swiotlb: Refactor swiotlb init functions

2021-06-14 Thread Christoph Hellwig
On Fri, Jun 11, 2021 at 11:26:46PM +0800, Claire Chang wrote:
> + spin_lock_init(&mem->lock);
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> + mem->slots[i].alloc_size = 0;
> + }
> +
> + if (memory_decrypted)
> + set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> + memset(vaddr, 0, bytes);

We don't really need to do this call before the memset.  Which means we
can just move it to the callers that care instead of having a bool
argument.

Otherwise looks good:

Reviewed-by: Christoph Hellwig