On 6/22/21 9:07 AM, Aneesh Kumar K.V wrote:
Daniel Henrique Barboza <danielhb...@gmail.com> writes:

On 6/17/21 1:51 PM, Aneesh Kumar K.V wrote:
PAPR interface currently supports two different ways of communicating resource
grouping details to the OS. These are referred to as Form 0 and Form 1
associativity grouping. Form 0 is the older format and is now considered
deprecated. This patch adds another resource grouping named FORM2.

Signed-off-by: Daniel Henrique Barboza <danielhb...@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.ibm.com>
---
   Documentation/powerpc/associativity.rst   | 135 ++++++++++++++++++++
   arch/powerpc/include/asm/firmware.h       |   3 +-
   arch/powerpc/include/asm/prom.h           |   1 +
   arch/powerpc/kernel/prom_init.c           |   3 +-
   arch/powerpc/mm/numa.c                    | 149 +++++++++++++++++++++-
   arch/powerpc/platforms/pseries/firmware.c |   1 +
   6 files changed, 286 insertions(+), 6 deletions(-)
   create mode 100644 Documentation/powerpc/associativity.rst

diff --git a/Documentation/powerpc/associativity.rst 
b/Documentation/powerpc/associativity.rst
new file mode 100644
index 000000000000..93be604ac54d
--- /dev/null
+++ b/Documentation/powerpc/associativity.rst
@@ -0,0 +1,135 @@
+============================
+NUMA resource associativity
+=============================
+
+Associativity represents the groupings of the various platform resources into
+domains of substantially similar mean performance relative to resources outside
+of that domain. Resources subsets of a given domain that exhibit better
+performance relative to each other than relative to other resources subsets
+are represented as being members of a sub-grouping domain. This performance
+characteristic is presented in terms of NUMA node distance within the Linux 
kernel.
+From the platform view, these groups are also referred to as domains.
+
+PAPR interface currently supports different ways of communicating these 
resource
+grouping details to the OS. These are referred to as Form 0, Form 1 and Form2
+associativity grouping. Form 0 is the older format and is now considered 
deprecated.
+
+Hypervisor indicates the type/form of associativity used via "ibm,arcitecture-vec-5 
property".
+Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of 
Form 0 or Form 1.
+A value of 1 indicates the usage of Form 1 associativity. For Form 2 
associativity
+bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
+
+Form 0
+-----
+Form 0 associativity supports only two NUMA distance (LOCAL and REMOTE).
+
+Form 1
+-----
+With Form 1 a combination of ibm,associativity-reference-points and 
ibm,associativity
+device tree properties are used to determine the NUMA distance between 
resource groups/domains.
+
+The “ibm,associativity” property contains one or more lists of numbers 
(domainID)
+representing the resource’s platform grouping domains.
+
+The “ibm,associativity-reference-points” property contains one or more list of 
numbers
+(domainID index) that represents the 1 based ordinal in the associativity 
lists.
+The list of domainID index represnets increasing hierachy of resource grouping.
+
+ex:
+{ primary domainID index, secondary domainID index, tertiary domainID index.. }
+
+Linux kernel uses the domainID at the primary domainID index as the NUMA node 
id.
+Linux kernel computes NUMA distance between two domains by recursively 
comparing
+if they belong to the same higher-level domains. For mismatch at every higher
+level of the resource group, the kernel doubles the NUMA distance between the
+comparing domains.
+
+Form 2
+-------
+Form 2 associativity format adds separate device tree properties representing 
NUMA node distance
+thereby making the node distance computation flexible. Form 2 also allows 
flexible primary
+domain numbering. With numa distance computation now detached from the index 
value of
+"ibm,associativity" property, Form 2 allows a large number of primary domain 
ids at the
+same domainID index representing resource groups of different 
performance/latency characteristics.
+
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in 
the
+"ibm,architecture-vec-5" property.
+
+"ibm,numa-lookup-index-table" property contains one or more list numbers 
representing
+the domainIDs present in the system. The offset of the domainID in this 
property is considered
+the domainID index.
+
+prop-encoded-array: The number N of the domainIDs encoded as with encode-int, 
followed by
+N domainID encoded as with encode-int
+
+For ex:
+ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for 
domainID 8 is 1.
+
+"ibm,numa-distance-table" property contains one or more list of numbers 
representing the NUMA
+distance between resource groups/domains present in the system.
+
+prop-encoded-array: The number N of the distance values encoded as with 
encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we 
could encode is 255.
+
+For ex:
+ibm,numa-lookup-index-table =  {3, 0, 8, 40}
+ibm,numa-distance-table     =  {9, 10, 20, 80, 20, 10, 160, 80, 160, 10}
+
+  | 0    8   40
+--|------------
+  |
+0 | 10   20  80
+  |
+8 | 20   10  160
+  |
+40| 80   160  10
+
+
+"ibm,associativity" property for resources in node 0, 8 and 40
+
+{ 3, 6, 7, 0 }
+{ 3, 6, 9, 8 }
+{ 3, 6, 7, 40}
+
+With "ibm,associativity-reference-points"  { 0x3 }

With this configuration, would the following ibm,associativity arrays
also be valid?


{ 3, 0, 0, 0 }
{ 3, 0, 0, 8 }
{ 3, 0, 0, 40}


Yes

If yes, then we need a way to tell that the associativity domains assignment
are optional, and FORM2 relies solely on finding out the domainID of the
resource (0, 8 and 40) to retrieve the domainID index, and with this
index all performance metrics can be retrieved from the numa-* properties
(numa-distance-table, numa-bandwidth-table ...).


Where do you suggest we clarify that? I agree that it is not explicitly
mentioned. But we describe the details of how we find the numa distance
with example in the document.


Perhaps something like this, right in the middle of the example:


----------------

(...)

+  | 0    8   40
+--|------------
+  |
+0 | 10   20  80
+  |
+8 | 20   10  160
+  |
+40| 80   160  10
+
+

With "ibm,associativity-reference-points" equal to { 0x3 }, the domainID of
each resource is located at index 3 of each ibm,associativity property:

+{ 3, 6, 7, 0 }
+{ 3, 6, 9, 8 }
+{ 3, 6, 7, 40 }


FORM2 requires the ibm,associativity array to contain the domainID of the
resource, which is defined by the ibm,associativity-reference-points.
Calculating the associativity domains of the remaining ibm,associativity
elements is not obligatory. In this example, the following ibm,associativity
arrays are also valid:

{ 3, 0, 0, 0 }
{ 3, 0, 0, 8 }
{ 3, 0, 0, 40 }

(...)

-------------



Retrieving the resource domainID is done by using 
ibm,associativity-reference-points.

This will allow the platform to implement FORM2 such as:

{ 1, 0 }
{ 1, 8 }
{ 1, 40 }
- ref-points: { 0x1 }

If the platform chooses to do so.


That is correct.


+
+Each resource (drcIndex) now also supports additional optional device tree 
properties.
+These properties are marked optional because the platform can choose not to 
export
+them and provide the system topology details using the earlier defined device 
tree
+properties alone. The optional device tree properties are used when adding new 
resources
+(DLPAR) and when the platform didn't provide the topology details of the 
domain which
+contains the newly added resource during boot.
+
+"ibm,numa-lookup-index" property contains a number representing the domainID 
index to be used
+when building the NUMA distance of the numa node to which this resource 
belongs. This can
+be looked at as the index at which this new domainID would have appeared in
+"ibm,numa-lookup-index-table" if the domain was present during boot. The 
domainID
+of the new resource can be obtained from the existing "ibm,associativity" 
property. This
+can be used to build distance information of a newly onlined NUMA node via 
DLPAR operation.
+The value is 1 based array index value.
+
+prop-encoded-array: An integer encoded as with encode-int specifying the 
domainID index
+
+"ibm,numa-distance" property contains one or more list of numbers presenting 
the NUMA distance
+from this resource domain to other resources.
+
+prop-encoded-array: The number N of the distance values encoded as with 
encode-int, followed by
+N distance values encoded as with encode-bytes. The max distance value we 
could encode is 255.
+
+For ex:
+ibm,associativity     = { 4, 5, 10, 50}
+ibm,numa-lookup-index = { 4 }
+ibm,numa-distance   =  {8, 160, 255, 80, 10, 160, 255, 80, 10}
+
+resulting in a new toplogy as below.
+  | 0    8   40   50
+--|------------------
+  |
+0 | 10   20  80   160
+  |
+8 | 20   10  160  255
+  |
+40| 80   160  10  80
+  |
+50| 160  255  80  10
+

I see there is no mention of the special PAPR SCM handling. I saw in
one of the your replies of v1:

"Another option is to make sure that numa-distance-value is populated
such that PMEMB distance indicates it is closer to node0 when compared
to node1. ie, node_distance[40][0] < node_distance[40][1]. One could
possibly infer the grouping based on the distance value and not deepend
on ibm,associativity for that purpose."


Is that was we're supposed to do with PAPR SCM? I'm not sure how that
affects NVDIMM support in QEMU with FORM2.



yes that is what we are doing with this version of the patchset (v4)
version. We can drop the nvdimm specific changes from Qemu.


I see. I'll drop the NVDIMM changes in the QEMU POC of FORM2 then.



Thanks,


Daniel


-aneesh

Reply via email to