Re: [PATCHv5 10/10] doc/mm: New documentation for memory performance

2019-02-06 Thread Keith Busch
On Wed, Feb 06, 2019 at 10:45:52AM +, Jonathan Cameron wrote:
> On Thu, 24 Jan 2019 16:07:24 -0700
> Keith Busch wrote:
> > +   # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/
> > +   /sys/devices/system/node/nodeY/access0/
> > +   |-- read_bandwidth
> > +   |-- read_latency
> > +   |-- write_bandwidth
> > +   `-- write_latency
> 
> These seem to be under
> /sys/devices/system/node/nodeY/access0/initiators/
> (so one directory deeper).

You're right, I used data from the previous series to generate that.
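
For reference, a minimal sketch (not part of the patch) of how an application
might read these attributes from the corrected location. It assumes the layout
described in this series, with the initiator symlinks and the four performance
files directly under access0/initiators/, and assumes each file holds a plain
decimal integer. The function name read_access0 is hypothetical.

```python
from pathlib import Path

# Attribute names as listed in the patch's tree output.
PERF_ATTRS = ("read_bandwidth", "read_latency", "write_bandwidth", "write_latency")


def read_access0(node_dir):
    """Return (initiator names, {attribute: int}) for one memory target node.

    node_dir is a path such as /sys/devices/system/node/node2. Missing
    files are skipped, since a platform may not report every attribute.
    """
    initiators_dir = Path(node_dir) / "access0" / "initiators"
    initiators = []
    perf = {}
    if not initiators_dir.is_dir():
        return initiators, perf
    for entry in sorted(initiators_dir.iterdir()):
        if entry.name.startswith("node"):
            # Entries such as nodeX name the target's local initiators.
            initiators.append(entry.name)
        elif entry.name in PERF_ATTRS and entry.is_file():
            # Assumption: values are decimal integers (MiB/s or nanoseconds).
            perf[entry.name] = int(entry.read_text().strip())
    return initiators, perf
```

Calling read_access0("/sys/devices/system/node/node2") on a system that does
not export these attributes simply returns an empty list and an empty dict.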
 
> > +   # tree /sys/devices/system/node/node0/side_cache/
> > +   /sys/devices/system/node/node0/side_cache/
> > +   |-- index1
> > +   |   |-- associativity
> > +   |   |-- level
> 
> What is the purpose of having level in here?  Isn't it the same as the A..C
> in the index naming?

Yes, it is redundant with the name. I will remove it.
 
> > +   |   |-- line_size
> > +   |   |-- size
> > +   |   `-- write_policy
> > +
> > +The "associativity" will be 0 if it is a direct-mapped cache, and non-zero
> > +for any other index-based, multi-way associativity.
> 
> Is it worth providing the ACPI mapping in this doc?  We have None, Direct and
> 'complex'.   Fun question of what None means?  Not specified?

Yeah, my take on "none" was that it's unreported and we don't know what
is actually happening.

> > +
> > +The "level" is the distance from the far memory, and matches the number
> > +appended to its "index" directory.
> > +
> > +The "line_size" is the number of bytes accessed on a cache miss.
> 
> Maybe "number of bytes accessed from next cache level" ?

Sounds good.
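
As an illustration only, here is a small sketch of how the side cache
attributes discussed above could be collected from user space. It assumes the
side_cache/indexN layout shown in this version of the series and deliberately
leaves out "level", since that file is being dropped. The function name
read_side_cache is hypothetical, and values are returned as raw strings because
the format of "write_policy" and "associativity" may still change.

```python
from pathlib import Path

# Attribute files shown in the patch's tree output, minus the redundant "level".
CACHE_ATTRS = ("associativity", "line_size", "size", "write_policy")


def read_side_cache(node_dir):
    """Return {index directory name: {attribute: raw string}} for one node."""
    cache_dir = Path(node_dir) / "side_cache"
    caches = {}
    if not cache_dir.is_dir():
        # Node has no memory side cache, or the kernel does not export one.
        return caches
    for index_dir in sorted(cache_dir.glob("index*")):
        attrs = {}
        for name in CACHE_ATTRS:
            attr_file = index_dir / name
            if attr_file.is_file():
                attrs[name] = attr_file.read_text().strip()
        caches[index_dir.name] = attrs
    return caches
```

For example, read_side_cache("/sys/devices/system/node/node0") returns an
empty dict on a node without a side cache.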


Re: [PATCHv5 10/10] doc/mm: New documentation for memory performance

2019-02-06 Thread Jonathan Cameron
On Thu, 24 Jan 2019 16:07:24 -0700
Keith Busch wrote:

> Platforms may provide system memory where some physical address ranges
> perform differently than others, or are side cached by the system.
> 
> Add documentation describing a high level overview of such systems and the
> performance and caching attributes the kernel provides for applications
> wishing to query this information.
> 
> Reviewed-by: Mike Rapoport 
> Signed-off-by: Keith Busch 
Hi Keith,

Nice doc in general. Comments inline.

> ---
>  Documentation/admin-guide/mm/numaperf.rst | 167 ++
>  1 file changed, 167 insertions(+)
>  create mode 100644 Documentation/admin-guide/mm/numaperf.rst
> 
> diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
> new file mode 100644
> index ..52999336a8ed
> --- /dev/null
> +++ b/Documentation/admin-guide/mm/numaperf.rst
> @@ -0,0 +1,167 @@
> +.. _numaperf:
> +
> +=============
> +NUMA Locality
> +=============
> +
> +Some platforms may have multiple types of memory attached to a single
> +CPU. These disparate memory ranges share some characteristics, such as
> +CPU cache coherence, but may have different performance. For example,
> +different media types and buses affect bandwidth and latency.

This seems a bit restrictive, but it gives a starting point.
I guess anyone who has a more complex system should look elsewhere for
how this maps to it!

> +
> +A system supports such heterogeneous memory by grouping each memory
> +type under different "nodes" based on similar CPU locality and performance
> +characteristics.  Some memory may share the same node as a CPU, and others
> +are provided as memory only nodes. While memory only nodes do not provide
> +CPUs, they may still be directly accessible, or local, to one or more
> +compute nodes.

Perhaps define directly accessible?  I'm not keen on saying that they don't
involve an interconnect as that rules out things like CCIX with remote
memory homes.  The reality is this patch set works fine for that case.

The one or more compute nodes case can only happen (I think) with a very weird
setup with an interconnect involved, which is likely to have other data on it.

> +The following diagram shows one such example of two compute
> +nodes with local memory and a memory only node for each compute node::
> +
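> + +------------------+     +------------------+
> + | Compute Node 0   +-----+ Compute Node 1   |
> + | Local Node0 Mem  |     | Local Node1 Mem  |
> + +--------+---------+     +--------+---------+
> +          |                        |
> + +--------+---------+     +--------+---------+
> + | Slower Node2 Mem |     | Slower Node3 Mem |
> + +------------------+     +--------+---------+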
> +
> +A "memory initiator" is a node containing one or more devices such as
> +CPUs or separate memory I/O devices that can initiate memory requests.
> +A "memory target" is a node containing one or more physical address
> +ranges accessible from one or more memory initiators.
> +
> +When multiple memory initiators exist, they may not all have the same
> +performance when accessing a given memory target. Each initiator-target
> +pair may be organized into different ranked access classes to represent
> +this relationship. The highest performing initiator to a given target
> +is considered to be one of that target's local initiators, and given
> +the highest access class, 0. Any given target may have one or more
> +local initiators, and any given initiator may have multiple local
> +memory targets.
> +
> +To aid applications matching memory targets with their initiators, the
> +kernel provides symlinks to each other. The following example lists the
> +relationship for the access class "0" memory initiators and targets, which is
> +the set of nodes with the highest performing access relationship::
> +
> + # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
> + relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
> +
> + # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
> + relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
> +
> +================
> +NUMA Performance
> +================
> +
> +Applications may wish to consider which node they want their memory to
> +be allocated from based on the node's performance characteristics. If
> +the system provides these attributes, the kernel exports them under the
> +node sysfs hierarchy by appending the attributes directory under the
> +memory node's access class 0 initiators as follows::
> +
> + /sys/devices/system/node/nodeY/access0/initiators/
> +
> +These attributes apply only when accessed from nodes that are linked
> +under this access's initiators.
> +
> +The performance characteristics the kernel provides for the local initiators
> +are exported as follows::
> +
> + # tree -P "read*|write*" /sys/devices/system/node/nodeY/

[PATCHv5 10/10] doc/mm: New documentation for memory performance

2019-01-24 Thread Keith Busch
Platforms may provide system memory where some physical address ranges
perform differently than others, or are side cached by the system.

Add documentation describing a high level overview of such systems and the
performance and caching attributes the kernel provides for applications
wishing to query this information.

Reviewed-by: Mike Rapoport 
Signed-off-by: Keith Busch 
---
 Documentation/admin-guide/mm/numaperf.rst | 167 ++
 1 file changed, 167 insertions(+)
 create mode 100644 Documentation/admin-guide/mm/numaperf.rst

diff --git a/Documentation/admin-guide/mm/numaperf.rst b/Documentation/admin-guide/mm/numaperf.rst
new file mode 100644
index ..52999336a8ed
--- /dev/null
+++ b/Documentation/admin-guide/mm/numaperf.rst
@@ -0,0 +1,167 @@
+.. _numaperf:
+
+=============
+NUMA Locality
+=============
+
+Some platforms may have multiple types of memory attached to a single
+CPU. These disparate memory ranges share some characteristics, such as
+CPU cache coherence, but may have different performance. For example,
+different media types and buses affect bandwidth and latency.
+
+A system supports such heterogeneous memory by grouping each memory
+type under different "nodes" based on similar CPU locality and performance
+characteristics.  Some memory may share the same node as a CPU, and others
+are provided as memory only nodes. While memory only nodes do not provide
+CPUs, they may still be directly accessible, or local, to one or more
+compute nodes. The following diagram shows one such example of two compute
+nodes with local memory and a memory only node for each compute node::
+
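+ +------------------+     +------------------+
+ | Compute Node 0   +-----+ Compute Node 1   |
+ | Local Node0 Mem  |     | Local Node1 Mem  |
+ +--------+---------+     +--------+---------+
+          |                        |
+ +--------+---------+     +--------+---------+
+ | Slower Node2 Mem |     | Slower Node3 Mem |
+ +------------------+     +--------+---------+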
+
+A "memory initiator" is a node containing one or more devices such as
+CPUs or separate memory I/O devices that can initiate memory requests.
+A "memory target" is a node containing one or more physical address
+ranges accessible from one or more memory initiators.
+
+When multiple memory initiators exist, they may not all have the same
+performance when accessing a given memory target. Each initiator-target
+pair may be organized into different ranked access classes to represent
+this relationship. The highest performing initiator to a given target
+is considered to be one of that target's local initiators, and given
+the highest access class, 0. Any given target may have one or more
+local initiators, and any given initiator may have multiple local
+memory targets.
+
+To aid applications matching memory targets with their initiators, the
+kernel provides symlinks to each other. The following example lists the
+relationship for the access class "0" memory initiators and targets, which is
+the set of nodes with the highest performing access relationship::
+
+   # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
+   relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY
+
+   # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
+   relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX
+
+================
+NUMA Performance
+================
+
+Applications may wish to consider which node they want their memory to
+be allocated from based on the node's performance characteristics. If
+the system provides these attributes, the kernel exports them under the
+node sysfs hierarchy by appending the attributes directory under the
+memory node's access class 0 initiators as follows::
+
+   /sys/devices/system/node/nodeY/access0/initiators/
+
+These attributes apply only when accessed from nodes that are linked
+under this access's initiators.
+
+The performance characteristics the kernel provides for the local initiators
+are exported as follows::
+
+   # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/
+   /sys/devices/system/node/nodeY/access0/
+   |-- read_bandwidth
+   |-- read_latency
+   |-- write_bandwidth
+   `-- write_latency
+
+The bandwidth attributes are provided in MiB/second.
+
+The latency attributes are provided in nanoseconds.
+
+The values reported here correspond to the rated latency and bandwidth
+for the platform.
+
+==========
+NUMA Cache
+==========
+
+System memory may be constructed in a hierarchy of elements with various
+performance characteristics in order to provide a large address space of
+slower performing memory side-cached by a smaller higher performing
+memory. The system physical addresses that initiators are aware of
+are provided by the last memory level in the hierarchy. The system
+meanwhile uses higher performing memory to transparently c