Tao Xu, Liu Jingqi,
Thanks for doing these updates.
On Fri, 9 Aug 2019 14:57:29 +0800
Tao <tao3...@intel.com> wrote:
> From: Liu Jingqi <jingqi....@intel.com>
>
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).
>
> Signed-off-by: Liu Jingqi <jingqi....@intel.com>
> Signed-off-by: Tao Xu <tao3...@intel.com>
> ---
> hw/acpi/hmat.h | 3 +
> hw/core/numa.c | 185
> ++++++++++++++++++++++++++++++++++++++++++ include/sysemu/numa.h |
> 2 + qapi/machine.json | 95 +++++++++++++++++++++-
> qemu-options.hx | 44 +++++++++-
> 5 files changed, 326 insertions(+), 3 deletions(-)
>
> diff --git a/hw/acpi/hmat.h b/hw/acpi/hmat.h
> index 6c32f12e78..b7c1e02cf0 100644
> --- a/hw/acpi/hmat.h
> +++ b/hw/acpi/hmat.h
> @@ -42,6 +42,9 @@
>
> #define HMAT_IS_LATENCY(type) (type <= HMAT_LB_DATA_WRITE_LATENCY)
>
> +#define PICO_PER_USEC 1000000
> +#define PICO_PER_NSEC 1000
> +
> struct HMAT_LB_Info {
> /*
> * Indicates total number of Proximity Domains
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index cfb6339810..9a494145f3 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -37,6 +37,7 @@
> #include "qemu/option.h"
> #include "qemu/config-file.h"
> #include "qemu/cutils.h"
> +#include "hw/acpi/hmat.h"
>
> QemuOptsList qemu_numa_opts = {
> .name = "numa",
> @@ -183,6 +184,184 @@ void parse_numa_distance(MachineState *ms,
> NumaDistOptions *dist, Error **errp)
> ms->numa_state->have_numa_distance = true; }
>
> +void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
> + Error **errp)
> +{
..
Optional; you could support not connected (0xffff) for latency/bandwidth in
this parsing.
> + if (*endptr == '\0') {
> + base_lat = 1;
> + } else if (*(endptr + 1) == 's') {
> + switch (*endptr) {
> + case 'p':
> + base_lat = 1;
> + break;
> + case 'n':
> + base_lat = PICO_PER_NSEC;
> + break;
> + case 'u':
Glad you picked up my mismatch of "u/micro".
> + } else {
> + error_setg(errp, "Invalid latency unit %s,"
> + "vaild units are \"ps\" \"ns\" \"us\"",
>node->latency);
typo "valid"
> + } else if (hmat_lb->base_lat != base_lat) {
> + error_setg(errp, "Invalid latency unit %s,"
> + " please unify the units.", node->latency);
This error is misleading. Should be something like "all latencies must be
specified in the same units"
> + switch (toupper(*endptr)) {
> + case '\0':
> + case 'M':
> + base_bw = 1;
> + break;
> + case 'G':
> + base_bw = UINT64_C(1) << 10;
> + break;
There was one more gap - Terra.
case 'T':
base_bw = UINT64_C(1) << 20;
break;
> + case 'P':
> + base_bw = UINT64_C(1) << 20;
and:
base_bw = UINT64_C(1) << 30;
> + break;
> + }
Currently Linux 5.3.0-rc3+ doesn't cope with real corrected "bandwidth=2P" so
maybe not worth it.
[ 2.092060] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator
Domains:1 Target Domains:2 Base:1073741824 [ 2.092326]
Initiator-Target[0-0]:-2147483648 MB/s
On values, testing for overflow is required. e.g:
-numa
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=4096T
bandwidth=4096T
[ 2.047676] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator
Domains:1 Target Domains:2 Base:1048576
[ 2.048084] Initiator-Target[0-0]:0 MB/s
Technically ACPI could support up to 4P with base/offset but you'd need to be a
lot trickier (i.e. base is highest common multiple of all entries and then see
if entry/base > 2^32-2 ) with base/entry values to arrive at this number.
+docs/commit message propagation of this.
> + } else if (hmat_lb->base_lat != base_lat) {
Bug: Incorrectly copied - base_lat should be base_bw (twice)
> + error_setg(errp, "Invalid bandwidth unit %s,"
> + " please unify the units.", node->bandwidth);
This error is misleading. Should be something like "all bandwidths must be
specified in the same units"
> diff --git a/qemu-options.hx b/qemu-options.hx
> index c480781992..cda4607f3a 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> +@example
> +-m 2G \
> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0
> -numa node,nodeid=0,memdev=ram-node0 \
> +-object memory-backend-ram,size=1024M,policy=bind,host-nodes=1,id=ram-node1
> -numa node,nodeid=1,memdev=ram-node1 \
> +-smp 2 \
> +-numa cpu,node-id=0,socket-id=0 \
> +-numa cpu,node-id=0,socket-id=1 \
> +-numa
> hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5ns
> \
> +-numa
> hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M
> \
> +-numa
> hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10ns
> \
> +-numa
> hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M
> \
> +@end example
nit: remove slash on last line
Is this a valid example? I get
qemu-system-x86_64: -numa
hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=11us:
Invalid target=1, it hasn't a valid initiator proximity domain.
(I tested with host-nodes=1 changed to 0 as local machine is single node)
Technically on [PATCH v9 07/11]
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index abf99b1adc..431818dc82 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -67,11 +67,81 @@ static void build_hmat_mpda(GArray *table_data, uint16_t
flags, int initiator,
build_append_int_noprefix(table_data, 0, 8);
}
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-142
nit: 5-146
Test as follows:
qemu-system-x86_64 -kernel /home/dan/repos/linux/vmlinux -nographic -append
console=ttyS0 \
-m 2G -object
memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
-numa node,nodeid=0,memdev=ram-node0 \
-object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node1 \
-numa node,nodeid=1,memdev=ram-node1 -smp 2 -numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=0,socket-id=1 \
-numa
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=123us
\
-numa
hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M
\
-numa
hmat-cache,node-id=0,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
\
-numa
hmat-cache,node-id=1,size=0x20000,total=1,level=1,assoc=direct,policy=write-back,line=8
\
| grep -A 5 HMAT
[ 0.038912] ACPI: HMAT 0x000000007FFE16C5 000118 (v02 BOCHS BXPCHMAT
00000001 BXPC 00000001)
[ 0.040954] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[ 0.040999] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[ 0.041189] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.041250] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0x3fffffff]
[ 0.041276] ACPI: SRAT: Node 1 PXM 1 [mem 0x40000000-0x7fffffff]
--
[ 1.984572] HMAT: Memory Flags:0001 Processor Domain:0 Memory Domain:0
[ 1.984792] HMAT: Memory Flags:0000 Processor Domain:0 Memory Domain:1
[ 1.985435] HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:1
Target Domains:2 Base:1000000
[ 1.986424] Initiator-Target[0-0]:123000 nsec
[ 1.986664] Initiator-Target[0-1]:0 nsec
[ 1.986910] HMAT: Locality: Flags:00 Type:Access Bandwidth Initiator
Domains:1 Target Domains:2 Base:1
[ 1.987229] Initiator-Target[0-0]:200 MB/s
[ 1.987356] Initiator-Target[0-1]:0 MB/s
[ 1.987549] HMAT: Cache: Domain:0 Size:131072 Attrs:00081111 SMBIOS Handles:0
[ 1.988393] HMAT: Cache: Domain:1 Size:131072 Attrs:00081111 SMBIOS Handles:0
Leaving default latency/bw as 0 if ok as spec says '0: the corresponding latency
or bandwidth information is not provided.'. Potentially the kernel could
display this better.
Also note https://marc.info/?l=linux-acpi&m=156506549410279&w=2 submitted as
hmat_build_table_structs only calls build_hmat_mpda with flags=0 or
HMAT_PROX_INIT_VALID (0x1) which is right looking at ACPI-6.3. An Ack/(Nack if
I'm wrong) there would be good to have both kernel and this patch series
working together.
for entire series:
Reviewed-by: Daniel Black <dan...@linux.ibm.com>