Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information

Tao Xu Tue, 13 Aug 2019 19:59:05 -0700

On 8/13/2019 11:11 PM, Eric Blake wrote:

On 8/9/19 1:57 AM, Tao wrote:

From: Liu Jingqi <jingqi....@intel.com>


Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).

Signed-off-by: Liu Jingqi <jingqi....@intel.com>
Signed-off-by: Tao Xu <tao3...@intel.com>
---

Changes in v9:
     - change the CLI input way, make it more user firendly (Daniel Black)
     use latency=NUM[p|n|u]s and bandwidth=NUM[M|G|P](B/s) as input and drop
     the base-lat and base-bw input.


Why are you hand-rolling yet another scaling parser instead of reusing
one that's already in-tree?

Because there are no time scaling parser and QMP 'size' type will use kbas default. It is a tricky issue because the entry in HMAT is small(max0xffff) and we need to store the unit in HMAT.


But as you mentioned blew, 'str' is not a good choice for QMP.
Therefore, what about this solution:

For bandwidth, reuse the qemu_strtosz_MiB() (because the smllest unit isMB/s). For latency, write a time scaling parser named as"qemu_strtotime_ps()" and "qemu_strtotime_ns()" in util/cutils.c. Andthen use it to pre-convert them into the single scale (QMP interface canuse).

At last, in HMAT, we auto store the data, separate it into the same baseunit and entry, and show error if overflow. Then the HMAT can support aslarge as possible.


I am wondering if this solution is OK.

+++ b/hw/core/numa.c

+void parse_numa_hmat_lb(MachineState *ms, NumaHmatLBOptions *node,
+                        Error **errp)
+{

+    if (node->has_latency) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = 
hmat_lb;
+        } else if (hmat_lb->latency[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the latency for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        ret = qemu_strtoui(node->latency, &endptr, 10, &latency);
+        if (ret < 0) {
+            error_setg(errp, "Invalid latency %s", node->latency);
+            return;
+        }
+
+        if (*endptr == '\0') {
+            base_lat = 1;
+        } else if (*(endptr + 1) == 's') {
+            switch (*endptr) {
+            case 'p':
+                base_lat = 1;
+                break;
+            case 'n':
+                base_lat = PICO_PER_NSEC;
+                break;
+            case 'u':
+                base_lat = PICO_PER_USEC;
+                break;


Hmm - this is a different scaling than any of our existing parsers
(which assume multiples k/M/G..., not subdivisions u/n/s)

+    if (node->has_bandwidth) {
+        hmat_lb = ms->numa_state->hmat_lb[node->hierarchy][node->data_type];
+
+        if (!hmat_lb) {
+            hmat_lb = g_malloc0(sizeof(*hmat_lb));
+            ms->numa_state->hmat_lb[node->hierarchy][node->data_type] = 
hmat_lb;
+        } else if (hmat_lb->bandwidth[node->initiator][node->target]) {
+            error_setg(errp, "Duplicate configuration of the bandwidth for "
+                       "initiator=%" PRIu16 " and target=%" PRIu16 ".",
+                       node->initiator, node->target);
+            return;
+        }
+
+        ret = qemu_strtoui(node->bandwidth, &endptr, 10, &bandwidth);
+        if (ret < 0) {
+            error_setg(errp, "Invalid bandwidth %s", node->bandwidth);
+            return;
+        }
+
+        switch (toupper(*endptr)) {
+        case '\0':
+        case 'M':
+            base_bw = 1;
+            break;
+        case 'G':
+            base_bw = UINT64_C(1) << 10;
+            break;
+        case 'P':
+            base_bw = UINT64_C(1) << 20;
+            break;


But this one, in addition to being wrong (P is 1<<30, not 1<<20), should
definitely be reusing qemu_strtosz_metric() or similar (look in
util/cutils.c).

+++ b/qapi/machine.json
@@ -377,10 +377,12 @@
  #
  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
  #
+# @hmat-lb: memory latency and bandwidth information (Since: 4.2)
+#
  # Since: 2.1
  ##
  { 'enum': 'NumaOptionsType',
-  'data': [ 'node', 'dist', 'cpu' ] }
+  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }

+##
+# @HmatLBDataType:
+#
+# Data type in the System Locality Latency
+# and Bandwidth Information Structure of HMAT (Heterogeneous
+# Memory Attribute Table)
+#
+# For more information of @HmatLBDataType see
+# the chapter 5.2.27.4: Table 5-142:  Field "Data Type" of ACPI 6.3 spec.
+#
+# @access-latency: access latency (picoseconds)
+#
+# @read-latency: read latency (picoseconds)
+#
+# @write-latency: write latency (picoseconds)
+#
+# @access-bandwidth: access bandwidth (MB/s)
+#
+# @read-bandwidth: read bandwidth (MB/s)
+#
+# @write-bandwidth: write bandwidth (MB/s)


Are these really the best scales?

+
+##
+# @NumaHmatLBOptions:
+#
+# Set the system locality latency and bandwidth information
+# between Initiator and Target proximity Domains.
+#
+# For more information of @NumaHmatLBOptions see
+# the chapter 5.2.27.4: Table 5-142 of ACPI 6.3 spec.
+#
+# @initiator: the Initiator Proximity Domain.
+#
+# @target: the Target Proximity Domain.
+#
+# @hierarchy: the Memory Hierarchy. Indicates the performance
+#             of memory or side cache.
+#
+# @data-type: presents the type of data, access/read/write
+#             latency or hit latency.
+#
+# @latency: the value of latency from @initiator to @target proximity domain,
+#           the latency units are "ps(picosecond)", "ns(nanosecond)" or
+#           "us(microsecond)".
+#
+# @bandwidth: the value of bandwidth between @initiator and @target proximity
+#             domain, the bandwidth units are "MB(/s)","GB(/s)" or "PB(/s)".
+#
+# Since: 4.2
+##
+{ 'struct': 'NumaHmatLBOptions',
+    'data': {
+    'initiator': 'uint16',
+    'target': 'uint16',
+    'hierarchy': 'HmatLBMemoryHierarchy',
+    'data-type': 'HmatLBDataType',
+    '*latency': 'str',
+    '*bandwidth': 'str' }}


...and then parsing strings instead of taking raw integers?  Parsing
strings is okay for HMP, but for QMP, our goal should be a single
representation with no additional sugar on top.  Latency and bandwidth
should be int in a single scale.

+++ b/qemu-options.hx
@@ -164,16 +164,19 @@ DEF("numa", HAS_ARG, QEMU_OPTION_numa,
      "-numa 
node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
      "-numa 
node[,memdev=id][,cpus=firstcpu[-lastcpu]][,nodeid=node][,initiator=node]\n"
      "-numa dist,src=source,dst=destination,val=distance\n"
-    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n",
+    "-numa cpu,node-id=node[,socket-id=x][,core-id=y][,thread-id=z]\n"
+    "-numa 
hmat-lb,initiator=node,target=node,hierarchy=memory|first-level|second-level|third-level,data-type=access-latency|read-latency|write-latency[,latency=lat][,bandwidth=bw]\n",


Command-line parsing can then take human-written scaled numbers, and
pre-convert them into the single scale accepted by the QMP interface.

Re: [Qemu-devel] [PATCH v9 09/11] numa: Extend the CLI to provide memory latency and bandwidth information

Reply via email to