From: Wim ten Have <wim.ten.h...@oracle.com>

This patch series extends guest domain administration, adding support to
advertise node sibling distances when configuring HVM NUMA guests.

NUMA (non-uniform memory access) is a method of configuring a cluster of
nodes within a single multiprocessing system such that each node has its
own local memory while still sharing memory with the other nodes,
improving performance and the system's ability to be expanded.

A NUMA system can be illustrated as shown below. Within this 4-node
system, every socket is equipped with its own distinct memory. The whole
typically resembles an SMP (symmetric multiprocessing) system: a
"tightly-coupled," "share everything" system in which multiple processors
work under a single operating system and can access each other's
memory over multiple "Bus Interconnect" paths.

        +-----+-----+-----+         +-----+-----+-----+
        |  M  | CPU | CPU |         | CPU | CPU |  M  |
        |  E  |     |     |         |     |     |  E  |
        |  M  +- Socket0 -+         +- Socket3 -+  M  |
        |  O  |     |     |         |     |     |  O  |
        |  R  | CPU | CPU <---------> CPU | CPU |  R  |
        |  Y  |     |     |         |     |     |  Y  |
        +-----+--^--+-----+         +-----+--^--+-----+
                 |                           |
                 |      Bus Interconnect     |
                 |                           |
        +-----+--v--+-----+         +-----+--v--+-----+
        |  M  |     |     |         |     |     |  M  |
        |  E  | CPU | CPU <---------> CPU | CPU |  E  |
        |  M  |     |     |         |     |     |  M  |
        |  O  +- Socket1 -+         +- Socket2 -+  O  |
        |  R  |     |     |         |     |     |  R  |
        |  Y  | CPU | CPU |         | CPU | CPU |  Y  |
        +-----+-----+-----+         +-----+-----+-----+

In contrast, a flat SMP system (not illustrated) is limited in that its
single bus (data and address path) easily becomes a performance
bottleneck under high activity as sockets are added.
NUMA adds an intermediate level of memory, shared among the few cores of
each socket as illustrated above, so that data accesses do not all have
to travel over a single bus.

Unfortunately, the way NUMA achieves this introduces limitations of its
own. This happens, as visualized in the illustration above, when data
stored in memory associated with Socket2 is accessed by a CPU (core) in
Socket0. The processors use the "Bus Interconnect" to create gateways
between the sockets (nodes), enabling inter-socket access to memory.
These "Bus Interconnect" hops add data access delays when a CPU (core)
accesses memory associated with a remote socket (node).

For terminology, we refer to sockets as "nodes"; access to each other's
distinct resources, such as memory, makes them "siblings" with a
designated "distance" between them. A specific design is described under
the ACPI (Advanced Configuration and Power Interface) Specification,
within the chapter explaining the system's SLIT (System Locality Distance
Information Table).
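At its core the SLIT is a flat N x N byte matrix: entry (i, j) gives the
relative distance from locality i to locality j, with the local distance
defined as 10. A minimal sketch of how such a table is indexed (the
helper below is purely illustrative, not part of these patches):

```python
# Sketch: how an ACPI SLIT encodes node distances.
# The table body is a flat list of N*N one-byte entries;
# entry[i*N + j] is the relative distance from locality i to
# locality j. Per the ACPI spec, the local distance (i == j)
# is 10 and the value 255 means "unreachable".

def slit_distance(entries, n, i, j):
    """Look up the distance from node i to node j."""
    return entries[i * n + j]

# The 4-node matrix used throughout this series, flattened:
entries = [10, 21, 31, 41,
           21, 10, 21, 31,
           31, 21, 10, 21,
           41, 31, 21, 10]

assert slit_distance(entries, 4, 0, 0) == 10   # local
assert slit_distance(entries, 4, 0, 3) == 41   # most remote pair
```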

These patches extend core libvirt's XML description of a virtual machine's
hardware to include NUMA distance information for sibling nodes, which
is then passed to Xen guests via libxl. QEMU recently gained support for
constructing the SLIT in commit 0f203430dd ("numa: Allow setting NUMA
distance for different NUMA nodes"), so these core libvirt extensions
can also help other drivers support this feature.

The XML changes allow describing the <distances> between <cell>
nodes/sockets through <sibling> node identifiers, propagate these through
the NUMA domain functionality, and finally add support to libxl.

[below is an example illustrating a 4 node/socket <cell> setup]

    <cpu>
      <numa>
        <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='31'/>
            <sibling id='3' value='41'/>
          </distances>
        </cell>
        <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='21'/>
            <sibling id='1' value='10'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='31'/>
          </distances>
        </cell>
        <cell id='2' cpus='2,11' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='31'/>
            <sibling id='1' value='21'/>
            <sibling id='2' value='10'/>
            <sibling id='3' value='21'/>
          </distances>
        </cell>
        <cell id='3' cpus='3' memory='2097152' unit='KiB'>
          <distances>
            <sibling id='0' value='41'/>
            <sibling id='1' value='31'/>
            <sibling id='2' value='21'/>
            <sibling id='3' value='10'/>
          </distances>
        </cell>
      </numa>
    </cpu>
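
The nesting above can be read back into a plain distance matrix with a
few lines of standard XML handling. The following sketch is not the
libvirt parser (which is written in C); it merely demonstrates the
<cell>/<distances>/<sibling> structure using two of the four cells:

```python
import xml.etree.ElementTree as ET

# Two cells from the example above, trimmed for brevity.
xml = """<cpu><numa>
  <cell id='0' cpus='0,4-7' memory='2097152' unit='KiB'>
    <distances><sibling id='0' value='10'/><sibling id='1' value='21'/>
               <sibling id='2' value='31'/><sibling id='3' value='41'/></distances>
  </cell>
  <cell id='1' cpus='1,8-10,12-15' memory='2097152' unit='KiB'>
    <distances><sibling id='0' value='21'/><sibling id='1' value='10'/>
               <sibling id='2' value='21'/><sibling id='3' value='31'/></distances>
  </cell>
</numa></cpu>"""

# Map (cell id, sibling id) -> distance value.
distances = {}
for cell in ET.fromstring(xml).iter('cell'):
    cid = int(cell.get('id'))
    for sib in cell.iter('sibling'):
        distances[(cid, int(sib.get('id')))] = int(sib.get('value'))

assert distances[(0, 0)] == 10                       # local distance
assert distances[(0, 3)] == 41                       # remote distance
assert distances[(0, 1)] == distances[(1, 0)] == 21  # symmetric here
```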

Under libxl, if no <distances> are given to describe the distance data
between different <cell>s, this patch defaults to a scheme using 10
for the local and 20 for any remote node/socket, which is what a guest
OS assumes when no SLIT is specified. While the SLIT is optional, libxl
requires that distances are set nonetheless.
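
That default scheme is straightforward to sketch: every cell is assumed
at distance 10 from itself and 20 from every other cell (the helper
below is illustrative only, mirroring the behaviour described above):

```python
def default_distances(nr_nodes, local=10, remote=20):
    """Distance matrix assumed when the XML supplies no <distances>:
    10 for a node to itself, 20 to any other node."""
    return [[local if i == j else remote for j in range(nr_nodes)]
            for i in range(nr_nodes)]

m = default_distances(4)
assert m[2][2] == 10   # local
assert m[0][3] == 20   # every remote pair defaults to 20
```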

On Linux systems the SLIT detail can be listed with the help of the
'numactl -H' command. The above HVM guest would show the following output.

    [root@f25 ~]# numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 4 5 6 7
    node 0 size: 1988 MB
    node 0 free: 1743 MB
    node 1 cpus: 1 8 9 10 12 13 14 15
    node 1 size: 1946 MB
    node 1 free: 1885 MB
    node 2 cpus: 2 11
    node 2 size: 2011 MB
    node 2 free: 1912 MB
    node 3 cpus: 3
    node 3 size: 2010 MB
    node 3 free: 1980 MB
    node distances:
    node   0   1   2   3
      0:  10  21  31  41
      1:  21  10  21  31
      2:  31  21  10  21
      3:  41  31  21  10
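
A distance table such as the one above is expected to be sane: local
entries equal to the ACPI-defined value 10 and no remote entry smaller
than the local distance. A quick sketch of such a sanity check
(illustrative only; not the validation code in these patches):

```python
LOCAL_DISTANCE = 10  # ACPI-defined distance of a node to itself

def check_distances(matrix):
    """Illustrative sanity check: local entries must equal 10 and
    no remote entry may be smaller than the local distance."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if i == j and matrix[i][j] != LOCAL_DISTANCE:
                return False
            if i != j and matrix[i][j] < LOCAL_DISTANCE:
                return False
    return True

# The matrix reported by 'numactl -H' above:
matrix = [[10, 21, 31, 41],
          [21, 10, 21, 31],
          [31, 21, 10, 21],
          [41, 31, 21, 10]]
assert check_distances(matrix)
```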

Wim ten Have (5):
  numa: rename function virDomainNumaDefCPUFormat
  numa: describe siblings distances within cells
  xenconfig: add domxml conversions for xen-xl
  libxl: vnuma support
  xlconfigtest: add tests for numa cell sibling distances

 docs/formatdomain.html.in                          |  63 +++-
 docs/schemas/basictypes.rng                        |   7 +
 docs/schemas/cputypes.rng                          |  18 ++
 src/conf/cpu_conf.c                                |   2 +-
 src/conf/numa_conf.c                               | 342 ++++++++++++++++++++-
 src/conf/numa_conf.h                               |  22 +-
 src/libvirt_private.syms                           |   5 +
 src/libxl/libxl_conf.c                             | 120 ++++++++
 src/libxl/libxl_driver.c                           |   3 +-
 src/xenconfig/xen_xl.c                             | 333 ++++++++++++++++++++
 .../test-fullvirt-vnuma-autocomplete.cfg           |  26 ++
 .../test-fullvirt-vnuma-autocomplete.xml           |  85 +++++
 .../test-fullvirt-vnuma-nodistances.cfg            |  26 ++
 .../test-fullvirt-vnuma-nodistances.xml            |  53 ++++
 .../test-fullvirt-vnuma-partialdist.cfg            |  26 ++
 .../test-fullvirt-vnuma-partialdist.xml            |  60 ++++
 tests/xlconfigdata/test-fullvirt-vnuma.cfg         |  26 ++
 tests/xlconfigdata/test-fullvirt-vnuma.xml         |  81 +++++
 tests/xlconfigtest.c                               |   6 +
 19 files changed, 1295 insertions(+), 9 deletions(-)
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-autocomplete.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-autocomplete.xml
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-nodistances.xml
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-partialdist.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma-partialdist.xml
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.cfg
 create mode 100644 tests/xlconfigdata/test-fullvirt-vnuma.xml

-- 
2.9.5

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list