Hi,

This is v3 of the series [1] that introduces cluster cpu topology support,
in addition to the existing sockets, cores, and threads, for the ARM platform.

Description:
In implementations of the ARM architecture, a cpu hierarchy of at most
five levels, i.e. "sockets/dies/clusters/cores/threads", can be defined.
For example, the ARM64 server chip Kunpeng 920 has 2 sockets in total,
2 NUMA nodes (i.e. cpu dies) in each socket, 6 clusters in each NUMA
node, and 4 cores in each cluster, and doesn't support SMT. Clusters
within the same NUMA node share an L3 cache, and cores within the same
cluster share an L2 cache.
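Multiplied out, the stated numbers give 2 * 2 * 6 * 4 = 96 cores in
total across both sockets (and, with no SMT, 96 logical cpus), which is
also the vCPU count used in the test below.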

The cache affinity of ARM clusters has been shown to improve kernel
scheduling performance, and a patchset [2] has been posted that adds a
generic sched_domain for clusters and a cluster level to the
arch-neutral cpu topology struct, as below:

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};
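
With the cluster fields in place, the cluster scheduling level in [2]
can hand the scheduler a per-cpu cluster span. Roughly along the lines
of that patchset (a sketch following its shape, not a verbatim copy;
cpu_topology[] is the per-cpu array of the struct above):

/* Return the span of cpus that share a cluster with @cpu. */
const struct cpumask *cpu_clustergroup_mask(int cpu)
{
    return &cpu_topology[cpu].cluster_sibling;
}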

In virtualization, exposing the cluster-level topology to the guest
kernel may likewise improve scheduling performance. So let's add
support for a clusters=* parameter in -smp for ARM cpus; users will
then be able to define a four-level cpu hierarchy for machines, namely
sockets/clusters/cores/threads.
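
For clarity, the vCPU count implied by a four-level hierarchy is a
simple multiply-out. A minimal sketch of that arithmetic (the function
name is illustrative only, not taken from the patch):

/* Total vCPUs implied by a four-level topology; any -smp parameter
 * left out on the command line is assumed to default to 1. */
static unsigned int smp_cpus_from_topology(unsigned int sockets,
                                           unsigned int clusters,
                                           unsigned int cores,
                                           unsigned int threads)
{
    return sockets * clusters * cores * threads;
}

So a specification like -smp 96,sockets=2,clusters=12,cores=4,threads=1
is consistent because 2 * 12 * 4 * 1 == 96.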

In this series, we only add the cluster concept of cpu topology for the
ARM platform for now, and only focus on exposing the topology to the
guest through ACPI and DT.
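
For illustration, the DT cpu-map generated with clusters enabled is
expected to look roughly like the trimmed fragment below (hypothetical
node names and phandle labels, following the cpu-map binding in
Documentation/devicetree/bindings/cpu/cpu-topology.txt):

cpus {
    cpu-map {
        socket0 {
            cluster0 {
                core0 {
                    cpu = <&cpu0>;
                };
                core1 {
                    cpu = <&cpu1>;
                };
                /* core2, core3 follow the same pattern */
            };
            /* cluster1 onwards follow the same pattern */
        };
        /* socket1 follows the same pattern */
    };
};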

[1] https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyana...@huawei.com/
[2] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210420001844.9116-1-song.bao....@hisilicon.com/

Test results for the topology exposure:
After applying this patch series, launch a guest with the virt-6.1 machine type.

Cmdline: -smp 96,sockets=2,clusters=12,cores=4,threads=1
Output:
linux-atxcNc:~ # lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           0x48

Topology information of the clusters can also be read from sysfs:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list: 0-3
cat /sys/devices/system/cpu/cpu4/topology/cluster_cpus_list: 4-7
cat /sys/devices/system/cpu/cpu8/topology/cluster_cpus_list: 8-11
...

cat /sys/devices/system/cpu/cpu95/topology/cluster_cpus_list: 92-95
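
If helpful, the grouping for all vcpus can be dumped in one go with a
plain sysfs loop over the cluster_cpus_list attribute shown above:

for t in /sys/devices/system/cpu/cpu[0-9]*/topology; do
    echo "$t: $(cat "$t"/cluster_cpus_list)"
done

Each 4-cpu span corresponds to one cluster of the
sockets=2,clusters=12,cores=4,threads=1 layout.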

THINGS TO DO SOON:
1) Run some benchmarks to measure the guest-kernel scheduling improvement
   brought by the cluster-level virtual topology
2) Add some QEMU tests for the ARM vcpu topology, the ACPI PPTT table, and
   DT cpu nodes. These will be posted in a separate patchset later.

---

Changelogs:
v2->v3:
- Addressed comments from Philippe and Andrew. Thanks!
- Rebased the code on v3 of the series "hw/arm/virt: Introduce cpu topology support"
- v2: https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyana...@huawei.com/

v1->v2:
- Only focus on cluster support for the ARM platform
- v1: https://patchwork.kernel.org/project/qemu-devel/cover/20210331095343.12172-1-wangyana...@huawei.com/

---

Yanan Wang (4):
  vl.c: Add -smp, clusters=* command line support for ARM cpu
  hw/arm/virt: Add cluster level to device tree
  hw/arm/virt-acpi-build: Add cluster level to PPTT table
  hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse

 hw/arm/virt-acpi-build.c | 45 ++++++++++++++++++++++++----------------
 hw/arm/virt.c            | 44 +++++++++++++++++++++++----------------
 include/hw/arm/virt.h    |  1 +
 qemu-options.hx          | 26 +++++++++++++----------
 softmmu/vl.c             |  3 +++
 5 files changed, 72 insertions(+), 47 deletions(-)

-- 
2.19.1

