Hi,

This is v3 of the series [1] that I posted to introduce the cluster CPU topology level, besides the existing sockets, cores, and threads, for the ARM platform.

Description:
Implementations of the ARM architecture may define a CPU hierarchy with up to five levels: sockets/dies/clusters/cores/threads. For example, the ARM64 server chip Kunpeng 920 has 2 sockets in total, 2 NUMA nodes (i.e. CPU dies) in each socket, 6 clusters in each NUMA node, and 4 cores in each cluster, and it doesn't support SMT. Clusters within the same NUMA node share an L3 cache, and cores within the same cluster share an L2 cache.

The cache affinity of ARM clusters has been shown to improve kernel scheduling performance, and a patchset [2] has been posted which adds a generic sched_domain for clusters and a cluster level to the arch-neutral cpu topology struct, as below:

struct cpu_topology {
    int thread_id;
    int core_id;
    int cluster_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t cluster_sibling;
    cpumask_t llc_sibling;
};

In virtualization, exposing the cluster-level topology to the guest kernel may also improve scheduling performance. So let's add -smp clusters=* command line support for ARM CPUs, so that users can define a four-level CPU hierarchy for their machines: sockets/clusters/cores/threads. For now, this series only adds the cluster concept of CPU topology for the ARM platform, and only focuses on exposing the topology to the guest through ACPI and DT.

[1] https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyana...@huawei.com/
[2] https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210420001844.9116-1-song.bao....@hisilicon.com/

Test results about exposure of topology:
After applying this patch series, launch a guest with the virt-6.1 machine.

Cmdline: -smp 96,sockets=2,clusters=12,cores=4,threads=1
Output:
linux-atxcNc:~ # lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        1
Vendor ID:           0x48

Topology information of the clusters can also be read from sysfs:
cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list:  0-3
cat /sys/devices/system/cpu/cpu4/topology/cluster_cpus_list:  4-7
cat /sys/devices/system/cpu/cpu8/topology/cluster_cpus_list:  8-11
...
cat /sys/devices/system/cpu/cpu95/topology/cluster_cpus_list: 92-95
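As a quick illustration of how the four levels multiply out, below is a small standalone C sketch (not QEMU code; the topology values are simply the ones from the test above) that decomposes a linear vcpu index into socket/cluster/core/thread IDs, with threads varying fastest. Its grouping matches the cluster_cpus_list output above: cpus 0-3 fall into one cluster, cpus 4-7 into the next, and so on.

/*
 * Standalone sketch: decompose a linear cpu index into the
 * sockets/clusters/cores/threads hierarchy, assuming threads vary
 * fastest and sockets slowest. The values mirror the
 * "-smp 96,sockets=2,clusters=12,cores=4,threads=1" test above.
 */
#include <stdio.h>

int main(void)
{
    const int sockets = 2, clusters = 12, cores = 4, threads = 1;
    const int cpus = sockets * clusters * cores * threads; /* 96 */

    for (int cpu = 0; cpu < cpus; cpu++) {
        int thread  = cpu % threads;
        int core    = (cpu / threads) % cores;
        int cluster = (cpu / (threads * cores)) % clusters;
        int socket  = cpu / (threads * cores * clusters);
        printf("cpu%-2d: socket %d, cluster %2d, core %d, thread %d\n",
               cpu, socket, cluster, core, thread);
    }
    return 0;
}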
THINGS TO DO SOON:
1) Run some benchmarks to test the guest kernel scheduling improvement brought by the cluster-level virtual topology.
2) Add some QEMU tests for ARM vcpu topology, the ACPI PPTT table, and DT cpu nodes. These will be posted in a separate patchset later.

---
Changelogs:

v2->v3:
- Address comments from Philippe and Andrew. Thanks!
- Rebased the code on v3 of the series "hw/arm/virt: Introduce cpu topology support"
- v2: https://patchwork.kernel.org/project/qemu-devel/cover/20210413083147.34236-1-wangyana...@huawei.com/

v1->v2:
- Only focus on cluster support for the ARM platform
- v1: https://patchwork.kernel.org/project/qemu-devel/cover/20210331095343.12172-1-wangyana...@huawei.com/

---
Yanan Wang (4):
  vl.c: Add -smp, clusters=* command line support for ARM cpu
  hw/arm/virt: Add cluster level to device tree
  hw/arm/virt-acpi-build: Add cluster level to PPTT table
  hw/arm/virt: Parse -smp cluster parameter in virt_smp_parse

 hw/arm/virt-acpi-build.c | 45 ++++++++++++++++++++++++----------------
 hw/arm/virt.c            | 44 +++++++++++++++++++++++----------------
 include/hw/arm/virt.h    |  1 +
 qemu-options.hx          | 26 +++++++++++++----------
 softmmu/vl.c             |  3 +++
 5 files changed, 72 insertions(+), 47 deletions(-)

-- 
2.19.1