Re: [RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support
On 3/1/2021 5:48 PM, Andrew Jones wrote: On Fri, Feb 26, 2021 at 04:41:45PM +0800, Ying Fang wrote: On 2/25/2021 8:02 PM, Andrew Jones wrote: On Thu, Feb 25, 2021 at 04:56:22PM +0800, Ying Fang wrote: An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core systems, so a cpu topology description is helpful to provide the guest with the right view. Dario Faggioli's talk in [0] also shows that the virtual topology may have an impact on scheduling performance. Thus this patch series is posted to introduce cpu topology support for the arm platform. Both fdt and ACPI are introduced to present the cpu topology. To describe the cpu topology via ACPI, a PPTT table is introduced according to the processor hierarchy node structure. This series is derived from [1]; in [1] we tried to bring both cpu and cache topology support to the arm platform, but there are still some issues to solve for the cache hierarchy. So we split the cpu topology part out and send it separately. The patch series to support the cache hierarchy will be sent later, since Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and he is waiting for it to be upstreamed. This patch series was initially based on the patches posted by Andrew Jones [2]. I jumped in on it since some OS vendor partners are eager for it. Thanks for Andrew's contribution. After applying this patch series, launch a guest with virt-6.0 and a cpu topology configured with sockets:cores:threads = 2:4:2, and you will get the output below from the lscpu command.

Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2

What CPU model was used? Did it actually support threads? If these were It's tested on a Huawei Kunpeng 920 CPU model with vcpu host-passthrough. It does not support threads for now, but the next version, the 930, may support it. 
Here we emulate a virtual cpu topology; two virtual threads are used for the test. KVM VCPUs, then I guess MPIDR.MT was not set on the CPUs. Apparently that didn't confuse Linux? See [1] for how I once tried to deal with threads. [1] https://github.com/rhdrjones/qemu/commit/60218e0dd7b331031b644872d56f2aca42d0ff1e If an ACPI PPTT table is specified, the Linux kernel won't check the MPIDR register to populate the cpu topology. Moreover, MPIDR does not guarantee a correct cpu topology anyway. So it won't be a problem if MPIDR.MT is not set. OK, so Linux doesn't care about MPIDR.MT with ACPI. What happens with DT? Looking at the logic of the Linux kernel: it tries to parse the cpu topology in smp_prepare_cpus (arch/arm64/kernel/topology.c). If the cpu topology is provided via DT, the Linux kernel won't check MPIDR any more. The same holds with ACPI enabled. Core(s) per socket: 4 Socket(s): 2 Good, but what happens if you specify '-smp 16'? Do you get 16 sockets ^^ You didn't answer this question. The latest QEMU uses smp_parse to parse the -smp command line; by default, if -smp 16 is given, the arm64 virt machine will get 16 sockets. each with 1 core? Or, do you get 1 socket with 16 cores? And, which do we want and why? If you look at [2], then you'll see I was assuming we want to prefer cores over sockets, since without topology descriptions that's what the Linux guest kernel would do. [2] https://github.com/rhdrjones/qemu/commit/c0670b1bccb4d08c7cf7c6957cc8878a2af131dd Thanks, I'll check how Linux does it by default. NUMA node(s): 2 Why do we have two NUMA nodes in the guest? The two sockets in the guest should not imply this. The two NUMA nodes are emulated by QEMU since we already have the guest numa topology feature. That's what I suspected, and I presume only a single node is present when you don't use QEMU's NUMA feature - even when you supply a VCPU topology with multiple sockets? Agreed, I would like a single numa node too if we do not use the guest numa feature. 
Here I provided the guest with two numa nodes and set the cpu affinity only as a test. Thanks, drew So the two sockets in the guest have nothing to do with it. Actually, even one socket may have two numa nodes in a real CPU model. Thanks, drew

Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
BogoMIPS:            200.00
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

[0] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html
[2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com

Ying Fang (5): device_tree: Add qemu_fdt_add_path hw/arm/virt: Add cpu-map to device tree hw/arm/virt-acpi-build: distinguish possible and present cpus hw/acpi/aml-build: add processor hierarchy node structure hw/arm/virt-acpi-build: add PPTT table
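The sockets:cores:threads = 2:4:2 mapping shown by lscpu above follows from a simple row-major decomposition of the flat vcpu index. A standalone sketch of that arithmetic (illustration only, not QEMU code; the names are made up):

```c
/* Illustration only (not QEMU code): split a flat vcpu index into
 * socket/core/thread coordinates, the row-major layout used for a
 * sockets:cores:threads topology. */
struct topo { int socket, core, thread; };

static struct topo decompose(int cpu, int cores, int threads)
{
    struct topo t;
    t.socket = cpu / (cores * threads); /* threads fill a core first */
    t.core   = (cpu / threads) % cores; /* then cores fill a socket */
    t.thread = cpu % threads;
    return t;
}
```

With cores = 4 and threads = 2, vcpus 0-7 land in socket 0 and vcpus 8-15 in socket 1; e.g. vcpu 15 maps to socket 1, core 3, thread 1.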
Re: [RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure
On 3/1/2021 11:50 PM, Michael S. Tsirkin wrote: On Mon, Mar 01, 2021 at 10:39:19AM +0100, Andrew Jones wrote: On Fri, Feb 26, 2021 at 10:23:03AM +0800, Ying Fang wrote: On 2/25/2021 7:47 PM, Andrew Jones wrote: On Thu, Feb 25, 2021 at 04:56:26PM +0800, Ying Fang wrote: Add the processor hierarchy node structures to build ACPI information for CPU topology. Since the private resources may be used to describe the cache hierarchy, and they vary among the different topology levels, three helpers are introduced to describe the hierarchy. (1) build_socket_hierarchy for socket description (2) build_processor_hierarchy for processor description (3) build_thread_hierarchy for thread (logical processor) description Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 40 + include/hw/acpi/acpi-defs.h | 13 include/hw/acpi/aml-build.h | 7 +++ 3 files changed, 60 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index a2cd7a5830..a0af3e9d73 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms, table_data->len - slit_start, 1, oem_id, oem_table_id); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4); Missing '/* Flags */' Will fix. 
+build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID | + ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD | + ACPI_PPTT_ACPI_LEAF_NODE, 4); /* Flags */ +build_append_int_noprefix(tbl, parent , 4); /* parent */ 'parent' not capitalized. We want these comments to exactly match the text in the spec. Will fix. 
+build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Num of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index cf9f44299c..45e10d886f 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -618,4 +618,17 @@ struct AcpiIortRC { } QEMU_PACKED; typedef struct AcpiIortRC AcpiIortRC; +enum { +ACPI_PPTT_TYPE_PROCESSOR = 0, +ACPI_PPTT_TYPE_CACHE, +ACPI_PPTT_TYPE_ID, +ACPI_PPTT_TYPE_RESERVED +}; + +#define ACPI_PPTT_PHYSICAL_PACKAGE (1) +#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID (1 << 1) +#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD (1 << 2) /* ACPI 6.3 */ +#define ACPI_PPTT_ACPI_LEAF_NODE (1 << 3) /* ACPI 6.3 */ +#define ACPI_PPTT_ACPI_IDENTICAL (1 << 4) /* ACPI 6.3 */ You need to quote the specific place in the spec where this appeared, not just the version. And what about previous ones? Thanks, Will fix. + #endif diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index 380d3e3924..7f0ca1a198 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms, const char *oem_id, const char *oem_table_id); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); Why does build_processor_hierarchy() take a flags argument
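The three helpers in the patch each append the same fixed 20-byte node, field by field. As a cross-check, the layout they emit (a Type 0 processor hierarchy node with no private resources, per ACPI 6.3, section 5.2.29.1) can be sketched as a packed struct; this is an illustration, not QEMU code:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustration only (not QEMU code): the 20-byte Type 0 processor
 * hierarchy node the helpers above append byte by byte, per ACPI 6.3,
 * section 5.2.29.1, assuming zero private resources. */
struct __attribute__((packed)) pptt_processor_node {
    uint8_t  type;                  /* 0 = processor */
    uint8_t  length;                /* 20, since no private resources follow */
    uint16_t reserved;
    uint32_t flags;                 /* e.g. physical package, leaf node */
    uint32_t parent;                /* offset of the parent node in the table */
    uint32_t acpi_processor_id;
    uint32_t num_private_resources; /* 0 here */
};

_Static_assert(sizeof(struct pptt_processor_node) == 20,
               "matches the hard-coded length byte in the helpers");
```

The static assertion documents why the helpers can hard-code the length byte to 20.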
Re: [RFC PATCH 5/5] hw/arm/virt-acpi-build: add PPTT table
On 2/25/2021 7:38 PM, Andrew Jones wrote: This is just [*] with some minor code changes [*] https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11 so it's disappointing that my name is nowhere to be found on it. Also, the explanation of the DT and ACPI differences has been dropped from the commit message of [*]. I'm not sure why. Will fix that. I will add your SOB, and then you can help comment on it. Thanks, drew On Thu, Feb 25, 2021 at 04:56:27PM +0800, Ying Fang wrote: Add the Processor Properties Topology Table (PPTT) to present CPU topology information to the guest. A three-level cpu topology is built, in accordance with what the Linux kernel currently does. Tested-by: Jiajie Li Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 50 1 file changed, 50 insertions(+) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index bb91152fe2..38d50ce66c 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -436,6 +436,50 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) vms->oem_table_id); } +static void +build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) +{ +int pptt_start = table_data->len; +int uid = 0, cpus = 0, socket = 0; +MachineState *ms = MACHINE(vms); +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; + +acpi_data_push(table_data, sizeof(AcpiTableHeader)); + +for (socket = 0; cpus < ms->possible_cpus->len; socket++) { +uint32_t socket_offset = table_data->len - pptt_start; +int core; + +build_socket_hierarchy(table_data, 0, socket); + +for (core = 0; core < smp_cores; core++) { +uint32_t core_offset = table_data->len - pptt_start; +int thread; + +if (smp_threads <= 1) { +build_processor_hierarchy(table_data, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID | + ACPI_PPTT_ACPI_LEAF_NODE, + socket_offset, uid++); +} else { +build_processor_hierarchy(table_data, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID, + socket_offset, core); +for (thread = 0; 
thread < smp_threads; thread++) { +build_thread_hierarchy(table_data, core_offset, uid++); +} + } +} +cpus += smp_cores * smp_threads; +} + +build_header(linker, table_data, + (void *)(table_data->data + pptt_start), "PPTT", + table_data->len - pptt_start, 2, + vms->oem_id, vms->oem_table_id); +} + /* GTDT */ static void build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -688,6 +732,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) unsigned dsdt, xsdt; GArray *tables_blob = tables->table_data; MachineState *ms = MACHINE(vms); +bool cpu_topology_enabled = !vmc->no_cpu_topology; table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t)); @@ -707,6 +752,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables_blob); build_madt(tables_blob, tables->linker, vms); +if (ms->smp.cpus > 1 && cpu_topology_enabled) { +acpi_add_table(table_offsets, tables_blob); +build_pptt(tables_blob, tables->linker, vms); +} + acpi_add_table(table_offsets, tables_blob); build_gtdt(tables_blob, tables->linker, vms); -- 2.23.0 .
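The nesting in build_pptt() above can be cross-checked with a little arithmetic: each socket gets one node, and each core either is itself a leaf processor node (threads <= 1) or gets a core node plus one node per thread. A standalone sketch (illustration only, not QEMU code):

```c
/* Illustration only (not QEMU code): count the 20-byte processor
 * hierarchy nodes the build_pptt() loop above would emit. */
static int pptt_node_count(int sockets, int cores, int threads)
{
    int per_socket = (threads <= 1)
                     ? cores                 /* cores are leaf nodes */
                     : cores * (1 + threads);/* core node + thread nodes */
    return sockets * (1 + per_socket);       /* +1 for the socket node */
}
```

For the 2:4:2 example from the cover letter this gives 2 * (1 + 4 * 3) = 26 nodes, i.e. 520 bytes of nodes after the table header.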
Re: [RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path
On 2/25/2021 9:25 PM, Andrew Jones wrote: On Thu, Feb 25, 2021 at 08:54:40PM +0800, Ying Fang wrote: On 2/25/2021 7:03 PM, Andrew Jones wrote: Hi Ying Fang, I don't see any change in this patch from what I have in my tree, so this should be From: Andrew Jones Thanks, drew Yes, I picked it from your qemu branch: https://github.com/rhdrjones/qemu/commit/ecfc1565f22187d2c715a99bbcd35cf3a7e428fa So what can I do to make it "From: Andrew Jones " ? Can I make it by using git commit --amend like below ? git commit --amend --author "Andrew Jones " That's one way to fix it now, but normally when you apply/cherry-pick a patch it will keep the authorship. Then, all you have to do is post like usual and the "From: ..." will show up automatically. Hmm, I know cherry-pick can do that. But sometimes there may be conflicts, so I have to backport it by hand and copy the commit message back, and thus the authorship may be lost. Thanks, drew On Thu, Feb 25, 2021 at 04:56:23PM +0800, Ying Fang wrote: qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it also adds any missing parent nodes. We also tweak an error message of qemu_fdt_add_subnode(). Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- include/sysemu/device_tree.h | 1 + softmmu/device_tree.c| 45 ++-- 2 files changed, 44 insertions(+), 2 deletions(-) diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h index 982c89345f..15fb98af98 100644 --- a/include/sysemu/device_tree.h +++ b/include/sysemu/device_tree.h @@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path); uint32_t qemu_fdt_alloc_phandle(void *fdt); int qemu_fdt_nop_node(void *fdt, const char *node_path); int qemu_fdt_add_subnode(void *fdt, const char *name); +int qemu_fdt_add_path(void *fdt, const char *path); #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) 
\ do { \ diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c index b9a3ddc518..1e3857ca0c 100644 --- a/softmmu/device_tree.c +++ b/softmmu/device_tree.c @@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) retval = fdt_add_subnode(fdt, parent, basename); if (retval < 0) { -error_report("FDT: Failed to create subnode %s: %s", name, - fdt_strerror(retval)); +error_report("%s: Failed to create subnode %s: %s", + __func__, name, fdt_strerror(retval)); exit(1); } @@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) return retval; } +/* + * Like qemu_fdt_add_subnode(), but will add all missing + * subnodes in the path. + */ +int qemu_fdt_add_path(void *fdt, const char *path) +{ +char *dupname, *basename, *p; +int parent, retval = -1; + +if (path[0] != '/') { +return retval; +} + +parent = fdt_path_offset(fdt, "/"); +p = dupname = g_strdup(path); + +while (p) { +*p = '/'; +basename = p + 1; +p = strchr(p + 1, '/'); +if (p) { +*p = '\0'; +} +retval = fdt_path_offset(fdt, dupname); +if (retval < 0 && retval != -FDT_ERR_NOTFOUND) { +error_report("%s: Invalid path %s: %s", + __func__, path, fdt_strerror(retval)); +exit(1); +} else if (retval == -FDT_ERR_NOTFOUND) { +retval = fdt_add_subnode(fdt, parent, basename); +if (retval < 0) { +break; +} +} +parent = retval; +} + +g_free(dupname); +return retval; +} + void qemu_fdt_dumpdtb(void *fdt, int size) { const char *dumpdtb = current_machine->dumpdtb; -- 2.23.0 . .
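The while loop in qemu_fdt_add_path() walks the path one '/'-separated component at a time, looking up each prefix and creating the node when it is missing. Stripped of the libfdt calls, the prefix walk can be sketched standalone (illustration only; the function name is made up):

```c
#include <stdio.h>
#include <string.h>

/* Illustration only (not QEMU code): enumerate the prefixes that
 * qemu_fdt_add_path() would look up (and create when missing), in
 * order.  Returns the number of components, or -1 for a non-absolute
 * path, mirroring the early-out in the real function. */
static int walk_path_prefixes(const char *path)
{
    char buf[256];
    int n = 0;

    if (path[0] != '/' || strlen(path) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, path);

    for (char *p = buf; p; ) {
        p = strchr(p + 1, '/');
        if (p) {
            *p = '\0';           /* temporarily terminate the prefix */
        }
        printf("%s\n", buf);     /* fdt_path_offset()/fdt_add_subnode() here */
        n++;
        if (p) {
            *p = '/';            /* restore the separator and continue */
        }
    }
    return n;
}
```

For "/cpus/cpu-map/cluster0/core1/thread0" this visits "/cpus", then "/cpus/cpu-map", and so on down to the full path, which is exactly the order in which missing parents must be created.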
Re: [RFC PATCH 2/5] hw/arm/virt: Add cpu-map to device tree
On 2/25/2021 7:16 PM, Andrew Jones wrote: Hi Ying Fang, The only difference between this and what I have in my tree[*] is the removal of the socket node (which has been in the Linux docs since June 2019). Any reason why you removed that node? In any case, I think I deserve a bit more credit for this patch. Sorry, you surely deserve it. I forgot to add it here. Should I add your SOB here? The latest Linux kernel uses a four-level cpu topology defined in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/cpu/cpu-topology.txt?h=v5.11 i.e. socket node, cluster node, core node, thread node. The Linux kernel 4.19 LTS uses a three-level cpu topology defined in Documentation/devicetree/bindings/arm/topology.txt i.e. cluster node, core node, thread node. Currently QEMU x86 has 4 levels of cpu topology: socket, die, core, thread. Should arm64 behave like that here? Furthermore, the latest Linux kernel defines the cpu topology struct as below, so maybe it only cares about the socket, core and thread topology levels.

struct cpu_topology {
    int thread_id;
    int core_id;
    int package_id;
    int llc_id;
    cpumask_t thread_sibling;
    cpumask_t core_sibling;
    cpumask_t llc_sibling;
};

[*] https://github.com/rhdrjones/qemu/commit/35feecdd43475608c8f55973a0c159eac4aafefd Thanks, drew On Thu, Feb 25, 2021 at 04:56:24PM +0800, Ying Fang wrote: Support device tree CPU topology descriptions. 
Signed-off-by: Ying Fang --- hw/arm/virt.c | 38 +- include/hw/arm/virt.h | 1 + 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 371147f3ae..c133b342b8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -351,10 +351,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) int cpu; int addr_cells = 1; const MachineState *ms = MACHINE(vms); +const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); int smp_cpus = ms->smp.cpus; /* - * From Documentation/devicetree/bindings/arm/cpus.txt + * See Linux Documentation/devicetree/bindings/arm/cpus.yaml * On ARM v8 64-bit systems value should be set to 2, * that corresponds to the MPIDR_EL1 register size. * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs @@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) ms->possible_cpus->cpus[cs->cpu_index].props.node_id); } +if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) { +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +} + g_free(nodename); } + +if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) { +/* + * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt + */ +qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map"); + +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { +char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu); +char *map_path; + +if (ms->smp.threads > 1) { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d/%s%d", +"cluster", cpu / (ms->smp.cores * ms->smp.threads), a cluster node may be replaced by socket to keep accord with the latest kernel. 
+"core", (cpu / ms->smp.threads) % ms->smp.cores, +"thread", cpu % ms->smp.threads); +} else { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d", +"cluster", cpu / ms->smp.cores, +"core", cpu % ms->smp.cores); +} +qemu_fdt_add_path(vms->fdt, map_path); +qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path); +g_free(map_path); +g_free(cpu_path); +} +} } static void fdt_add_its_gic_node(VirtMachineState *vms) @@ -2742,6 +2777,7 @@ static void virt_machine_5_2_options(MachineClass *mc) virt_machine_6_0_options(mc); compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); vmc->no_secure_gpio = true; +vmc->no_cpu_topology = true; } DEFINE_VIRT_MACHINE(5, 2) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index ee9a93101e..7ef6d08ac3 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -129,6 +129,7 @@ struct VirtMachineClass { bool no_kvm_steal_time; bool acpi_expose_flash; bool no_secure_gpio; +bool no_cpu_topology; }; struct VirtMachineState { -- 2.23.0 .
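The open question in the review above is whether the top cpu-map level should be named "cluster" (the three-level binding used by kernel 4.19) or "socket" (the four-level binding documented since 5.x). A standalone sketch of the path construction with "socket" as the top level, reusing the patch's index arithmetic (illustration only, not QEMU code):

```c
#include <stdio.h>
#include <string.h>

/* Illustration only (not QEMU code): build the cpu-map path for one
 * vcpu using the newer "socket" top level instead of "cluster"; the
 * index arithmetic matches the patch above. */
static void make_map_path(char *buf, size_t len,
                          int cpu, int cores, int threads)
{
    if (threads > 1) {
        snprintf(buf, len, "/cpus/cpu-map/socket%d/core%d/thread%d",
                 cpu / (cores * threads),
                 (cpu / threads) % cores,
                 cpu % threads);
    } else {
        snprintf(buf, len, "/cpus/cpu-map/socket%d/core%d",
                 cpu / cores, cpu % cores);
    }
}
```

With 2:4:2, vcpu 15 yields "/cpus/cpu-map/socket1/core3/thread1"; without threads the thread level is simply omitted, as in the patch.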
[RFC PATCH 2/5] hw/arm/virt: Add cpu-map to device tree
Support device tree CPU topology descriptions. Signed-off-by: Ying Fang --- hw/arm/virt.c | 38 +- include/hw/arm/virt.h | 1 + 2 files changed, 38 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 371147f3ae..c133b342b8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -351,10 +351,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) int cpu; int addr_cells = 1; const MachineState *ms = MACHINE(vms); +const VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); int smp_cpus = ms->smp.cpus; /* - * From Documentation/devicetree/bindings/arm/cpus.txt + * See Linux Documentation/devicetree/bindings/arm/cpus.yaml * On ARM v8 64-bit systems value should be set to 2, * that corresponds to the MPIDR_EL1 register size. * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs @@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) ms->possible_cpus->cpus[cs->cpu_index].props.node_id); } +if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) { +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +} + g_free(nodename); } + +if (ms->smp.cpus > 1 && !vmc->no_cpu_topology) { +/* + * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt + */ +qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map"); + +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { +char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu); +char *map_path; + +if (ms->smp.threads > 1) { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d/%s%d", +"cluster", cpu / (ms->smp.cores * ms->smp.threads), +"core", (cpu / ms->smp.threads) % ms->smp.cores, +"thread", cpu % ms->smp.threads); +} else { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d", +"cluster", cpu / ms->smp.cores, +"core", cpu % ms->smp.cores); +} +qemu_fdt_add_path(vms->fdt, map_path); +qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path); +g_free(map_path); +g_free(cpu_path); +} +} } static void fdt_add_its_gic_node(VirtMachineState *vms) @@ -2742,6 
+2777,7 @@ static void virt_machine_5_2_options(MachineClass *mc) virt_machine_6_0_options(mc); compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); vmc->no_secure_gpio = true; +vmc->no_cpu_topology = true; } DEFINE_VIRT_MACHINE(5, 2) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index ee9a93101e..7ef6d08ac3 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -129,6 +129,7 @@ struct VirtMachineClass { bool no_kvm_steal_time; bool acpi_expose_flash; bool no_secure_gpio; +bool no_cpu_topology; }; struct VirtMachineState { -- 2.23.0
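For reference, the cluster/core/thread indices encoded in the cpu-map paths above reduce to simple integer arithmetic on the flat CPU index. A minimal sketch (the helper names are mine, not QEMU API):

```c
#include <assert.h>

/* Decompose a flat CPU index into the cluster/core/thread indices used
 * when constructing the /cpus/cpu-map paths in fdt_add_cpu_nodes(). */
static int topo_cluster(int cpu, int cores, int threads)
{
    return cpu / (cores * threads);
}

static int topo_core(int cpu, int cores, int threads)
{
    return (cpu / threads) % cores;
}

static int topo_thread(int cpu, int threads)
{
    return cpu % threads;
}
```

With sockets:cores:threads = 2:4:2, CPU 5 lands in cluster 0, core 2, thread 1, matching the map paths the patch generates.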
[RFC PATCH 3/5] hw/arm/virt-acpi-build: distinguish possible and present cpus
When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled in madt. Furthermore, it is also needed if we are going to support CPU hotplug in the future. This patch is a rework based on Andrew Jones's contribution at https://lists.gnu.org/archive/html/qemu-arm/2018-07/msg00076.html Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 14 ++ hw/arm/virt.c| 2 ++ 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index f9c9df916c..bb91152fe2 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -61,13 +61,16 @@ static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { -MachineState *ms = MACHINE(vms); +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; uint16_t i; -for (i = 0; i < ms->smp.cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -479,6 +482,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -489,7 +493,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -504,7 +508,9 
@@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (possible_cpus->cpus[i].cpu != NULL) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); diff --git a/hw/arm/virt.c b/hw/arm/virt.c index c133b342b8..75659502e2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2047,6 +2047,8 @@ static void machvirt_init(MachineState *machine) qdev_realize(DEVICE(cpuobj), NULL, &error_fatal); object_unref(cpuobj); +/* Initialize cpu member here since cpu hotplug is not supported yet */ +machine->possible_cpus->cpus[n].cpu = cpuobj; } fdt_add_timer_nodes(vms); fdt_add_cpu_nodes(vms); -- 2.23.0
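The possible-versus-present distinction above reduces to a simple rule: every possible CPU slot gets a MADT GICC entry, but only slots whose CPU object actually exists are flagged enabled. A minimal sketch (the macro and helper are illustrative stand-ins, not the QEMU definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for ACPI_MADT_GICC_ENABLED (not the QEMU macro). */
#define GICC_ENABLED_FLAG (1u << 0)

/* Every possible slot is described in the table, but only slots with a
 * CPU object present are enabled; absent slots stay disabled so they
 * can be hot-added later without regenerating the table layout. */
static uint32_t gicc_flags_for_slot(const void *cpu_object)
{
    return cpu_object != NULL ? GICC_ENABLED_FLAG : 0;
}
```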
[RFC PATCH 1/5] device_tree: Add qemu_fdt_add_path
qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it also adds any missing parent nodes. We also tweak an error message of qemu_fdt_add_subnode(). Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- include/sysemu/device_tree.h | 1 + softmmu/device_tree.c| 45 ++-- 2 files changed, 44 insertions(+), 2 deletions(-) diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h index 982c89345f..15fb98af98 100644 --- a/include/sysemu/device_tree.h +++ b/include/sysemu/device_tree.h @@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path); uint32_t qemu_fdt_alloc_phandle(void *fdt); int qemu_fdt_nop_node(void *fdt, const char *node_path); int qemu_fdt_add_subnode(void *fdt, const char *name); +int qemu_fdt_add_path(void *fdt, const char *path); #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \ do { \ diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c index b9a3ddc518..1e3857ca0c 100644 --- a/softmmu/device_tree.c +++ b/softmmu/device_tree.c @@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) retval = fdt_add_subnode(fdt, parent, basename); if (retval < 0) { -error_report("FDT: Failed to create subnode %s: %s", name, - fdt_strerror(retval)); +error_report("%s: Failed to create subnode %s: %s", + __func__, name, fdt_strerror(retval)); exit(1); } @@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) return retval; } +/* + * Like qemu_fdt_add_subnode(), but will add all missing + * subnodes in the path. 
+ */ +int qemu_fdt_add_path(void *fdt, const char *path) +{ +char *dupname, *basename, *p; +int parent, retval = -1; + +if (path[0] != '/') { +return retval; +} + +parent = fdt_path_offset(fdt, "/"); +p = dupname = g_strdup(path); + +while (p) { +*p = '/'; +basename = p + 1; +p = strchr(p + 1, '/'); +if (p) { +*p = '\0'; +} +retval = fdt_path_offset(fdt, dupname); +if (retval < 0 && retval != -FDT_ERR_NOTFOUND) { +error_report("%s: Invalid path %s: %s", + __func__, path, fdt_strerror(retval)); +exit(1); +} else if (retval == -FDT_ERR_NOTFOUND) { +retval = fdt_add_subnode(fdt, parent, basename); +if (retval < 0) { +break; +} +} +parent = retval; +} + +g_free(dupname); +return retval; +} + void qemu_fdt_dumpdtb(void *fdt, int size) { const char *dumpdtb = current_machine->dumpdtb; -- 2.23.0
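The loop in qemu_fdt_add_path() walks the successive '/'-delimited prefixes of the path ("/a", "/a/b", "/a/b/c"), creating each level that is missing. The prefix walk can be sketched in isolation, without libfdt (the helper name is mine):

```c
#include <assert.h>
#include <string.h>

/* Count the node levels in an absolute device tree path by walking the
 * same successive '/'-delimited prefixes that qemu_fdt_add_path()
 * visits; returns -1 for a non-absolute path, as that function does. */
static int fdt_path_levels(const char *path)
{
    const char *p = path;
    int levels = 0;

    if (path[0] != '/') {
        return -1;
    }
    while (p) {
        levels++;                 /* one prefix per component */
        p = strchr(p + 1, '/');   /* advance to the next separator */
    }
    return levels;
}
```

For "/cpus/cpu-map/cluster0/core1" the walk visits four prefixes, so up to four fdt_add_subnode() calls may be needed when none of the levels exist yet.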
[RFC PATCH 4/5] hw/acpi/aml-build: add processor hierarchy node structure
Add the processor hierarchy node structures to build ACPI information for the CPU topology. Since private resources may be used to describe the cache hierarchy, and they vary across topology levels, three helpers are introduced to describe the hierarchy: (1) build_socket_hierarchy for the socket description (2) build_processor_hierarchy for the processor description (3) build_thread_hierarchy for the thread (logical processor) description Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 40 + include/hw/acpi/acpi-defs.h | 13 include/hw/acpi/aml-build.h | 7 +++ 3 files changed, 60 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index a2cd7a5830..a0af3e9d73 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1888,6 +1888,46 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms, table_data->len - slit_start, 1, oem_id, oem_table_id); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, ACPI_PPTT_PHYSICAL_PACKAGE, 4); +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ 
+build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, ACPI_PPTT_TYPE_PROCESSOR); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID | + ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD | + ACPI_PPTT_ACPI_LEAF_NODE, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index cf9f44299c..45e10d886f 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -618,4 +618,17 @@ struct AcpiIortRC { } QEMU_PACKED; typedef struct AcpiIortRC AcpiIortRC; +enum { +ACPI_PPTT_TYPE_PROCESSOR = 0, +ACPI_PPTT_TYPE_CACHE, +ACPI_PPTT_TYPE_ID, +ACPI_PPTT_TYPE_RESERVED +}; + +#define ACPI_PPTT_PHYSICAL_PACKAGE (1) +#define ACPI_PPTT_ACPI_PROCESSOR_ID_VALID (1 << 1) +#define ACPI_PPTT_ACPI_PROCESSOR_IS_THREAD (1 << 2) /* ACPI 6.3 */ +#define ACPI_PPTT_ACPI_LEAF_NODE (1 << 3) /* ACPI 6.3 */ +#define ACPI_PPTT_ACPI_IDENTICAL (1 << 4) /* ACPI 6.3 */ + #endif diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index 380d3e3924..7f0ca1a198 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -462,6 +462,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms, const char *oem_id, const char *oem_table_id); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void 
build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_thread_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0
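All three helpers emit the same fixed ACPI 6.3 Type-0 layout: one byte of type, one byte of length, two reserved bytes, then 32-bit flags, parent offset, ACPI processor ID, and private-resource count, for 20 bytes total when there are no private resources. A sketch of that encoding (a stand-in for QEMU's build_append_* API, not the actual implementation):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One encoded node, kept in a file-scope buffer so the bytes can be
 * inspected individually. */
static uint8_t pptt_node[20];

/* Encode an ACPI 6.3 Type-0 processor hierarchy node with no private
 * resources; 32-bit fields are stored little-endian as ACPI requires.
 * Returns the node length. */
static int encode_pptt_processor(uint32_t flags, uint32_t parent, uint32_t id)
{
    uint32_t fields[3] = { flags, parent, id };
    int i, j;

    memset(pptt_node, 0, sizeof(pptt_node));
    pptt_node[0] = 0;    /* Type 0 - processor */
    pptt_node[1] = 20;   /* Length, no private resources */
    /* bytes 2-3 reserved, bytes 16-19 (private resource count) stay 0 */
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 4; j++) {
            pptt_node[4 + 4 * i + j] = (fields[i] >> (8 * j)) & 0xff;
        }
    }
    return pptt_node[1];
}

static int pptt_byte(int i)
{
    return pptt_node[i];
}
```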
[RFC PATCH 5/5] hw/arm/virt-acpi-build: add PPTT table
Add the Processor Properties Topology Table (PPTT) to present CPU topology information to the guest. A three-level cpu topology is built, in accordance with what the Linux kernel currently does. Tested-by: Jiajie Li Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 50 1 file changed, 50 insertions(+) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index bb91152fe2..38d50ce66c 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -436,6 +436,50 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) vms->oem_table_id); } +static void +build_pptt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) +{ +int pptt_start = table_data->len; +int uid = 0, cpus = 0, socket = 0; +MachineState *ms = MACHINE(vms); +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; + +acpi_data_push(table_data, sizeof(AcpiTableHeader)); + +for (socket = 0; cpus < ms->possible_cpus->len; socket++) { +uint32_t socket_offset = table_data->len - pptt_start; +int core; + +build_socket_hierarchy(table_data, 0, socket); + +for (core = 0; core < smp_cores; core++) { +uint32_t core_offset = table_data->len - pptt_start; +int thread; + +if (smp_threads <= 1) { +build_processor_hierarchy(table_data, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID | + ACPI_PPTT_ACPI_LEAF_NODE, + socket_offset, uid++); +} else { +build_processor_hierarchy(table_data, + ACPI_PPTT_ACPI_PROCESSOR_ID_VALID, + socket_offset, core); +for (thread = 0; thread < smp_threads; thread++) { +build_thread_hierarchy(table_data, core_offset, uid++); +} + } +} +cpus += smp_cores * smp_threads; +} + +build_header(linker, table_data, + (void *)(table_data->data + pptt_start), "PPTT", + table_data->len - pptt_start, 2, + vms->oem_id, vms->oem_table_id); +} + /* GTDT */ static void build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -688,6 +732,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) unsigned dsdt, 
xsdt; GArray *tables_blob = tables->table_data; MachineState *ms = MACHINE(vms); +bool cpu_topology_enabled = !vmc->no_cpu_topology; table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t)); @@ -707,6 +752,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables_blob); build_madt(tables_blob, tables->linker, vms); +if (ms->smp.cpus > 1 && cpu_topology_enabled) { +acpi_add_table(table_offsets, tables_blob); +build_pptt(tables_blob, tables->linker, vms); +} + acpi_add_table(table_offsets, tables_blob); build_gtdt(tables_blob, tables->linker, vms); -- 2.23.0
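Given the loop above, the number of processor hierarchy nodes in the table is deterministic: per socket, one package node plus one node per core, plus one leaf per thread when threads > 1 (the smp_threads <= 1 branch collapses the thread level into the core node). A sketch of that count, under my reading of the code, useful for sanity-checking generated table sizes:

```c
#include <assert.h>

/* Total Type-0 processor hierarchy nodes build_pptt() emits for a
 * sockets x cores x threads topology. */
static int pptt_node_count(int sockets, int cores, int threads)
{
    int per_socket = 1 + cores;          /* package node + core nodes */

    if (threads > 1) {
        per_socket += cores * threads;   /* one leaf node per thread */
    }
    return sockets * per_socket;
}
```

For the 2:4:2 example from the cover letter this gives 26 nodes, with the 16 leaf nodes carrying uids 0 through 15.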
[RFC PATCH 0/5] hw/arm/virt: Introduce cpu topology support
An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with a multi-core system, so a cpu topology description is helpful to provide the guest with the right view. Dario Faggioli's talk in [0] also shows that the virtual topology may have an impact on scheduling performance. Thus this patch series is posted to introduce cpu topology support for the arm platform. Both fdt and ACPI are used to present the cpu topology. To describe the cpu topology via ACPI, a PPTT table is introduced according to the processor hierarchy node structure. This series is derived from [1], in which we tried to bring both cpu and cache topology support to the arm platform, but there are still some issues to solve for the cache hierarchy. So we split the cpu topology part out and send it separately. The patch series supporting the cache hierarchy will be sent later, since Salil Mehta's cpu hotplug feature needs the cpu topology enabled first and he is waiting for it to be upstreamed. This patch series was initially based on the patches posted by Andrew Jones [2]. I jumped in on it since some OS vendor cooperation partners are eager for it. Thanks for Andrew's contribution. After applying this patch series, launch a guest with virt-6.0 and cpu topology configured with sockets:cores:threads = 2:4:2, and you will get the below output from the lscpu command. 
- Architecture: aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 2 NUMA node(s): 2 Vendor ID: HiSilicon Model: 0 Model name: Kunpeng-920 Stepping: 0x1 BogoMIPS: 200.00 NUMA node0 CPU(s): 0-7 NUMA node1 CPU(s): 8-15 [0] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse [1] https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02166.html [2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com Ying Fang (5): device_tree: Add qemu_fdt_add_path hw/arm/virt: Add cpu-map to device tree hw/arm/virt-acpi-build: distinguish possible and present cpus hw/acpi/aml-build: add processor hierarchy node structure hw/arm/virt-acpi-build: add PPTT table hw/acpi/aml-build.c | 40 ++ hw/arm/virt-acpi-build.c | 64 +--- hw/arm/virt.c | 40 +- include/hw/acpi/acpi-defs.h | 13 include/hw/acpi/aml-build.h | 7 include/hw/arm/virt.h | 1 + include/sysemu/device_tree.h | 1 + softmmu/device_tree.c | 45 +++-- 8 files changed, 204 insertions(+), 7 deletions(-) -- 2.23.0
Re: [PATCH v4 0/7] block: Add retry for werror=/rerror= mechanism
Kindly pinging for this. Following Stefan's suggestion, we have re-implemented the concept by introducing a 'retry' feature based on the werror=/rerror= mechanism. Hope this thread won't be missed. Any comments and reviews are welcome. Thanks. Ying Fang. On 12/15/2020 8:30 PM, Jiahui Cen wrote: A VM in a cloud environment may use a virtual disk as its backend storage, and there are usually filesystems on the virtual block device. When the backend storage is temporarily down, any I/O issued to the virtual block device will cause an error. For example, an error in an ext4 filesystem would make the filesystem read-only. In a production environment, cloud backend storage can often be recovered quickly. For example, an IP-SAN may go down due to a network failure and come back online soon after the network is recovered. However, the error in the filesystem may not be recoverable without a device reattach or a system restart. Thus an I/O retry mechanism is needed to implement a self-healing system. This patch series proposes to extend the werror=/rerror= mechanism with a 'retry' feature. It can automatically retry failed I/O requests on error without sending the error back to the guest, so the guest can get back to running smoothly once I/O is recovered. v3->v4: * Adapt to the werror=/rerror= mechanism. v2->v3: * Add a doc to describe I/O hang. v1->v2: * Rebase to fix compile problems. * Fix incorrect removal from the rehandle list. * Provide a rehandle pause interface. 
REF: https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg06560.html Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang Jiahui Cen (7): qapi/block-core: Add retry option for error action block-backend: Introduce retry timer block-backend: Add device specific retry callback block-backend: Enable retry action on errors block-backend: Add timeout support for retry block: Add error retry param setting virtio_blk: Add support for retry on errors block/block-backend.c | 66 blockdev.c | 52 +++ hw/block/block.c | 10 +++ hw/block/virtio-blk.c | 19 +- include/hw/block/block.h | 7 ++- include/sysemu/block-backend.h | 10 +++ qapi/block-core.json | 4 +- 7 files changed, 162 insertions(+), 6 deletions(-)
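The retry semantics described above can be modeled abstractly: reissue a failed request until the backend recovers or the retry budget (standing in for the timeout) runs out, and only then report the error to the guest. A hedged sketch of that policy, not the QEMU block-layer implementation:

```c
#include <assert.h>

/* Abstract model of retry-on-error: a request that would start
 * succeeding at attempt `succeeds_at` is reissued up to `max_attempts`
 * times. Returns 0 on eventual success, -1 once retries are exhausted
 * and the error must finally be reported to the guest. */
static int retry_outcome(int succeeds_at, int max_attempts)
{
    int attempt;

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (attempt >= succeeds_at) {
            return 0;    /* backend recovered; guest never saw the error */
        }
    }
    return -1;           /* still failing when the budget ran out */
}
```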
Re: [RFC PATCH v3 10/13] target/arm/cpu: Add cpu cache description for arm
On 11/30/2020 9:00 PM, Peter Maydell wrote: On Mon, 9 Nov 2020 at 03:05, Ying Fang wrote: Add the CPUCacheInfo structure to hold cpu cache information for ARM cpus. A classic three-level cache topology is used here. The default cache capacity is given and userspace can overwrite these values. Signed-off-by: Ying Fang --- target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 +++ 2 files changed, 69 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 056319859f..f1bac7452c 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -27,6 +27,7 @@ #include "qapi/visitor.h" #include "cpu.h" #include "internals.h" +#include "qemu/units.h" #include "exec/exec-all.h" #include "hw/qdev-properties.h" #if !defined(CONFIG_USER_ONLY) @@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz) return (Aff1 << ARM_AFF1_SHIFT) | Aff0; } +static CPUCaches default_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x02, +}, Would it be possible to populate this structure from the CLIDR/CCSIDR ID register values, rather than having to specify the same thing in two places? Sorry I missed this reply. I had tried to fetch the CLIDR/CCSIDR register values of the host cpu from KVM, however I did not get the expected values; I may have made some mistakes on the KVM side. Thanks for your guidance, I'll try to populate them again. thanks -- PMM . Thanks. Ying.
Re: [PATCH] hw/arm/virt: Remove virt machine state 'smp_cpus'
On 12/16/2020 1:48 AM, Andrew Jones wrote: virt machine's 'smp_cpus' and machine->smp.cpus must always have the same value. And, anywhere we have virt machine state we have machine state. So let's remove the redundancy. Also, to make it easier to see that machine->smp is the true source for "smp_cpus" and "max_cpus", avoid passing them in function parameters, preferring instead to get them from the state. No functional change intended. Signed-off-by: Andrew Jones Reviewed-by: Ying Fang --- hw/arm/virt-acpi-build.c | 9 + hw/arm/virt.c| 24 +++- include/hw/arm/virt.h| 3 +-- 3 files changed, 17 insertions(+), 19 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 711cf2069fe8..9d9ee2405345 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -59,11 +59,12 @@ #define ACPI_BUILD_TABLE_SIZE 0x2 -static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { +MachineState *ms = MACHINE(vms); uint16_t i; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < ms->smp.cpus; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); @@ -484,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < vms->smp_cpus; i++) { +for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -603,7 +604,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. 
*/ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, vms->smp_cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 556592012ee0..534d306f3104 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -323,7 +323,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 << vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/timer"); @@ -347,9 +347,9 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) static void fdt_add_cpu_nodes(const VirtMachineState *vms) { -int cpu; -int addr_cells = 1; const MachineState *ms = MACHINE(vms); +int smp_cpus = ms->smp.cpus, cpu; +int addr_cells = 1; /* * From Documentation/devicetree/bindings/arm/cpus.txt @@ -364,7 +364,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) * The simplest way to go is to examine affinity IDs of all our CPUs. If * at least one of them has Aff3 populated, we set #address-cells to 2. 
*/ -for (cpu = 0; cpu < vms->smp_cpus; cpu++) { +for (cpu = 0; cpu < smp_cpus; cpu++) { ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); if (armcpu->mp_affinity & ARM_AFF3_MASK) { @@ -377,7 +377,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0); -for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) { +for (cpu = smp_cpus - 1; cpu >= 0; cpu--) { char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); CPUState *cs = CPU(armcpu); @@ -387,8 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", armcpu->dtb_compatible); -if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED -&& vms->smp_cpus > 1) { +if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED && smp_cpus > 1) { qemu_fdt_setprop_string(vms->fdt, nodename, "enable-
Re: [RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus
On 11/9/2020 6:45 PM, Salil Mehta wrote: Hi Fangying, A trivial thing. This patch looks bit of a noise in this patch-set. Better to send it as a separate patch-set and get it accepted. Hmm, this patch looks like a code reactor for the somewhat confusing *smp_cpus* which will tidy the code. Maybe Andrew could do that. Thanks From: fangying Sent: Monday, November 9, 2020 3:05 AM To: peter.mayd...@linaro.org Cc: qemu-devel@nongnu.org; qemu-...@nongnu.org; drjo...@redhat.com; imamm...@redhat.com; shannon.zha...@gmail.com; alistair.fran...@wdc.com; Zhanghailiang ; Salil Mehta Subject: [RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus From: Andrew Jones Prefer to spell out the smp.cpus and smp.max_cpus machine state variables in order to make grepping easier and to avoid any confusion as to what cpu count is being used where. Signed-off-by: Andrew Jones --- hw/arm/virt-acpi-build.c | 8 +++ hw/arm/virt.c| 51 +++- include/hw/arm/virt.h| 2 +- 3 files changed, 29 insertions(+), 32 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9747a6458f..a222981737 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,11 +57,11 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) +static void acpi_dsdt_add_cpus(Aml *scope, int cpus) { uint16_t i; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < cpus; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); @@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < vms->smp_cpus; i++) { +for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -599,7 +599,7 @@ 
build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, vms->smp_cpus); +acpi_dsdt_add_cpus(scope, ms->smp.cpus); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e465a988d6..0069fa1298 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 << vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/timer"); @@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) * The simplest way to go is to examine affinity IDs of all our CPUs. If * at least one of them has Aff3 populated, we set #address-cells to 2.
*/ -for (cpu = 0; cpu < vms->smp_cpus; cpu++) { +for (cpu = 0; cpu < ms->smp.cpus; cpu++) { ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); if (armcpu->mp_affinity & ARM_AFF3_MASK) { @@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0); -for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) { +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); CPUState *cs = CPU(armcpu); @@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) armcpu->dtb_compatible); if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED -&& vms->smp_cpus > 1) { +&& ms->smp.cpus > 1) { qemu_fdt_setprop_string(vms->fdt, nodename, "enable-method", "psci"); } @@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 << vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/pmu"); @@ -622,14
Re: Question on UEFI ACPI tables setup and probing on arm64
On 11/7/2020 1:09 AM, Laszlo Ersek wrote: On 11/05/20 05:30, Ying Fang wrote: I see that in QEMU *loader_start* is fixed at 1 GiB in the physical address space, which points to the DRAM base. In ArmVirtQemu.dsc, PcdDeviceTreeInitialBaseAddress is set to 0x4000 correspondingly. Here I also see the discussion about the DRAM base for ArmVirtQemu. https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html I am still not sure how UEFI knows that it is running on an ArmVirtQemu machine type. It doesn't know. It remains a convention. This part is not auto-detected; the constants in QEMU and edk2 are independently open-coded, their values were synchronized by human effort initially. The user or the management layer has to make sure they boot a UEFI firmware binary that is compatible with the machine type. There is some meta-data to help with that: Thanks so much for the reply, I now have a basic understanding of how QEMU and EDK2 work together after reading the docs and code there. Does UEFI derive it from the fdt *compatible* property ? Please see the schema "docs/interop/firmware.json" in the QEMU tree; in particular the @FirmwareTarget element. For an actual example: QEMU bundles some edk2 firmware binaries (purely as a convenience, not for production), and those are accompanied by matching descriptor files. See "pc-bios/descriptors/60-edk2-aarch64.json". (It is a template that's fixed up during QEMU installation, but that's tangential here.) "targets": [ { "architecture": "aarch64", "machines": [ "virt-*" ] } ], Thanks, I'll look into it more closely. Thanks Laszlo .
[RFC PATCH v3 13/13] hw/arm/virt-acpi-build: Enable cpu and cache topology
A helper struct AcpiCacheOffset is introduced to describe the offset of three level caches. The cache hierarchy is built according to ACPI spec v6.3 5.2.29.2. Let's enable CPU cache topology now. Signed-off-by: Ying Fang --- hw/acpi/aml-build.c | 19 +- hw/arm/virt-acpi-build.c| 52 - include/hw/acpi/acpi-defs.h | 6 + include/hw/acpi/aml-build.h | 7 ++--- 4 files changed, 68 insertions(+), 16 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 1a38110149..93a81fbaf5 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1799,27 +1799,32 @@ void build_cache_hierarchy(GArray *tbl, /* * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) */ -void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +void build_socket_hierarchy(GArray *tbl, uint32_t parent, +uint32_t offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 24); /* Length, with private resources */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 1, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset, 4); /* Private resources */ } -void build_processor_hierarchy(GArray *tbl, uint32_t flags, - uint32_t parent, uint32_t id) +void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent, + AcpiCacheOffset offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 32); /* Length, with private resources */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, flags, 4); /* Flags */ 
build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 3, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset.l1d_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l1i_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l2_offset, 4); /* Private resources */ } void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 5784370257..ad49006b42 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -429,29 +429,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } -static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache, + AcpiCacheInfo *acpi_cache) { +acpi_cache->size = cpu_cache->size; +acpi_cache->sets = cpu_cache->sets; +acpi_cache->associativity = cpu_cache->associativity; +acpi_cache->attributes = cpu_cache->attributes; +acpi_cache->line_size = cpu_cache->line_size; +} + +static void build_pptt(GArray *table_data, BIOSLinker *linker, + VirtMachineState *vms) +{ +MachineState *ms = MACHINE(vms); int pptt_start = table_data->len; int uid = 0, cpus = 0, socket; unsigned int smp_cores = ms->smp.cores; unsigned int smp_threads = ms->smp.threads; +AcpiCacheOffset offset; +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus)); +AcpiCacheInfo cache_info; acpi_data_push(table_data, sizeof(AcpiTableHeader)); for (socket = 0; cpus < ms->possible_cpus->len; socket++) { -uint32_t socket_offset = table_data->len - pptt_start; +uint32_t l3_offset = table_data->len - pptt_start; +uint32_t socket_offset; int core; -build_socket_hierarchy(table_data, 0, socket); +/* L3 cache type structure */ 
+arm_acpi_cache_info(cpu->caches.l3_cache, &cache_info); +build_cache_hierarchy(table_data, 0, &cache_info); + +socket_offset = table_data->len - pptt_start; +build_socket_hierarchy(table_data, 0, l3_offset, socket); for (core = 0; core < smp_cores; core++) { uint32_t core_offset = table_data->len - pptt_start; int th
[RFC PATCH v3 03/13] hw/arm/virt: Replace smp_parse with one that prefers cores
From: Andrew Jones The virt machine type has never used the CPU topology parameters, other than number of online CPUs and max CPUs. When choosing how to allocate those CPUs the default has been to assume cores. In preparation for using the other CPU topology parameters let's use an smp_parse that prefers cores over sockets. We can also enforce the topology matches max_cpus check because we have no legacy to preserve. Signed-off-by: Andrew Jones --- hw/arm/virt.c | 76 +++ 1 file changed, 76 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ea24b576c6..ba902b53ba 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -78,6 +78,8 @@ #include "hw/virtio/virtio-iommu.h" #include "hw/char/pl011.h" #include "qemu/guest-random.h" +#include "qapi/qmp/qerror.h" +#include "sysemu/replay.h" #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \ static void virt_##major##_##minor##_class_init(ObjectClass *oc, \ @@ -2444,6 +2446,79 @@ static int virt_kvm_type(MachineState *ms, const char *type_str) return requested_pa_size > 40 ? requested_pa_size : 0; } +/* + * Unlike smp_parse() in hw/core/machine.c, we prefer cores over sockets, + * e.g. '-smp 8' creates 1 socket with 8 cores. Whereas '-smp 8' with + * hw/core/machine.c's smp_parse() creates 8 sockets, each with 1 core. + * Additionally, we can enforce the topology matches max_cpus check, + * because we have no legacy to preserve. + */ +static void virt_smp_parse(MachineState *ms, QemuOpts *opts) +{ +if (opts) { +unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); +unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned cores = qemu_opt_get_number(opts, "cores", 0); +unsigned threads = qemu_opt_get_number(opts, "threads", 0); + +/* + * Compute missing values; prefer cores over sockets and + * sockets over threads. + */ +if (cpus == 0 || cores == 0) { +sockets = sockets > 0 ? sockets : 1; +threads = threads > 0 ? threads : 1; +if (cpus == 0) { +cores = cores > 0 ? 
cores : 1; +cpus = cores * threads * sockets; +} else { +ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); +cores = ms->smp.max_cpus / (sockets * threads); +} +} else if (sockets == 0) { +threads = threads > 0 ? threads : 1; +sockets = cpus / (cores * threads); +sockets = sockets > 0 ? sockets : 1; +} else if (threads == 0) { +threads = cpus / (cores * sockets); +threads = threads > 0 ? threads : 1; +} else if (sockets * cores * threads < cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) < " + "smp_cpus (%u)", + sockets, cores, threads, cpus); +exit(1); +} + +ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); + +if (ms->smp.max_cpus < cpus) { +error_report("maxcpus must be equal to or greater than smp"); +exit(1); +} + +if (sockets * cores * threads != ms->smp.max_cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u)" + "!= maxcpus (%u)", + sockets, cores, threads, + ms->smp.max_cpus); +exit(1); +} + +ms->smp.cpus = cpus; +ms->smp.cores = cores; +ms->smp.threads = threads; +ms->smp.sockets = sockets; +} + +if (ms->smp.cpus > 1) { +Error *blocker = NULL; +error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp"); +replay_add_blocker(blocker); +} +} + static void virt_machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -2469,6 +2544,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = virt_cpu_index_to_props; mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15"); mc->get_default_cpu_node_id = virt_get_default_cpu_node_id; +mc->smp_parse = virt_smp_parse; mc->kvm_type = virt_kvm_type; assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; -- 2.23.0
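To make the precedence in virt_smp_parse() easier to check, here is a standalone sketch of the same compute-missing-values logic. The names `compute_smp_topology` and `struct smp_topo` are illustrative only, not part of the patch, and the sketch assumes maxcpus equals cpus so the max_cpus-based branch collapses into the cpus-based one:

```c
#include <assert.h>

/* Illustrative sketch (not part of the patch) of the precedence used by
 * virt_smp_parse(): missing values are computed preferring cores over
 * sockets and sockets over threads. For simplicity this assumes
 * maxcpus == cpus. */
struct smp_topo {
    unsigned cpus, sockets, cores, threads;
};

struct smp_topo compute_smp_topology(unsigned cpus, unsigned sockets,
                                     unsigned cores, unsigned threads)
{
    if (cpus == 0 || cores == 0) {
        sockets = sockets > 0 ? sockets : 1;
        threads = threads > 0 ? threads : 1;
        if (cpus == 0) {
            /* No cpu count given: derive it from the topology. */
            cores = cores > 0 ? cores : 1;
            cpus = cores * threads * sockets;
        } else {
            /* cpus given, cores missing: prefer filling in cores. */
            cores = cpus / (sockets * threads);
        }
    } else if (sockets == 0) {
        threads = threads > 0 ? threads : 1;
        sockets = cpus / (cores * threads);
        sockets = sockets > 0 ? sockets : 1;
    } else if (threads == 0) {
        threads = cpus / (cores * sockets);
        threads = threads > 0 ? threads : 1;
    }
    return (struct smp_topo){ cpus, sockets, cores, threads };
}
```

So '-smp 16' resolves to one socket with 16 cores and one thread each, which is the cores-over-sockets behaviour the commit message describes.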
[RFC PATCH v3 08/13] hw/acpi/aml-build: add processor hierarchy node structure
Add the processor hierarchy node structures to build ACPI information for cpu topology. Three helpers are introduced: (1) build_socket_hierarchy for the socket description structure (2) build_processor_hierarchy for the processor description structure (3) build_smt_hierarchy for the thread (logical processor) description structure We split the processor hierarchy node structure descriptions into three helpers even though they share some identical code snippets. The reason is that the private resources vary across the different topology levels. This makes the ACPI PPTT table more readable and easier to construct. Cc: Igor Mammedov Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 37 + include/hw/acpi/aml-build.h | 7 +++ 2 files changed, 44 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 3792ba96ce..d1aa9fd716 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ 
+build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0);/* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x0e, 4);/* Processor is a thread */ +build_append_int_noprefix(tbl, parent , 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Num of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index fe0055fffb..56474835a7 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0
[RFC PATCH v3 11/13] hw/arm/virt: add fdt cache information
Support devicetree cpu cache information descriptions Signed-off-by: Ying Fang --- hw/arm/virt.c | 92 +++ 1 file changed, 92 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b6cebb5549..21275e03c2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -346,6 +346,89 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags); } +static void fdt_add_l3cache_nodes(const VirtMachineState *vms) +{ +int i; +const MachineState *ms = MACHINE(vms); +ARMCPU *cpu = ARM_CPU(first_cpu); +unsigned int smp_cores = ms->smp.cores; +unsigned int sockets = ms->smp.max_cpus / smp_cores; + +for (i = 0; i < sockets; i++) { +char *nodename = g_strdup_printf("/cpus/l3-cache%d", i); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", + cpu->caches.l3_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l3_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l3_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +} + +static void fdt_add_l2cache_nodes(const VirtMachineState *vms) +{ +int i, j; +const MachineState *ms = MACHINE(vms); +unsigned int smp_cores = ms->smp.cores; +unsigned int sockets = ms->smp.max_cpus / smp_cores; +ARMCPU *cpu = ARM_CPU(first_cpu); + +for (i = 0; i < sockets; i++) { +char *next_path = g_strdup_printf("/cpus/l3-cache%d", i); +for (j = 0; j < smp_cores; j++) { +char *nodename = g_strdup_printf("/cpus/l2-cache%d", + i * smp_cores + j); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", + 
cpu->caches.l2_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l2_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l2_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, + "next-level-cache", next_path); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +g_free(next_path); +} +} + +static void fdt_add_l1cache_prop(const VirtMachineState *vms, +char *nodename, int cpu_index) +{ + +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index)); +CPUCaches caches = cpu->caches; + +char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index); + +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size", + caches.l1d_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size", + caches.l1d_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets", + caches.l1d_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size", + caches.l1i_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size", + caches.l1i_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets", + caches.l1i_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache", + cachename); +g_free(cachename); +} + static void fdt_add_cpu_nodes(const VirtMachineState *vms) { int cpu; @@ -379,6 +462,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#s
[RFC PATCH v3 10/13] target/arm/cpu: Add cpu cache description for arm
Add the CPUCacheInfo structure to hold cpu cache information for ARM cpus. A classic three level cache topology is used here. The default cache capacity is given and userspace can overwrite these values. Signed-off-by: Ying Fang --- target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 +++ 2 files changed, 69 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 056319859f..f1bac7452c 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -27,6 +27,7 @@ #include "qapi/visitor.h" #include "cpu.h" #include "internals.h" +#include "qemu/units.h" #include "exec/exec-all.h" #include "hw/qdev-properties.h" #if !defined(CONFIG_USER_ONLY) @@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz) return (Aff1 << ARM_AFF1_SHIFT) | Aff0; } +static CPUCaches default_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x02, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x04, +}, +.l2_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 2, +.size = 512 * KiB, +.line_size = 64, +.associativity = 8, +.sets = 1024, +.attributes = 0x0a, +}, +.l3_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 3, +.size = 65536 * KiB, +.line_size = 64, +.associativity = 15, +.sets = 2048, +.attributes = 0x0a, +}, +}; + static void cpreg_hashtable_data_destroy(gpointer data) { /* @@ -1841,6 +1881,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) } } +cpu->caches = default_cache_info; + qemu_init_vcpu(cs); cpu_reset(cs); diff --git a/target/arm/cpu.h b/target/arm/cpu.h index cfff1b5c8f..dbc33a9802 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -746,6 +746,30 @@ typedef enum ARMPSCIState { typedef struct ARMISARegisters ARMISARegisters; +/* Cache information type */ +enum CacheType { +DATA_CACHE, 
+INSTRUCTION_CACHE, +UNIFIED_CACHE +}; + +typedef struct CPUCacheInfo { +enum CacheType type; /* Cache type */ +uint8_t level; +uint32_t size;/* Size in bytes */ +uint16_t line_size; /* Line size in bytes */ +uint8_t associativity;/* Cache associativity */ +uint32_t sets;/* Number of sets */ +uint8_t attributes; /* Cache attributes */ +} CPUCacheInfo; + +typedef struct CPUCaches { +CPUCacheInfo *l1d_cache; +CPUCacheInfo *l1i_cache; +CPUCacheInfo *l2_cache; +CPUCacheInfo *l3_cache; +} CPUCaches; + /** * ARMCPU: * @env: #CPUARMState @@ -987,6 +1011,9 @@ struct ARMCPU { /* Generic timer counter frequency, in Hz */ uint64_t gt_cntfrq_hz; + +/* CPU cache information */ +CPUCaches caches; }; unsigned int gt_cntfrq_period_ns(ARMCPU *cpu); -- 2.23.0
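A quick way to sanity-check default cache geometries like the ones in this patch is the set-associative identity size == associativity * sets * line_size. The sketch below applies it to the L1 and L2 defaults; `geom_consistent` and `struct cache_geom` are hypothetical names for this example only:

```c
#include <assert.h>
#include <stdint.h>

#define KiB 1024u

/* Minimal mirror of the CPUCacheInfo fields needed for the check
 * (hypothetical helper, not part of the patch). */
struct cache_geom {
    uint32_t size;
    uint16_t line_size;
    uint8_t associativity;
    uint32_t sets;
};

/* A set-associative cache holds ways * sets * line_size bytes. */
int geom_consistent(const struct cache_geom *c)
{
    return (uint64_t)c->associativity * c->sets * c->line_size == c->size;
}
```

The L1d/L1i (4-way, 256 sets, 64 B lines, 64 KiB) and L2 (8-way, 1024 sets, 64 B lines, 512 KiB) defaults above satisfy this identity. The L3 entry as listed (15-way, 2048 sets, 64 B lines for 64 MiB) does not, so its associativity/sets values appear to be placeholders rather than a consistent geometry.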
[RFC PATCH v3 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus
From: Andrew Jones Prefer to spell out the smp.cpus and smp.max_cpus machine state variables in order to make grepping easier and to avoid any confusion as to what cpu count is being used where. Signed-off-by: Andrew Jones --- hw/arm/virt-acpi-build.c | 8 +++ hw/arm/virt.c| 51 +++- include/hw/arm/virt.h| 2 +- 3 files changed, 29 insertions(+), 32 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9747a6458f..a222981737 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,11 +57,11 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) +static void acpi_dsdt_add_cpus(Aml *scope, int cpus) { uint16_t i; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < cpus; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); @@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < vms->smp_cpus; i++) { +for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -599,7 +599,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. 
*/ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, vms->smp_cpus); +acpi_dsdt_add_cpus(scope, ms->smp.cpus); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e465a988d6..0069fa1298 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 << vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/timer"); @@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) * The simplest way to go is to examine affinity IDs of all our CPUs. If * at least one of them has Aff3 populated, we set #address-cells to 2. */ -for (cpu = 0; cpu < vms->smp_cpus; cpu++) { +for (cpu = 0; cpu < ms->smp.cpus; cpu++) { ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); if (armcpu->mp_affinity & ARM_AFF3_MASK) { @@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0); -for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) { +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); CPUState *cs = CPU(armcpu); @@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) armcpu->dtb_compatible); if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED -&& vms->smp_cpus > 1) { +&& ms->smp.cpus > 1) { qemu_fdt_setprop_string(vms->fdt, nodename, "enable-method", "psci"); } @@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 
<< vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/pmu"); @@ -622,14 +622,13 @@ static void create_gic(VirtMachineState *vms) SysBusDevice *gicbusdev; const char *gictype; int type = vms->gic_version, i; -unsigned int smp_cpus = ms->smp.cpus; uint32_t nb_redist_regions = 0; gictype = (type == 3) ? gicv3_class_name() : gic_class_name(); vms->gic = qdev_new(gictype); qdev_prop_set_uint32(vms->gic, "revision", type); -qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus); +qdev_prop_set_uint32(vms->gic, "num-cpu", ms->smp.cpus); /* Note that the num-irq property counts both internal and external * interrupts; there are always 32 of the former (mandated by GIC spec). */ @@ -641,7 +640,7 @@ static void
[RFC PATCH v3 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus
When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled in madt. Furthermore, it is also needed if we are going to support CPU hotplug in the future. This patch is a rework based on Andrew Jones's contribution at https://lists.gnu.org/archive/html/qemu-arm/2018-07/msg00076.html Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 - hw/arm/virt.c| 3 +++ 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index a222981737..9edd6385dc 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,14 +57,18 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { uint16_t i; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; -for (i = 0; i < cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = 
ARM_CPU(qemu_get_cpu(i)); @@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (possible_cpus->cpus[i].cpu != NULL) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); @@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, ms->smp.cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index d23b941020..b6cebb5549 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1977,6 +1977,9 @@ static void machvirt_init(MachineState *machine) qdev_realize(DEVICE(cpuobj), NULL, &error_fatal); object_unref(cpuobj); + +/* Initialize cpu member here since cpu hotplug is not supported yet */ +machine->possible_cpus->cpus[n].cpu = cpuobj; } fdt_add_timer_nodes(vms); fdt_add_cpu_nodes(vms); -- 2.23.0
[RFC PATCH v3 05/13] hw: add compat machines for 5.3
Add 5.3 machine types for arm/i440fx/q35/s390x/spapr. Signed-off-by: Ying Fang --- hw/arm/virt.c | 9 - hw/core/machine.c | 3 +++ hw/i386/pc.c | 3 +++ hw/i386/pc_piix.c | 15 ++- hw/i386/pc_q35.c | 14 +- hw/ppc/spapr.c | 15 +-- hw/s390x/s390-virtio-ccw.c | 14 +- include/hw/boards.h| 3 +++ include/hw/i386/pc.h | 3 +++ 9 files changed, 73 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ba902b53ba..ff8a14439e 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void) } type_init(machvirt_machine_init); +static void virt_machine_5_3_options(MachineClass *mc) +{ +} +DEFINE_VIRT_MACHINE_AS_LATEST(5, 3) + static void virt_machine_5_2_options(MachineClass *mc) { +virt_machine_5_3_options(mc); +compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); } -DEFINE_VIRT_MACHINE_AS_LATEST(5, 2) +DEFINE_VIRT_MACHINE(5, 2) static void virt_machine_5_1_options(MachineClass *mc) { diff --git a/hw/core/machine.c b/hw/core/machine.c index 7e2f4ec08e..6dc77699a9 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -28,6 +28,9 @@ #include "hw/mem/nvdimm.h" #include "migration/vmstate.h" +GlobalProperty hw_compat_5_2[] = { }; +const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2); + GlobalProperty hw_compat_5_1[] = { { "vhost-scsi", "num_queues", "1"}, { "vhost-user-blk", "num-queues", "1"}, diff --git a/hw/i386/pc.c b/hw/i386/pc.c index e87be5d29a..eaa046ff5d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -97,6 +97,9 @@ #include "trace.h" #include CONFIG_DEVICES +GlobalProperty pc_compat_5_2[] = { }; +const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2); + GlobalProperty pc_compat_5_1[] = { { "ICH9-LPC", "x-smi-cpu-hotplug", "off" }, }; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 3c2ae0612b..01254090ce 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m) 
machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE); } -static void pc_i440fx_5_2_machine_options(MachineClass *m) +static void pc_i440fx_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_i440fx_machine_options(m); @@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL, + pc_i440fx_5_3_machine_options); + +static void pc_i440fx_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_i440fx_machine_options(m); +m->alias = NULL; +m->is_default = false; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL, pc_i440fx_5_2_machine_options); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index a3f4959c43..dd14803edb 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m) m->max_cpus = 288; } -static void pc_q35_5_2_machine_options(MachineClass *m) +static void pc_q35_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_machine_options(m); @@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL, + pc_q35_5_3_machine_options); + +static void pc_q35_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_q35_machine_options(m); +m->alias = NULL; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL, pc_q35_5_2_machine_options); diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2db810f73a..c292a3edd9 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4511,15 +4511,26 @@ static void 
spapr_machine_latest_class_options(MachineClass *mc) }\ type_init(spapr_machine_register_##suffix) +/* + * pseries-5.3 + */ +static void spapr_machine_5_3_class_options(MachineClass *mc) +{ +/* Defaults for the latest behaviour inherited from the base class */ +} + +DEFINE_SPAPR_MACHINE(5_3, "5.3", true); + /* * pseries-5.2 */ static void spapr
[RFC PATCH v3 02/13] hw/arm/virt: Remove unused variable
From: Andrew Jones We no longer use the smp_cpus virtual machine state variable. Remove it. Signed-off-by: Andrew Jones --- hw/arm/virt.c | 2 -- include/hw/arm/virt.h | 1 - 2 files changed, 3 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0069fa1298..ea24b576c6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1820,8 +1820,6 @@ static void machvirt_init(MachineState *machine) exit(1); } -vms->smp_cpus = smp_cpus; - if (vms->virt && kvm_enabled()) { error_report("mach-virt: KVM does not support providing " "Virtualization extensions to the guest CPU"); diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 953d94acc0..010f24f580 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -151,7 +151,6 @@ struct VirtMachineState { MemMapEntry *memmap; char *pciehb_nodename; const int *irqmap; -int smp_cpus; void *fdt; int fdt_size; uint32_t clock_phandle; -- 2.23.0
[RFC PATCH v3 09/13] hw/arm/virt-acpi-build: add PPTT table
Add the Processor Properties Topology Table (PPTT) to present cpu topology information to the guest. Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 42 1 file changed, 42 insertions(+) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9edd6385dc..5784370257 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } +static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +{ +int pptt_start = table_data->len; +int uid = 0, cpus = 0, socket; +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; + +acpi_data_push(table_data, sizeof(AcpiTableHeader)); + +for (socket = 0; cpus < ms->possible_cpus->len; socket++) { +uint32_t socket_offset = table_data->len - pptt_start; +int core; + +build_socket_hierarchy(table_data, 0, socket); + +for (core = 0; core < smp_cores; core++) { +uint32_t core_offset = table_data->len - pptt_start; +int thread; + +if (smp_threads <= 1) { +build_processor_hierarchy(table_data, 2, socket_offset, uid++); + } else { +build_processor_hierarchy(table_data, 0, socket_offset, core); +for (thread = 0; thread < smp_threads; thread++) { +build_smt_hierarchy(table_data, core_offset, uid++); +} + } +} +cpus += smp_cores * smp_threads; +} + +build_header(linker, table_data, + (void *)(table_data->data + pptt_start), "PPTT", + table_data->len - pptt_start, 2, NULL, NULL); +} + /* GTDT */ static void build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) unsigned dsdt, xsdt; GArray *tables_blob = tables->table_data; MachineState *ms = MACHINE(vms); +bool cpu_topology_enabled = !vmc->ignore_cpu_topology; table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t)); @@ -688,6 +725,11 @@ void 
virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables_blob); build_madt(tables_blob, tables->linker, vms); +if (cpu_topology_enabled) { +acpi_add_table(table_offsets, tables_blob); +build_pptt(tables_blob, tables->linker, ms); +} + acpi_add_table(table_offsets, tables_blob); build_gtdt(tables_blob, tables->linker, vms); -- 2.23.0
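The nested loops in build_pptt() emit one node per socket, then either one leaf node per core (threads <= 1) or one node per core plus one leaf per thread. A small sketch makes the resulting table size easy to predict; `pptt_node_count` is a hypothetical helper for illustration, not part of the patch:

```c
#include <assert.h>

/* Sketch (hypothetical helper, not in the patch) of how many processor
 * hierarchy nodes build_pptt() emits for a sockets:cores:threads
 * configuration. */
unsigned pptt_node_count(unsigned sockets, unsigned cores, unsigned threads)
{
    /* With threads <= 1 each core is itself a leaf; otherwise each core
     * node is followed by one leaf per thread. */
    unsigned per_socket = (threads <= 1) ? cores
                                         : cores + cores * threads;
    return sockets * (1 + per_socket);
}
```

For the cover letter's 2:4:2 example this gives 2 socket nodes, 8 core nodes, and 16 thread leaves, with the leaf nodes receiving UIDs 0 through 15 in the order the loops visit them.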
[RFC PATCH v3 00/13] hw/arm/virt: Introduce cpu and cache topology support
An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core systems. So a cpu topology description is helpful to provide the guest with the right view. Cpu cache information may also have a slight impact on the sched domain, and even userspace software may check the cpu cache information to do some optimizations. Dario Faggioli's talk in [0] also shows that the virtual topology may have an impact on sched performance. Thus this patch series is posted to provide cpu and cache topology support for the arm platform. Both fdt and ACPI are introduced to present the cpu and cache topology. To describe the cpu topology via ACPI, a PPTT table is introduced according to the processor hierarchy node structure. To describe the cpu cache information, a default cache hierarchy is given and built according to the cache type structure defined by ACPI; it can be made configurable later. The RFC v1 was posted at [1]; there we tried to map the MPIDR register into a cpu topology, which is wrong. Andrew pointed out that the Linux kernel is going to stop using MPIDR for topology information [2]. The root cause is that the MPIDR register has been abused by ARM OEM manufacturers: it is only an identifier for a specific cpu, not a representation of the topology. Moreover, this series is rebased on Andrew's latest shared branch [4]. This patch series was initially based on the patches posted by Andrew Jones [3]. I jumped in on it since some OS vendor cooperation partners are eager for it. Thanks for Andrew's contribution. After applying this patch series, launch a guest with virt-5.3 and a cpu topology configured with sockets:cores:threads = 2:4:2, and you will get the below output from the lscpu command.
Architecture:        aarch64
CPU op-mode(s):      64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           HiSilicon
Model:               0
Model name:          Kunpeng-920
Stepping:            0x1
BogoMIPS:            200.00
L1d cache:           512 KiB
L1i cache:           512 KiB
L2 cache:            4 MiB
L3 cache:            128 MiB
NUMA node0 CPU(s):   0-7
NUMA node1 CPU(s):   8-15

changelog
v2 -> v3:
- Make use of possible_cpus->cpus[i].cpu to check against current online cpus
v1 -> v2:
- Rebased to the latest branch shared by Andrew Jones [4]
- Stop mapping MPIDR into vcpu topology

[0] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
[1] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg06027.html
[2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20200829130016.26106-1-valentin.schnei...@arm.com/
[3] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com
[4] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh

Andrew Jones (5):
  hw/arm/virt: Spell out smp.cpus and smp.max_cpus
  hw/arm/virt: Remove unused variable
  hw/arm/virt: Replace smp_parse with one that prefers cores
  device_tree: Add qemu_fdt_add_path
  hw/arm/virt: DT: add cpu-map

Ying Fang (8):
  hw: add compat machines for 5.3
  hw/arm/virt-acpi-build: distinguish possible and present cpus
  hw/acpi/aml-build: add processor hierarchy node structure
  hw/arm/virt-acpi-build: add PPTT table
  target/arm/cpu: Add cpu cache description for arm
  hw/arm/virt: add fdt cache information
  hw/acpi/aml-build: Build ACPI cpu cache hierarchy information
  hw/arm/virt-acpi-build: Enable cpu and cache topology

 device_tree.c               | 45 +-
 hw/acpi/aml-build.c         | 68 +
 hw/arm/virt-acpi-build.c    | 99 -
 hw/arm/virt.c               | 273 +++
 hw/core/machine.c           | 3 +
 hw/i386/pc.c                | 3 +
 hw/i386/pc_piix.c           | 15 +-
 hw/i386/pc_q35.c            | 14 +-
 hw/ppc/spapr.c              | 15 +-
 hw/s390x/s390-virtio-ccw.c  | 14 +-
 include/hw/acpi/acpi-defs.h | 14 ++
include/hw/acpi/aml-build.h | 11 ++ include/hw/arm/virt.h| 4 +- include/hw/boards.h | 3 + include/hw/i386/pc.h | 3 + include/sysemu/device_tree.h | 1 + target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 18 files changed, 609 insertions(+), 45 deletions(-) -- 2.23.0
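The 2:4:2 example in the cover letter multiplies out exactly to the totals lscpu reports. A tiny sanity-check of that arithmetic (illustrative helper only, not part of the series):

```c
#include <assert.h>

/* Illustrative only: totals implied by a sockets:cores:threads tuple. */
struct smp_topo {
    unsigned sockets, cores, threads;
};

/* One logical cpu per thread of every core of every socket. */
static unsigned smp_total_cpus(struct smp_topo t)
{
    return t.sockets * t.cores * t.threads;
}
```

For sockets:cores:threads = 2:4:2 this gives 16 vCPUs, matching "CPU(s): 16", "Thread(s) per core: 2", "Core(s) per socket: 4" and "Socket(s): 2" above.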
[RFC PATCH v3 12/13] hw/acpi/aml-build: Build ACPI cpu cache hierarchy information
To build cache information, an AcpiCacheInfo structure is defined to hold the type 1 cache structure according to ACPI spec v6.3 5.2.29.2. A helper function build_cache_hierarchy is also introduced to encode the cache information.

Signed-off-by: Ying Fang
---
 hw/acpi/aml-build.c         | 26 ++
 include/hw/acpi/acpi-defs.h | 8
 include/hw/acpi/aml-build.h | 3 +++
 3 files changed, 37 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index d1aa9fd716..1a38110149 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1770,6 +1770,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms)
                  table_data->len - slit_start, 1, NULL, NULL);
 }
 
+/* ACPI 6.3: 5.2.29.2 Cache type structure (Type 1) */
+static void build_cache_head(GArray *tbl, uint32_t next_level)
+{
+    build_append_byte(tbl, 1);
+    build_append_byte(tbl, 24);
+    build_append_int_noprefix(tbl, 0, 2);
+    build_append_int_noprefix(tbl, 0x7f, 4);
+    build_append_int_noprefix(tbl, next_level, 4);
+}
+
+static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info)
+{
+    build_append_int_noprefix(tbl, cache_info->size, 4);
+    build_append_int_noprefix(tbl, cache_info->sets, 4);
+    build_append_byte(tbl, cache_info->associativity);
+    build_append_byte(tbl, cache_info->attributes);
+    build_append_int_noprefix(tbl, cache_info->line_size, 2);
+}
+
+void build_cache_hierarchy(GArray *tbl,
+                           uint32_t next_level, AcpiCacheInfo *cache_info)
+{
+    build_cache_head(tbl, next_level);
+    build_cache_tail(tbl, cache_info);
+}
+
 /*
  * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0)
  */
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 38a42f409a..3df38ab449 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -618,4 +618,12 @@ struct AcpiIortRC {
 } QEMU_PACKED;
 typedef struct AcpiIortRC AcpiIortRC;
 
+typedef struct AcpiCacheInfo {
+    uint32_t size;
+    uint32_t sets;
+    uint8_t associativity;
+    uint8_t attributes;
+    uint16_t line_size;
+} AcpiCacheInfo;
+
 #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 56474835a7..01078753a8 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -437,6 +437,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
 
 void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms);
 
+void build_cache_hierarchy(GArray *tbl,
+                           uint32_t next_level, AcpiCacheInfo *cache_info);
+
 void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id);
 
 void build_processor_hierarchy(GArray *tbl, uint32_t flags,
-- 
2.23.0
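The Type 1 record in this patch is a fixed 24-byte structure. Here is a sketch of the same layout without QEMU's GArray plumbing — the put_* helpers below are stand-ins for build_append_byte/build_append_int_noprefix, not QEMU APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one byte at off, return the next offset. */
static size_t put_u8(uint8_t *buf, size_t off, uint8_t v)
{
    buf[off] = v;
    return off + 1;
}

/* Write a little-endian integer of the given width, return next offset. */
static size_t put_le(uint8_t *buf, size_t off, uint64_t v, int bytes)
{
    for (int i = 0; i < bytes; i++) {
        buf[off + i] = (uint8_t)(v >> (8 * i));
    }
    return off + bytes;
}

/* ACPI 6.3, 5.2.29.2: Cache Type Structure (Type 1), 24 bytes total. */
static size_t build_cache_type1(uint8_t *buf, uint32_t next_level,
                                uint32_t size, uint32_t sets,
                                uint8_t assoc, uint8_t attrs,
                                uint16_t line_size)
{
    size_t off = 0;
    off = put_u8(buf, off, 1);              /* Type 1 - cache */
    off = put_u8(buf, off, 24);             /* Length */
    off = put_le(buf, off, 0, 2);           /* Reserved */
    off = put_le(buf, off, 0x7f, 4);        /* Flags: all fields valid */
    off = put_le(buf, off, next_level, 4);  /* Next level of cache */
    off = put_le(buf, off, size, 4);        /* Size */
    off = put_le(buf, off, sets, 4);        /* Number of sets */
    off = put_u8(buf, off, assoc);          /* Associativity */
    off = put_u8(buf, off, attrs);          /* Attributes */
    off = put_le(buf, off, line_size, 2);   /* Line size */
    return off;
}
```

This makes the length bytes in build_cache_head/build_cache_tail easy to check: the header contributes 12 bytes and the tail 12 bytes, so the Length field must be 24.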
[RFC PATCH v3 04/13] device_tree: Add qemu_fdt_add_path
From: Andrew Jones

qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it also adds any missing parent nodes. We also tweak an error message of qemu_fdt_add_subnode(). We'll make use of the new function in a coming patch.

Signed-off-by: Andrew Jones
---
 device_tree.c                | 45 ++--
 include/sysemu/device_tree.h | 1 +
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index b335dae707..c080909bb9 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
     retval = fdt_add_subnode(fdt, parent, basename);
     if (retval < 0) {
-        error_report("FDT: Failed to create subnode %s: %s", name,
-                     fdt_strerror(retval));
+        error_report("%s: Failed to create subnode %s: %s",
+                     __func__, name, fdt_strerror(retval));
         exit(1);
     }
 
@@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name)
     return retval;
 }
 
+/*
+ * Like qemu_fdt_add_subnode(), but will add all missing
+ * subnodes in the path.
+ */
+int qemu_fdt_add_path(void *fdt, const char *path)
+{
+    char *dupname, *basename, *p;
+    int parent, retval = -1;
+
+    if (path[0] != '/') {
+        return retval;
+    }
+
+    parent = fdt_path_offset(fdt, "/");
+    p = dupname = g_strdup(path);
+
+    while (p) {
+        *p = '/';
+        basename = p + 1;
+        p = strchr(p + 1, '/');
+        if (p) {
+            *p = '\0';
+        }
+
+        retval = fdt_path_offset(fdt, dupname);
+        if (retval < 0 && retval != -FDT_ERR_NOTFOUND) {
+            error_report("%s: Invalid path %s: %s",
+                         __func__, path, fdt_strerror(retval));
+            exit(1);
+        } else if (retval == -FDT_ERR_NOTFOUND) {
+            retval = fdt_add_subnode(fdt, parent, basename);
+            if (retval < 0) {
+                break;
+            }
+        }
+
+        parent = retval;
+    }
+
+    g_free(dupname);
+    return retval;
+}
+
 void qemu_fdt_dumpdtb(void *fdt, int size)
 {
     const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb");
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 982c89345f..15fb98af98 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
 int qemu_fdt_add_subnode(void *fdt, const char *name);
+int qemu_fdt_add_path(void *fdt, const char *path);
 
 #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \
     do { \
-- 
2.23.0
[RFC PATCH v3 06/13] hw/arm/virt: DT: add cpu-map
From: Andrew Jones

Support devicetree CPU topology descriptions.

Signed-off-by: Andrew Jones
Signed-off-by: Ying Fang
---
 hw/arm/virt.c         | 40 +++-
 include/hw/arm/virt.h | 1 +
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ff8a14439e..d23b941020 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -351,9 +351,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
     int cpu;
     int addr_cells = 1;
     const MachineState *ms = MACHINE(vms);
+    VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 
     /*
-     * From Documentation/devicetree/bindings/arm/cpus.txt
+     * See Linux Documentation/devicetree/bindings/arm/cpus.yaml
      * On ARM v8 64-bit systems value should be set to 2,
      * that corresponds to the MPIDR_EL1 register size.
      * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
@@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms)
                 ms->possible_cpus->cpus[cs->cpu_index].props.node_id);
         }
 
+        if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+            qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle",
+                                  qemu_fdt_alloc_phandle(vms->fdt));
+        }
+
         g_free(nodename);
     }
+
+    if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) {
+        /*
+         * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt
+         */
+        qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map");
+
+        for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) {
+            char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu);
+            char *map_path;
+
+            if (ms->smp.threads > 1) {
+                map_path = g_strdup_printf(
+                    "/cpus/cpu-map/%s%d/%s%d/%s%d",
+                    "cluster", cpu / (ms->smp.cores * ms->smp.threads),
+                    "core", (cpu / ms->smp.threads) % ms->smp.cores,
+                    "thread", cpu % ms->smp.threads);
+            } else {
+                map_path = g_strdup_printf(
+                    "/cpus/cpu-map/%s%d/%s%d",
+                    "cluster", cpu / ms->smp.cores,
+                    "core", cpu % ms->smp.cores);
+            }
+            qemu_fdt_add_path(vms->fdt, map_path);
+            qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path);
+            g_free(map_path);
+            g_free(cpu_path);
+        }
+    }
 }
 
 static void fdt_add_its_gic_node(VirtMachineState *vms)
@@ -2672,8 +2707,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 3)
 
 static void virt_machine_5_2_options(MachineClass *mc)
 {
+    VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
     virt_machine_5_3_options(mc);
     compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
+    vmc->ignore_cpu_topology = true;
 }
 DEFINE_VIRT_MACHINE(5, 2)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 010f24f580..917bd8b645 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -118,6 +118,7 @@ typedef enum VirtGICType {
 struct VirtMachineClass {
     MachineClass parent;
     bool disallow_affinity_adjustment;
+    bool ignore_cpu_topology;
     bool no_its;
     bool no_pmu;
     bool claim_edge_triggered_timers;
-- 
2.23.0
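The cpu-map path construction in this patch is just a mixed-radix decomposition of the flat cpu index. A minimal restatement of that arithmetic (illustrative, not QEMU code):

```c
#include <assert.h>

struct cpu_map_idx {
    int cluster, core, thread;
};

/* Decompose a flat cpu index for a sockets:cores:threads topology,
 * matching the divisions used to build /cpus/cpu-map paths. */
static struct cpu_map_idx cpu_map_decompose(int cpu, int cores, int threads)
{
    struct cpu_map_idx idx;
    idx.cluster = cpu / (cores * threads);
    idx.core = (cpu / threads) % cores;
    idx.thread = cpu % threads;
    return idx;
}
```

For the 2:4:2 example from the cover letter, cpu 13 lands at /cpus/cpu-map/cluster1/core2/thread1.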
Re: Question on UEFI ACPI tables setup and probing on arm64
On 11/5/2020 5:46 AM, Laszlo Ersek wrote: +Ard, +Drew On 11/03/20 13:39, Igor Mammedov wrote: On Fri, 30 Oct 2020 10:50:01 +0800 Ying Fang wrote: Hi, I have a question on UEFI/ACPI tables setup and probing on arm64 platform. CCing Laszlo, who might know how it's implemented. Currently on arm64 platform guest can be booted with both fdt and ACPI supported. If ACPI is enabled, [1] says the only defined method for passing ACPI tables to the kernel is via the UEFI system configuration table. So AFAIK, ACPI Should be dependent on UEFI. That's correct. The ACPI entry point (RSD PTR) on AARCH64 is defined in terms of UEFI. What's more [2] says UEFI kernel support on the ARM architectures is only available through a *stub*. The stub populates the FDT /chosen node with some UEFI parameters describing the UEFI location info. Yes. So i dump /sys/firmware/fdt from the guest, it does have something like: /dts-v1/; / { #size-cells = <0x02>; #address-cells = <0x02>; chosen { linux,uefi-mmap-desc-ver = <0x01>; linux,uefi-mmap-desc-size = <0x30>; linux,uefi-mmap-size = <0x810>; linux,uefi-mmap-start = <0x04 0x3c0ce018>; linux,uefi-system-table = <0x04 0x3f8b0018>; bootargs = "BOOT_IMAGE=/vmlinuz-4.19.90-2003.4.0.0036.oe1.aarch64 root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me"; linux,initrd-end = <0x04 0x3a85a5da>; linux,initrd-start = <0x04 0x392f2000>; }; }; But the question is that I did not see any code adding the uefi in fdt chosen node in *arm_load_dtb* or anywhere else. That's because the "UEFI stub" is a part of the guest kernel. It wraps the guest kernel image into a UEFI application binary. For a while, the guest kernel runs as a UEFI application, stashing some UEFI artifacts in *a* device tree, and then (after some other heavy lifting) jumping into the kernel proper. 
Qemu only maps the OVMF binary file into a pflash device. So I'm really confused on how UEFI information is provided to guest by qemu. Does anybody know of the details about it ? It's complex, unfortunately. (1) QEMU always generates a DTB for the guest firmware. This DTB is placed at the base of the guest RAM. See the arm_load_dtb() call in virt_machine_done() [hw/arm/virt.c] in QEMU. I think. Hi Laszlo. Thanks so much for sharing the details with us. The reply nearly covers the boot sequence of aarch64 on the whole. I see it in Qemu the *loader_start* is fixed at 1 GiB on the physical address space which points to the DRAM base. In ArmVirtQemu.dsc PcdDeviceTreeInitialBaseAddress is set 0x4000 with correspondence. Here I also see the discussion about DRAM base for ArmVirtQemu. https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03127.html I am still not sure how UEFI knows that it is running on a ArmVirtQemu machine type. Does UEFI derive it from the fdt *compatible* property ? (2) QEMU generates ACPI content, and exposes it via fw_cfg. See the virt_acpi_setup() call in the same virt_machine_done() function [hw/arm/virt.c] in QEMU. (3) The fw_cfg device itself is apparent to the guest firmware via the DTB from point (1). See the following steps in edk2: (3a) "ArmVirtPkg/Library/PlatformPeiLib/PlatformPeiLib.c" This saves the initial DTB (from the base of guest RAM, where it could be overwritten by whatever) to a dynamically allocated area. This "stashing" occurs early. (3b) "ArmVirtPkg/FdtClientDxe/FdtClientDxe.c" This driver exposes the (dynamically reallocated / copied) DTB via a custom UEFI protocol to the rest of the firmware. (This happens much later.) This protocol / driver can be considered the "owner" of the stashed DTB from (3a). (3c) "ArmVirtPkg/Library/QemuFwCfgLib/QemuFwCfgLib.c" This is the fw_cfg device access library, discovering the fw_cfg registers via the above UEFI protocol. 
The library is linked into each firmware module that needs fw_cfg access. (4) The firmware interprets QEMU's DTB for actual content (parsing values, configuring hardware, accessing devices). This occurs in a whole bunch of locations, mostly via consuming the custom protocol from (3b). Some info that's needed very early is parsed out of the DTB right in step (3a). (5) The guest firmware has a dedicated driver that checks whether QEMU was configured with ACPI enabled or disabled, and publishes that choice to the rest of the firmware. This is necessary because some firmware actions / infrastructure parts cannot (must not) proceed until this decision has been interpreted. See in edk2: - ArmVirtPkg/PlatformHasAcpiDtDxe This
Re: [RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message
On 10/30/2020 1:20 AM, Andrew Jones wrote: You need to remove 'Message' from the summary. On Tue, Oct 20, 2020 at 09:14:34PM +0800, Ying Fang wrote: When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled. Signed-off-by: Andrew Jones I guess my s-o-b is here because this is a rework of https://github.com/rhdrjones/qemu/commit/b18d7a889f424b8a8679c43d7f4804fdeeeaf3fd I think it changed enough you could just drop my authorship. A based-on comment in the commit message would be more than enough. Comment on the patch below. Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index a222981737..fae5a26741 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,14 +57,18 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { uint16_t i; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; -for (i = 0; i < cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +int possible_cpus = MACHINE(vms)->possible_cpus->len; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); 
gicd->version = vms->gic_version; -for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { +for (i = 0; i < possible_cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (i < MACHINE(vms)->smp.cpus) { Shouldn't this be if (possible_cpus->cpus[i].cpu != NULL) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} I now realized that I switched to use current cpu number as the limit to make GIC flags enabled here. However to judge NULL is much more suitable here. Thanks, Ying. if (arm_feature(>env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); @@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, ms->smp.cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, [VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { -- 2.23.0 Thanks, drew .
Re: [RFC PATCH v2 09/13] hw/arm/virt-acpi-build: add PPTT table
On 10/30/2020 12:56 AM, Andrew Jones wrote: On Tue, Oct 20, 2020 at 09:14:36PM +0800, Ying Fang wrote: Add the Processor Properties Topology Table (PPTT) to present CPU topology information to the guest. Signed-off-by: Andrew Jones I don't know why I have an s-o-b here. I guess it's because this code looks nearly identical to what I wrote, except for using the new and, IMO, unnecessary build_socket_hierarchy and build_smt_hierarchy functions. IMHO, you should drop the last patch and just take https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11 as it is, unless it needs to be fixed somehow Thanks, drew This patch is based on your branch however it is slightly modified. As described in: [RFC,v2,08/13] hw/acpi/aml-build: add processor hierarchy node structure The wrapper function build_socket_hierarchy and build_smt_hierarchy are introduced to make later patch much more readable and make preparations for cache hierarchy. Hope it won't make you confused. I will drop your branch patch and give details in the commit message in the next post. 
Thanks, Ying

Signed-off-by: Ying Fang
---
 hw/arm/virt-acpi-build.c | 42
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index fae5a26741..e1f3ea50ad 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
                  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
 
+static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms)
+{
+    int pptt_start = table_data->len;
+    int uid = 0, cpus = 0, socket;
+    unsigned int smp_cores = ms->smp.cores;
+    unsigned int smp_threads = ms->smp.threads;
+
+    acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+    for (socket = 0; cpus < ms->possible_cpus->len; socket++) {
+        uint32_t socket_offset = table_data->len - pptt_start;
+        int core;
+
+        build_socket_hierarchy(table_data, 0, socket);
+
+        for (core = 0; core < smp_cores; core++) {
+            uint32_t core_offset = table_data->len - pptt_start;
+            int thread;
+
+            if (smp_threads <= 1) {
+                build_processor_hierarchy(table_data, 2, socket_offset, uid++);
+            } else {
+                build_processor_hierarchy(table_data, 0, socket_offset, core);
+                for (thread = 0; thread < smp_threads; thread++) {
+                    build_smt_hierarchy(table_data, core_offset, uid++);
+                }
+            }
+        }
+        cpus += smp_cores * smp_threads;
+    }
+
+    build_header(linker, table_data,
+                 (void *)(table_data->data + pptt_start), "PPTT",
+                 table_data->len - pptt_start, 2, NULL, NULL);
+}
+
 /* GTDT */
 static void
 build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
@@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     unsigned dsdt, xsdt;
     GArray *tables_blob = tables->table_data;
     MachineState *ms = MACHINE(vms);
+    bool cpu_topology_enabled = !vmc->ignore_cpu_topology;
 
     table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t));
 
@@ -688,6 +725,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
     acpi_add_table(table_offsets, tables_blob);
     build_madt(tables_blob, tables->linker, vms);
 
+    if (cpu_topology_enabled) {
+        acpi_add_table(table_offsets, tables_blob);
+        build_pptt(tables_blob, tables->linker, ms);
+    }
+
     acpi_add_table(table_offsets, tables_blob);
     build_gtdt(tables_blob, tables->linker, vms);
-- 
2.23.0
.
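As a rough check of what build_pptt() emits: with threads exposed it writes one processor hierarchy node per socket, one per core, and one per thread, and ACPI processor UIDs are handed out only at the leaf level. A sketch of that accounting (assumption: the uniform sockets/cores/threads shape the loop iterates over):

```c
#include <assert.h>

/* Count the Type 0 processor hierarchy nodes a uniform
 * sockets:cores:threads topology produces in the PPTT. */
static int pptt_node_count(int sockets, int cores, int threads)
{
    /* With SMT, each core node carries `threads` thread leaves;
     * without SMT, the core itself is the leaf. */
    int per_socket = (threads > 1) ? cores * (1 + threads) : cores;
    return sockets * (1 + per_socket);
}
```

For the 2:4:2 example this gives 2 socket nodes + 8 core nodes + 16 thread leaves = 26 nodes, with UIDs 0-15 on the leaves.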
Re: [RFC PATCH v2 08/13] hw/acpi/aml-build: add processor hierarchy node structure
On 10/30/2020 1:24 AM, Andrew Jones wrote: On Tue, Oct 20, 2020 at 09:14:35PM +0800, Ying Fang wrote: Add the processor hierarchy node structures to build ACPI information for CPU topology. Three helpers are introduced: (1) build_socket_hierarchy for socket description structure (2) build_processor_hierarchy for processor description structure (3) build_smt_hierarchy for thread (logic processor) description structure I see now the reason to introduce three functions is because the last patch adds different private resources. You should point that plan out in this commit message. Yes, the private resources are used to describe cache hierarchy and it is variable among different topology level. I will point it out in the commit message to avoid any confusion. Thanks, Ying Thanks, drew Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 37 + include/hw/acpi/aml-build.h | 7 +++ 2 files changed, 44 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 3792ba96ce..da3b41b514 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 
20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0);/* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, add private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x0e, 4);/* Processor is a thread */ +build_append_int_noprefix(tbl, parent , 4); /* parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Num of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index fe0055fffb..56474835a7 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0 .
Re: [RFC PATCH v2 05/13] hw: add compat machines for 5.3
On 10/30/2020 1:08 AM, Andrew Jones wrote: On Tue, Oct 20, 2020 at 09:14:32PM +0800, Ying Fang wrote: Add 5.2 machine types for arm/i440fx/q35/s390x/spapr. ^ 5.3 Thanks. Will fix, careless spelling mistake. Thanks, drew Signed-off-by: Ying Fang --- hw/arm/virt.c | 9 - hw/core/machine.c | 3 +++ hw/i386/pc.c | 3 +++ hw/i386/pc_piix.c | 15 ++- hw/i386/pc_q35.c | 14 +- hw/ppc/spapr.c | 15 +-- hw/s390x/s390-virtio-ccw.c | 14 +- include/hw/boards.h| 3 +++ include/hw/i386/pc.h | 3 +++ 9 files changed, 73 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ba902b53ba..ff8a14439e 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void) } type_init(machvirt_machine_init); +static void virt_machine_5_3_options(MachineClass *mc) +{ +} +DEFINE_VIRT_MACHINE_AS_LATEST(5, 3) + static void virt_machine_5_2_options(MachineClass *mc) { +virt_machine_5_3_options(mc); +compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); } -DEFINE_VIRT_MACHINE_AS_LATEST(5, 2) +DEFINE_VIRT_MACHINE(5, 2) static void virt_machine_5_1_options(MachineClass *mc) { diff --git a/hw/core/machine.c b/hw/core/machine.c index 7e2f4ec08e..6dc77699a9 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -28,6 +28,9 @@ #include "hw/mem/nvdimm.h" #include "migration/vmstate.h" +GlobalProperty hw_compat_5_2[] = { }; +const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2); + GlobalProperty hw_compat_5_1[] = { { "vhost-scsi", "num_queues", "1"}, { "vhost-user-blk", "num-queues", "1"}, diff --git a/hw/i386/pc.c b/hw/i386/pc.c index e87be5d29a..eaa046ff5d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -97,6 +97,9 @@ #include "trace.h" #include CONFIG_DEVICES +GlobalProperty pc_compat_5_2[] = { }; +const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2); + GlobalProperty pc_compat_5_1[] = { { "ICH9-LPC", "x-smi-cpu-hotplug", "off" }, }; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 
3c2ae0612b..01254090ce 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m) machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE); } -static void pc_i440fx_5_2_machine_options(MachineClass *m) +static void pc_i440fx_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_i440fx_machine_options(m); @@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL, + pc_i440fx_5_3_machine_options); + +static void pc_i440fx_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_i440fx_machine_options(m); +m->alias = NULL; +m->is_default = false; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL, pc_i440fx_5_2_machine_options); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index a3f4959c43..dd14803edb 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m) m->max_cpus = 288; } -static void pc_q35_5_2_machine_options(MachineClass *m) +static void pc_q35_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_machine_options(m); @@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL, + pc_q35_5_3_machine_options); + +static void pc_q35_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_q35_machine_options(m); +m->alias = NULL; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL, pc_q35_5_2_machine_options); diff 
--git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2db810f73a..c292a3edd9 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4511,15 +4511,26 @@ static void spapr_machine_latest_class_options(MachineClass *mc) }\ type_init(spapr_machine_register_##suffix) +/* + * pseries-
Re: [RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message
On 10/30/2020 1:20 AM, Andrew Jones wrote: You need to remove 'Message' from the summary. On Tue, Oct 20, 2020 at 09:14:34PM +0800, Ying Fang wrote: When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled. Signed-off-by: Andrew Jones I guess my s-o-b is here because this is a rework of https://github.com/rhdrjones/qemu/commit/b18d7a889f424b8a8679c43d7f4804fdeeeaf3fd The s-o-b is given since this one is based on your branch. I think it changed enough you could just drop my authorship. A based-on comment in the commit message would be more than enough. Thanks. Will fix it. Hope it won't make you confused. Comment on the patch below. Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index a222981737..fae5a26741 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,14 +57,18 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { uint16_t i; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; -for (i = 0; i < cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +int possible_cpus = MACHINE(vms)->possible_cpus->len; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -480,7 +485,7 @@ build_madt(GArray *table_data, 
BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { +for (i = 0; i < possible_cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (i < MACHINE(vms)->smp.cpus) { Shouldn't this be Yes, Stupid mistake. Maybe it was lost when I am doing the rebase. Will fix that. Thanks for your patience in the reply and review. Ying Fang. if (possible_cpus->cpus[i].cpu != NULL) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} if (arm_feature(>env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); @@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, ms->smp.cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, [VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { -- 2.23.0 Thanks, drew .
Question on UEFI ACPI tables setup and probing on arm64
Hi, I have a question on UEFI/ACPI tables setup and probing on arm64 platform. Currently on arm64 platform guest can be booted with both fdt and ACPI supported. If ACPI is enabled, [1] says the only defined method for passing ACPI tables to the kernel is via the UEFI system configuration table. So AFAIK, ACPI Should be dependent on UEFI. What's more [2] says UEFI kernel support on the ARM architectures is only available through a *stub*. The stub populates the FDT /chosen node with some UEFI parameters describing the UEFI location info. So i dump /sys/firmware/fdt from the guest, it does have something like: /dts-v1/; / { #size-cells = <0x02>; #address-cells = <0x02>; chosen { linux,uefi-mmap-desc-ver = <0x01>; linux,uefi-mmap-desc-size = <0x30>; linux,uefi-mmap-size = <0x810>; linux,uefi-mmap-start = <0x04 0x3c0ce018>; linux,uefi-system-table = <0x04 0x3f8b0018>; bootargs = "BOOT_IMAGE=/vmlinuz-4.19.90-2003.4.0.0036.oe1.aarch64 root=/dev/mapper/openeuler-root ro rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap video=VGA-1:640x480-32@60me smmu.bypassdev=0x1000:0x17 smmu.bypassdev=0x1000:0x15 crashkernel=1024M,high video=efifb:off video=VGA-1:640x480-32@60me"; linux,initrd-end = <0x04 0x3a85a5da>; linux,initrd-start = <0x04 0x392f2000>; }; }; But the question is that I did not see any code adding the uefi in fdt chosen node in *arm_load_dtb* or anywhere else. Qemu only maps the OVMF binary file into a pflash device. So I'm really confused on how UEFI information is provided to guest by qemu. Does anybody know of the details about it ? [1] https://www.kernel.org/doc/html/latest/arm64/arm-acpi.html [2] https://www.kernel.org/doc/Documentation/arm/uefi.rst Thanks. Ying
[RFC PATCH v2 13/13] hw/arm/virt-acpi-build: Enable CPU cache topology
A helper struct AcpiCacheOffset is introduced to describe the offsets of the three cache levels. The cache hierarchy is built according to ACPI spec v6.3 5.2.29.2. Let's enable CPU cache topology now. Signed-off-by: Ying Fang --- hw/acpi/aml-build.c | 19 +- hw/arm/virt-acpi-build.c | 52 - include/hw/acpi/acpi-defs.h | 6 + include/hw/acpi/aml-build.h | 7 ++--- 4 files changed, 68 insertions(+), 16 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 6f0e8df49b..f449fa27e7 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1799,27 +1799,32 @@ void build_cache_hierarchy(GArray *tbl, /* * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) */ -void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +void build_socket_hierarchy(GArray *tbl, uint32_t parent, +uint32_t offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 24); /* Length, one private resource */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 1, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset, 4); /* Private resources */ } -void build_processor_hierarchy(GArray *tbl, uint32_t flags, - uint32_t parent, uint32_t id) +void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent, + AcpiCacheOffset offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 32); /* Length, three private resources */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, flags, 4); /* Flags */
build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 3, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset.l1d_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l1i_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l2_offset, 4); /* Private resources */ } void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index e1f3ea50ad..8a026ba24e 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -429,29 +429,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } -static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache, + AcpiCacheInfo *acpi_cache) { +acpi_cache->size = cpu_cache->size; +acpi_cache->sets = cpu_cache->sets; +acpi_cache->associativity = cpu_cache->associativity; +acpi_cache->attributes = cpu_cache->attributes; +acpi_cache->line_size = cpu_cache->line_size; +} + +static void build_pptt(GArray *table_data, BIOSLinker *linker, + VirtMachineState *vms) +{ +MachineState *ms = MACHINE(vms); int pptt_start = table_data->len; int uid = 0, cpus = 0, socket; unsigned int smp_cores = ms->smp.cores; unsigned int smp_threads = ms->smp.threads; +AcpiCacheOffset offset; +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus)); +AcpiCacheInfo cache_info; acpi_data_push(table_data, sizeof(AcpiTableHeader)); for (socket = 0; cpus < ms->possible_cpus->len; socket++) { -uint32_t socket_offset = table_data->len - pptt_start; +uint32_t l3_offset = table_data->len - pptt_start; +uint32_t socket_offset; int core; -build_socket_hierarchy(table_data, 0, socket); +/* L3 cache type structure */ 
+arm_acpi_cache_info(cpu->caches.l3_cache, &cache_info); +build_cache_hierarchy(table_data, 0, &cache_info); + +socket_offset = table_data->len - pptt_start; +build_socket_hierarchy(table_data, 0, l3_offset, socket); for (core = 0; core < smp_cores; core++) { uint32_t core_offset = table_data->len - pptt_start; int th
[RFC PATCH v2 04/13] device_tree: Add qemu_fdt_add_path
From: Andrew Jones qemu_fdt_add_path() works like qemu_fdt_add_subnode(), except it also adds any missing parent nodes. We also tweak an error message of qemu_fdt_add_subnode(). We'll make use of the new function in a coming patch. Signed-off-by: Andrew Jones --- device_tree.c| 45 ++-- include/sysemu/device_tree.h | 1 + 2 files changed, 44 insertions(+), 2 deletions(-) diff --git a/device_tree.c b/device_tree.c index b335dae707..c080909bb9 100644 --- a/device_tree.c +++ b/device_tree.c @@ -515,8 +515,8 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) retval = fdt_add_subnode(fdt, parent, basename); if (retval < 0) { -error_report("FDT: Failed to create subnode %s: %s", name, - fdt_strerror(retval)); +error_report("%s: Failed to create subnode %s: %s", + __func__, name, fdt_strerror(retval)); exit(1); } @@ -524,6 +524,47 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) return retval; } +/* + * Like qemu_fdt_add_subnode(), but will add all missing + * subnodes in the path. 
+ */ +int qemu_fdt_add_path(void *fdt, const char *path) +{ +char *dupname, *basename, *p; +int parent, retval = -1; + +if (path[0] != '/') { +return retval; +} + +parent = fdt_path_offset(fdt, "/"); +p = dupname = g_strdup(path); + +while (p) { +*p = '/'; +basename = p + 1; +p = strchr(p + 1, '/'); +if (p) { +*p = '\0'; +} +retval = fdt_path_offset(fdt, dupname); +if (retval < 0 && retval != -FDT_ERR_NOTFOUND) { +error_report("%s: Invalid path %s: %s", + __func__, path, fdt_strerror(retval)); +exit(1); +} else if (retval == -FDT_ERR_NOTFOUND) { +retval = fdt_add_subnode(fdt, parent, basename); +if (retval < 0) { +break; +} +} +parent = retval; +} + +g_free(dupname); +return retval; +} + void qemu_fdt_dumpdtb(void *fdt, int size) { const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb"); diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h index 982c89345f..15fb98af98 100644 --- a/include/sysemu/device_tree.h +++ b/include/sysemu/device_tree.h @@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path); uint32_t qemu_fdt_alloc_phandle(void *fdt); int qemu_fdt_nop_node(void *fdt, const char *node_path); int qemu_fdt_add_subnode(void *fdt, const char *name); +int qemu_fdt_add_path(void *fdt, const char *path); #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \ do { \ -- 2.23.0
[RFC PATCH v2 03/13] hw/arm/virt: Replace smp_parse with one that prefers cores
From: Andrew Jones The virt machine type has never used the CPU topology parameters, other than number of online CPUs and max CPUs. When choosing how to allocate those CPUs the default has been to assume cores. In preparation for using the other CPU topology parameters let's use an smp_parse that prefers cores over sockets. We can also enforce the topology matches max_cpus check because we have no legacy to preserve. Signed-off-by: Andrew Jones --- hw/arm/virt.c | 76 +++ 1 file changed, 76 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ea24b576c6..ba902b53ba 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -78,6 +78,8 @@ #include "hw/virtio/virtio-iommu.h" #include "hw/char/pl011.h" #include "qemu/guest-random.h" +#include "qapi/qmp/qerror.h" +#include "sysemu/replay.h" #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \ static void virt_##major##_##minor##_class_init(ObjectClass *oc, \ @@ -2444,6 +2446,79 @@ static int virt_kvm_type(MachineState *ms, const char *type_str) return requested_pa_size > 40 ? requested_pa_size : 0; } +/* + * Unlike smp_parse() in hw/core/machine.c, we prefer cores over sockets, + * e.g. '-smp 8' creates 1 socket with 8 cores. Whereas '-smp 8' with + * hw/core/machine.c's smp_parse() creates 8 sockets, each with 1 core. + * Additionally, we can enforce the topology matches max_cpus check, + * because we have no legacy to preserve. + */ +static void virt_smp_parse(MachineState *ms, QemuOpts *opts) +{ +if (opts) { +unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); +unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); +unsigned cores = qemu_opt_get_number(opts, "cores", 0); +unsigned threads = qemu_opt_get_number(opts, "threads", 0); + +/* + * Compute missing values; prefer cores over sockets and + * sockets over threads. + */ +if (cpus == 0 || cores == 0) { +sockets = sockets > 0 ? sockets : 1; +threads = threads > 0 ? threads : 1; +if (cpus == 0) { +cores = cores > 0 ? 
cores : 1; +cpus = cores * threads * sockets; +} else { +ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); +cores = ms->smp.max_cpus / (sockets * threads); +} +} else if (sockets == 0) { +threads = threads > 0 ? threads : 1; +sockets = cpus / (cores * threads); +sockets = sockets > 0 ? sockets : 1; +} else if (threads == 0) { +threads = cpus / (cores * sockets); +threads = threads > 0 ? threads : 1; +} else if (sockets * cores * threads < cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) < " + "smp_cpus (%u)", + sockets, cores, threads, cpus); +exit(1); +} + +ms->smp.max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus); + +if (ms->smp.max_cpus < cpus) { +error_report("maxcpus must be equal to or greater than smp"); +exit(1); +} + +if (sockets * cores * threads != ms->smp.max_cpus) { +error_report("cpu topology: " + "sockets (%u) * cores (%u) * threads (%u) " + "!= maxcpus (%u)", + sockets, cores, threads, + ms->smp.max_cpus); +exit(1); +} + +ms->smp.cpus = cpus; +ms->smp.cores = cores; +ms->smp.threads = threads; +ms->smp.sockets = sockets; +} + +if (ms->smp.cpus > 1) { +Error *blocker = NULL; +error_setg(&blocker, QERR_REPLAY_NOT_SUPPORTED, "smp"); +replay_add_blocker(blocker); +} +} + static void virt_machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -2469,6 +2544,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) mc->cpu_index_to_instance_props = virt_cpu_index_to_props; mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15"); mc->get_default_cpu_node_id = virt_get_default_cpu_node_id; +mc->smp_parse = virt_smp_parse; mc->kvm_type = virt_kvm_type; assert(!mc->get_hotplug_handler); mc->get_hotplug_handler = virt_machine_get_hotplug_handler; -- 2.23.0
[RFC PATCH v2 07/13] hw/arm/virt-acpi-build: distinguish possible and present cpus Message
When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index a222981737..fae5a26741 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,14 +57,18 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { uint16_t i; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; -for (i = 0; i < cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -470,6 +474,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +int possible_cpus = MACHINE(vms)->possible_cpus->len; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -480,7 +485,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { +for (i = 0; i < possible_cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -495,7 +500,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = 
cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (i < MACHINE(vms)->smp.cpus) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); @@ -599,7 +606,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, ms->smp.cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { -- 2.23.0
[RFC PATCH v2 05/13] hw: add compat machines for 5.3
Add 5.3 machine types for arm/i440fx/q35/s390x/spapr. Signed-off-by: Ying Fang --- hw/arm/virt.c | 9 - hw/core/machine.c | 3 +++ hw/i386/pc.c | 3 +++ hw/i386/pc_piix.c | 15 ++- hw/i386/pc_q35.c | 14 +- hw/ppc/spapr.c | 15 +-- hw/s390x/s390-virtio-ccw.c | 14 +- include/hw/boards.h | 3 +++ include/hw/i386/pc.h | 3 +++ 9 files changed, 73 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ba902b53ba..ff8a14439e 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2665,10 +2665,17 @@ static void machvirt_machine_init(void) } type_init(machvirt_machine_init); +static void virt_machine_5_3_options(MachineClass *mc) +{ +} +DEFINE_VIRT_MACHINE_AS_LATEST(5, 3) + static void virt_machine_5_2_options(MachineClass *mc) { +virt_machine_5_3_options(mc); +compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); } -DEFINE_VIRT_MACHINE_AS_LATEST(5, 2) +DEFINE_VIRT_MACHINE(5, 2) static void virt_machine_5_1_options(MachineClass *mc) { diff --git a/hw/core/machine.c b/hw/core/machine.c index 7e2f4ec08e..6dc77699a9 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -28,6 +28,9 @@ #include "hw/mem/nvdimm.h" #include "migration/vmstate.h" +GlobalProperty hw_compat_5_2[] = { }; +const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2); + GlobalProperty hw_compat_5_1[] = { { "vhost-scsi", "num_queues", "1"}, { "vhost-user-blk", "num-queues", "1"}, diff --git a/hw/i386/pc.c b/hw/i386/pc.c index e87be5d29a..eaa046ff5d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -97,6 +97,9 @@ #include "trace.h" #include CONFIG_DEVICES +GlobalProperty pc_compat_5_2[] = { }; +const size_t pc_compat_5_2_len = G_N_ELEMENTS(pc_compat_5_2); + GlobalProperty pc_compat_5_1[] = { { "ICH9-LPC", "x-smi-cpu-hotplug", "off" }, }; diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 3c2ae0612b..01254090ce 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -426,7 +426,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE); } -static void pc_i440fx_5_2_machine_options(MachineClass *m) +static void pc_i440fx_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_i440fx_machine_options(m); @@ -435,6 +435,19 @@ static void pc_i440fx_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_I440FX_MACHINE(v5_3, "pc-i440fx-5.3", NULL, + pc_i440fx_5_3_machine_options); + +static void pc_i440fx_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_i440fx_machine_options(m); +m->alias = NULL; +m->is_default = false; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_I440FX_MACHINE(v5_2, "pc-i440fx-5.2", NULL, pc_i440fx_5_2_machine_options); diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index a3f4959c43..dd14803edb 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -344,7 +344,7 @@ static void pc_q35_machine_options(MachineClass *m) m->max_cpus = 288; } -static void pc_q35_5_2_machine_options(MachineClass *m) +static void pc_q35_5_3_machine_options(MachineClass *m) { PCMachineClass *pcmc = PC_MACHINE_CLASS(m); pc_q35_machine_options(m); @@ -352,6 +352,18 @@ static void pc_q35_5_2_machine_options(MachineClass *m) pcmc->default_cpu_version = 1; } +DEFINE_Q35_MACHINE(v5_3, "pc-q35-5.3", NULL, + pc_q35_5_3_machine_options); + +static void pc_q35_5_2_machine_options(MachineClass *m) +{ +PCMachineClass *pcmc = PC_MACHINE_CLASS(m); +pc_q35_machine_options(m); +m->alias = NULL; +compat_props_add(m->compat_props, hw_compat_5_2, hw_compat_5_2_len); +compat_props_add(m->compat_props, pc_compat_5_2, pc_compat_5_2_len); +} + DEFINE_Q35_MACHINE(v5_2, "pc-q35-5.2", NULL, pc_q35_5_2_machine_options); diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 2db810f73a..c292a3edd9 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4511,15 +4511,26 @@ static void 
spapr_machine_latest_class_options(MachineClass *mc) }\ type_init(spapr_machine_register_##suffix) +/* + * pseries-5.3 + */ +static void spapr_machine_5_3_class_options(MachineClass *mc) +{ +/* Defaults for the latest behaviour inherited from the base class */ +} + +DEFINE_SPAPR_MACHINE(5_3, "5.3", true); + /* * pseries-5.2 */ static void spapr
[RFC PATCH v2 09/13] hw/arm/virt-acpi-build: add PPTT table
Add the Processor Properties Topology Table (PPTT) to present CPU topology information to the guest. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 42 1 file changed, 42 insertions(+) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index fae5a26741..e1f3ea50ad 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -429,6 +429,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } +static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +{ +int pptt_start = table_data->len; +int uid = 0, cpus = 0, socket; +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; + +acpi_data_push(table_data, sizeof(AcpiTableHeader)); + +for (socket = 0; cpus < ms->possible_cpus->len; socket++) { +uint32_t socket_offset = table_data->len - pptt_start; +int core; + +build_socket_hierarchy(table_data, 0, socket); + +for (core = 0; core < smp_cores; core++) { +uint32_t core_offset = table_data->len - pptt_start; +int thread; + +if (smp_threads <= 1) { +build_processor_hierarchy(table_data, 2, socket_offset, uid++); + } else { +build_processor_hierarchy(table_data, 0, socket_offset, core); +for (thread = 0; thread < smp_threads; thread++) { +build_smt_hierarchy(table_data, core_offset, uid++); +} + } +} +cpus += smp_cores * smp_threads; +} + +build_header(linker, table_data, + (void *)(table_data->data + pptt_start), "PPTT", + table_data->len - pptt_start, 2, NULL, NULL); +} + /* GTDT */ static void build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -669,6 +705,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) unsigned dsdt, xsdt; GArray *tables_blob = tables->table_data; MachineState *ms = MACHINE(vms); +bool cpu_topology_enabled = !vmc->ignore_cpu_topology; table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t)); @@ 
-688,6 +725,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables_blob); build_madt(tables_blob, tables->linker, vms); +if (cpu_topology_enabled) { +acpi_add_table(table_offsets, tables_blob); +build_pptt(tables_blob, tables->linker, ms); +} + acpi_add_table(table_offsets, tables_blob); build_gtdt(tables_blob, tables->linker, vms); -- 2.23.0
[RFC PATCH v2 12/13] hw/acpi/aml-build: build ACPI CPU cache hierarchy information
To build cache information, an AcpiCacheInfo structure is defined to hold the Type 1 cache structure according to ACPI spec v6.3 5.2.29.2. A helper function build_cache_hierarchy is introduced to encode the cache information. Signed-off-by: Ying Fang --- hw/acpi/aml-build.c | 26 ++ include/hw/acpi/acpi-defs.h | 8 include/hw/acpi/aml-build.h | 3 +++ 3 files changed, 37 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index da3b41b514..6f0e8df49b 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1770,6 +1770,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* ACPI 6.3: 5.2.29.2 Cache type structure (Type 1) */ +static void build_cache_head(GArray *tbl, uint32_t next_level) +{ +build_append_byte(tbl, 1); /* Type 1 - cache */ +build_append_byte(tbl, 24); /* Length */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x7f, 4); /* Flags: all fields valid */ +build_append_int_noprefix(tbl, next_level, 4); /* Next level of cache */ +} + +static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info) +{ +build_append_int_noprefix(tbl, cache_info->size, 4); /* Size */ +build_append_int_noprefix(tbl, cache_info->sets, 4); /* Number of sets */ +build_append_byte(tbl, cache_info->associativity); /* Associativity */ +build_append_byte(tbl, cache_info->attributes); /* Attributes */ +build_append_int_noprefix(tbl, cache_info->line_size, 2); /* Line size */ +} + +void build_cache_hierarchy(GArray *tbl, + uint32_t next_level, AcpiCacheInfo *cache_info) +{ +build_cache_head(tbl, next_level); +build_cache_tail(tbl, cache_info); +} + /* * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) */ diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index 38a42f409a..3df38ab449 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -618,4 +618,12 @@ struct AcpiIortRC { } QEMU_PACKED; typedef struct AcpiIortRC AcpiIortRC; +typedef struct AcpiCacheInfo { +uint32_t size; +uint32_t sets; +uint8_t associativity; +uint8_t attributes; +uint16_t line_size; +} AcpiCacheInfo; + #endif
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index 56474835a7..01078753a8 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -437,6 +437,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_cache_hierarchy(GArray *tbl, + uint32_t next_level, AcpiCacheInfo *cache_info); + void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); void build_processor_hierarchy(GArray *tbl, uint32_t flags, -- 2.23.0
[RFC PATCH v2 08/13] hw/acpi/aml-build: add processor hierarchy node structure
Add the processor hierarchy node structures to build ACPI information for CPU topology. Three helpers are introduced: (1) build_socket_hierarchy for socket description structure (2) build_processor_hierarchy for processor description structure (3) build_smt_hierarchy for thread (logic processor) description structure Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 37 + include/hw/acpi/aml-build.h | 7 +++ 2 files changed, 44 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 3792ba96ce..da3b41b514 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1770,6 +1770,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0);/* Type 0 - processor */ 
+build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x0e, 4); /* Flags: Processor is a thread */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index fe0055fffb..56474835a7 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -437,6 +437,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0
[RFC PATCH v2 06/13] hw/arm/virt: DT: add cpu-map
From: Andrew Jones Support devicetree CPU topology descriptions. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt.c | 40 +++- include/hw/arm/virt.h | 1 + 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ff8a14439e..d23b941020 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -351,9 +351,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) int cpu; int addr_cells = 1; const MachineState *ms = MACHINE(vms); +VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); /* - * From Documentation/devicetree/bindings/arm/cpus.txt + * See Linux Documentation/devicetree/bindings/arm/cpus.yaml * On ARM v8 64-bit systems value should be set to 2, * that corresponds to the MPIDR_EL1 register size. * If MPIDR_EL1[63:32] value is equal to 0 on all CPUs @@ -407,8 +408,42 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) ms->possible_cpus->cpus[cs->cpu_index].props.node_id); } +if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) { +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +} + g_free(nodename); } + +if (ms->smp.cpus > 1 && !vmc->ignore_cpu_topology) { +/* + * See Linux Documentation/devicetree/bindings/cpu/cpu-topology.txt + */ +qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map"); + +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { +char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu); +char *map_path; + +if (ms->smp.threads > 1) { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d/%s%d", +"cluster", cpu / (ms->smp.cores * ms->smp.threads), +"core", (cpu / ms->smp.threads) % ms->smp.cores, +"thread", cpu % ms->smp.threads); +} else { +map_path = g_strdup_printf( +"/cpus/cpu-map/%s%d/%s%d", +"cluster", cpu / ms->smp.cores, +"core", cpu % ms->smp.cores); +} +qemu_fdt_add_path(vms->fdt, map_path); +qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path); +g_free(map_path); +g_free(cpu_path); +} +} } static void 
fdt_add_its_gic_node(VirtMachineState *vms) @@ -2672,8 +2707,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 3) static void virt_machine_5_2_options(MachineClass *mc) { +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc)); + virt_machine_5_3_options(mc); compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); +vmc->ignore_cpu_topology = true; } DEFINE_VIRT_MACHINE(5, 2) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 010f24f580..917bd8b645 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -118,6 +118,7 @@ typedef enum VirtGICType { struct VirtMachineClass { MachineClass parent; bool disallow_affinity_adjustment; +bool ignore_cpu_topology; bool no_its; bool no_pmu; bool claim_edge_triggered_timers; -- 2.23.0
[RFC PATCH v2 11/13] hw/arm/virt: add fdt cache information
Support devicetree CPU cache information descriptions. Signed-off-by: Ying Fang --- hw/arm/virt.c | 92 +++ 1 file changed, 92 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index d23b941020..adcfa52854 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -346,6 +346,89 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags); } +static void fdt_add_l3cache_nodes(const VirtMachineState *vms) +{ +int i; +const MachineState *ms = MACHINE(vms); +ARMCPU *cpu = ARM_CPU(first_cpu); +unsigned int smp_cores = ms->smp.cores; +unsigned int sockets = ms->smp.max_cpus / smp_cores; + +for (i = 0; i < sockets; i++) { +char *nodename = g_strdup_printf("/cpus/l3-cache%d", i); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", + cpu->caches.l3_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l3_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l3_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +} + +static void fdt_add_l2cache_nodes(const VirtMachineState *vms) +{ +int i, j; +const MachineState *ms = MACHINE(vms); +unsigned int smp_cores = ms->smp.cores; +unsigned int sockets = ms->smp.max_cpus / smp_cores; +ARMCPU *cpu = ARM_CPU(first_cpu); + +for (i = 0; i < sockets; i++) { +char *next_path = g_strdup_printf("/cpus/l3-cache%d", i); +for (j = 0; j < smp_cores; j++) { +char *nodename = g_strdup_printf("/cpus/l2-cache%d", + i * smp_cores + j); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", +
cpu->caches.l2_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l2_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l2_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, + "next-level-cache", next_path); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +g_free(next_path); +} +} + +static void fdt_add_l1cache_prop(const VirtMachineState *vms, +char *nodename, int cpu_index) +{ + +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index)); +CPUCaches caches = cpu->caches; + +char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index); + +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size", + caches.l1d_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size", + caches.l1d_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets", + caches.l1d_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size", + caches.l1i_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size", + caches.l1i_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets", + caches.l1i_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache", + cachename); +g_free(cachename); +} + static void fdt_add_cpu_nodes(const VirtMachineState *vms) { int cpu; @@ -379,6 +462,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#s
[RFC PATCH v2 10/13] target/arm/cpu: Add CPU cache description for arm
Add the CPUCacheInfo structure to hold CPU cache information for ARM cpus. A classic three level cache topology is used here. The default cache capacity is given and userspace can overwrite these values. Signed-off-by: Ying Fang --- target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 +++ 2 files changed, 69 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 056319859f..f1bac7452c 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -27,6 +27,7 @@ #include "qapi/visitor.h" #include "cpu.h" #include "internals.h" +#include "qemu/units.h" #include "exec/exec-all.h" #include "hw/qdev-properties.h" #if !defined(CONFIG_USER_ONLY) @@ -997,6 +998,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz) return (Aff1 << ARM_AFF1_SHIFT) | Aff0; } +static CPUCaches default_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x02, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x04, +}, +.l2_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 2, +.size = 512 * KiB, +.line_size = 64, +.associativity = 8, +.sets = 1024, +.attributes = 0x0a, +}, +.l3_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 3, +.size = 65536 * KiB, +.line_size = 64, +.associativity = 15, +.sets = 2048, +.attributes = 0x0a, +}, +}; + static void cpreg_hashtable_data_destroy(gpointer data) { /* @@ -1841,6 +1881,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) } } +cpu->caches = default_cache_info; + qemu_init_vcpu(cs); cpu_reset(cs); diff --git a/target/arm/cpu.h b/target/arm/cpu.h index cfff1b5c8f..dbc33a9802 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -746,6 +746,30 @@ typedef enum ARMPSCIState { typedef struct ARMISARegisters ARMISARegisters; +/* Cache information type */ +enum CacheType { +DATA_CACHE, 
+INSTRUCTION_CACHE, +UNIFIED_CACHE +}; + +typedef struct CPUCacheInfo { +enum CacheType type; /* Cache type */ +uint8_t level; +uint32_t size;/* Size in bytes */ +uint16_t line_size; /* Line size in bytes */ +uint8_t associativity;/* Cache associativity */ +uint32_t sets;/* Number of sets */ +uint8_t attributes; /* Cache attributes */ +} CPUCacheInfo; + +typedef struct CPUCaches { +CPUCacheInfo *l1d_cache; +CPUCacheInfo *l1i_cache; +CPUCacheInfo *l2_cache; +CPUCacheInfo *l3_cache; +} CPUCaches; + /** * ARMCPU: * @env: #CPUARMState @@ -987,6 +1011,9 @@ struct ARMCPU { /* Generic timer counter frequency, in Hz */ uint64_t gt_cntfrq_hz; + +/* CPU cache information */ +CPUCaches caches; }; unsigned int gt_cntfrq_period_ns(ARMCPU *cpu); -- 2.23.0
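A quick sanity check on the default geometries in this patch: for a set-associative cache, size = sets * associativity * line_size. The L1 and L2 defaults satisfy this, but the L3 entry (64 MiB, 15-way, 2048 sets) does not, which may be worth revisiting. A minimal standalone check (the helper and struct names here are mine, not part of the series):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the CPUCacheInfo fields relevant to geometry; hypothetical
 * helper type for illustration only. */
typedef struct {
    uint32_t size;        /* bytes */
    uint16_t line_size;   /* bytes */
    uint8_t  associativity;
    uint32_t sets;
} CacheGeom;

/* For a set-associative cache: size == sets * associativity * line_size. */
static int cache_geometry_consistent(const CacheGeom *c)
{
    return (uint64_t)c->sets * c->associativity * c->line_size == c->size;
}
```

With the series' defaults, l1d (64 KiB / 64 B lines / 4-way / 256 sets) and l2 (512 KiB / 64 B / 8-way / 1024 sets) pass this check, while the l3 entry does not.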
[RFC PATCH v2 01/13] hw/arm/virt: Spell out smp.cpus and smp.max_cpus
From: Andrew Jones Prefer to spell out the smp.cpus and smp.max_cpus machine state variables in order to make grepping easier and to avoid any confusion as to what cpu count is being used where. Signed-off-by: Andrew Jones --- hw/arm/virt-acpi-build.c | 8 +++ hw/arm/virt.c| 51 +++- include/hw/arm/virt.h| 2 +- 3 files changed, 29 insertions(+), 32 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9747a6458f..a222981737 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -57,11 +57,11 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) +static void acpi_dsdt_add_cpus(Aml *scope, int cpus) { uint16_t i; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < cpus; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); @@ -480,7 +480,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < vms->smp_cpus; i++) { +for (i = 0; i < MACHINE(vms)->smp.cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -599,7 +599,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. 
*/ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, vms->smp_cpus); +acpi_dsdt_add_cpus(scope, ms->smp.cpus); acpi_dsdt_add_uart(scope, [VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e465a988d6..0069fa1298 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -322,7 +322,7 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 << vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/timer"); @@ -363,7 +363,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) * The simplest way to go is to examine affinity IDs of all our CPUs. If * at least one of them has Aff3 populated, we set #address-cells to 2. */ -for (cpu = 0; cpu < vms->smp_cpus; cpu++) { +for (cpu = 0; cpu < ms->smp.cpus; cpu++) { ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); if (armcpu->mp_affinity & ARM_AFF3_MASK) { @@ -376,7 +376,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#size-cells", 0x0); -for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) { +for (cpu = ms->smp.cpus - 1; cpu >= 0; cpu--) { char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu)); CPUState *cs = CPU(armcpu); @@ -387,7 +387,7 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) armcpu->dtb_compatible); if (vms->psci_conduit != QEMU_PSCI_CONDUIT_DISABLED -&& vms->smp_cpus > 1) { +&& ms->smp.cpus > 1) { qemu_fdt_setprop_string(vms->fdt, nodename, "enable-method", "psci"); } @@ -533,7 +533,7 @@ static void fdt_add_pmu_nodes(const VirtMachineState *vms) if (vms->gic_version == VIRT_GIC_VERSION_2) { irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START, GIC_FDT_IRQ_PPI_CPU_WIDTH, - (1 
<< vms->smp_cpus) - 1); + (1 << MACHINE(vms)->smp.cpus) - 1); } qemu_fdt_add_subnode(vms->fdt, "/pmu"); @@ -622,14 +622,13 @@ static void create_gic(VirtMachineState *vms) SysBusDevice *gicbusdev; const char *gictype; int type = vms->gic_version, i; -unsigned int smp_cpus = ms->smp.cpus; uint32_t nb_redist_regions = 0; gictype = (type == 3) ? gicv3_class_name() : gic_class_name(); vms->gic = qdev_new(gictype); qdev_prop_set_uint32(vms->gic, "revision", type); -qdev_prop_set_uint32(vms->gic, "num-cpu", smp_cpus); +qdev_prop_set_uint32(vms->gic, "num-cpu", ms->smp.cpus); /* Note that the num-irq property counts both internal and external * interrupts; there are always 32 of the former (mandated by GIC spec). */ @@ -641,7 +640,7 @@ static void
[RFC PATCH v2 02/13] hw/arm/virt: Remove unused variable
From: Andrew Jones We no longer use the smp_cpus virtual machine state variable. Remove it. Signed-off-by: Andrew Jones --- hw/arm/virt.c | 2 -- include/hw/arm/virt.h | 1 - 2 files changed, 3 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0069fa1298..ea24b576c6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1820,8 +1820,6 @@ static void machvirt_init(MachineState *machine) exit(1); } -vms->smp_cpus = smp_cpus; - if (vms->virt && kvm_enabled()) { error_report("mach-virt: KVM does not support providing " "Virtualization extensions to the guest CPU"); diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 953d94acc0..010f24f580 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -151,7 +151,6 @@ struct VirtMachineState { MemMapEntry *memmap; char *pciehb_nodename; const int *irqmap; -int smp_cpus; void *fdt; int fdt_size; uint32_t clock_phandle; -- 2.23.0
[RFC PATCH v2 00/13] hw/arm/virt: Introduce cpu and cache topology support
An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core systems. So a cpu topology description is helpful to provide the guest with the right view. Cpu cache information may also have a slight impact on the sched domains, and even userspace software may check the cpu cache information to do some optimizations. Thus this patch series is posted to provide cpu and cache topology support for arm. Both fdt and ACPI are introduced to present the cpu and cache topology. To describe the cpu topology via ACPI, a PPTT table is introduced according to the processor hierarchy node structure. To describe the cpu cache information, a default cache hierarchy is given and built according to the cache type structure defined by ACPI; it can be made configurable later. The RFC v1 was posted at [1]; there we tried to map the MPIDR register into the cpu topology, but that turned out to be wrong. Andrew pointed out that the Linux kernel is going to stop using MPIDR for topology information [2]. The root cause is that the MPIDR register has been abused by ARM OEM manufacturers. It is only an identifier for a specific cpu, not a representation of the topology. Moreover, this v2 is rebased on Andrew's latest branch shared at [4]. This patch series was initially based on the patches posted by Andrew Jones [3]. I jumped in on it since some OS vendor cooperative partners are eager for it. Thanks for Andrew's contribution. After applying this patch series, launch a guest with virt-5.3 and a cpu topology configured with sockets:cores:threads = 2:4:2, and you will get the messages below from the lscpu command. 
Architecture:aarch64 CPU op-mode(s): 64-bit Byte Order: Little Endian CPU(s): 16 On-line CPU(s) list: 0-15 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 2 NUMA node(s):2 Vendor ID: HiSilicon Model: 0 Model name: Kunpeng-920 Stepping:0x1 BogoMIPS:200.00 L1d cache: 512 KiB L1i cache: 512 KiB L2 cache:4 MiB L3 cache:128 MiB NUMA node0 CPU(s): 0-7 NUMA node1 CPU(s): 8-15 changelog v1 -> v2: * Rebased to the latest branch shared by Andrew Jones [4] * Stop mapping MPIDR into vcpu topology [1] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg06027.html [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20200829130016.26106-1-valentin.schnei...@arm.com/ [3] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com [4] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh Andrew Jones (5): hw/arm/virt: Spell out smp.cpus and smp.max_cpus hw/arm/virt: Remove unused variable hw/arm/virt: Replace smp_parse with one that prefers cores device_tree: Add qemu_fdt_add_path hw/arm/virt: DT: add cpu-map Ying Fang (8): hw: add compat machines for 5.3 hw/arm/virt-acpi-build: distinguish possible and present cpus hw/acpi/aml-build: add processor hierarchy node structure hw/arm/virt-acpi-build: add PPTT table target/arm/cpu: Add CPU cache description for arm hw/arm/virt: add fdt cache information hw/acpi/aml-build: build ACPI CPU cache hierarchy information hw/arm/virt-acpi-build: Enable CPU cache topology device_tree.c| 45 +- hw/acpi/aml-build.c | 68 + hw/arm/virt-acpi-build.c | 99 - hw/arm/virt.c| 270 +++ hw/core/machine.c| 3 + hw/i386/pc.c | 3 + hw/i386/pc_piix.c| 15 +- hw/i386/pc_q35.c | 14 +- hw/ppc/spapr.c | 15 +- hw/s390x/s390-virtio-ccw.c | 14 +- include/hw/acpi/acpi-defs.h | 14 ++ include/hw/acpi/aml-build.h | 11 ++ include/hw/arm/virt.h| 4 +- include/hw/boards.h | 3 + include/hw/i386/pc.h | 3 + include/sysemu/device_tree.h | 1 + target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 18 files 
changed, 606 insertions(+), 45 deletions(-) -- 2.23.0
Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
On 10/16/2020 6:07 PM, Andrew Jones wrote: On Fri, Oct 16, 2020 at 05:40:02PM +0800, Ying Fang wrote: On 10/15/2020 3:59 PM, Andrew Jones wrote: On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote: On 10/14/2020 2:08 AM, Andrew Jones wrote: On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote: Cc valentin -Original Message- From: Qemu-devel [mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org] On Behalf Of Ying Fang Sent: Thursday, September 17, 2020 11:20 AM To: qemu-devel@nongnu.org Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang; Chenzhendong (alex); shannon.zha...@gmail.com; qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying; imamm...@redhat.com Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core system. So cpu topology description is helpful to provide guest with the right view. Cpu cache information may also have slight impact on the sched domain, and even userspace software may check the cpu cache information to do some optimizations. Thus this patch series is posted to provide cpu and cache topology support for arm. To make the cpu topology consistent with MPIDR, an vcpu ioctl For aarch64, the cpu topology don't depends on the MPDIR. See https://patchwork.kernel.org/patch/11744387/ The topology should not be inferred from the MPIDR Aff fields, MPIDR is abused by ARM OEM manufactures. It is only used as a identifer for a specific cpu, not representation of the topology. Right, which is why I stated topology should not be inferred from it. but MPIDR is the CPU identifier. When describing a topology with ACPI or DT the CPU elements in the topology description must map to actual CPUs. MPIDR is that mapping link. KVM currently determines what the MPIDR of a VCPU is. If KVM KVM currently assigns MPIDR with vcpu->vcpu_id which mapped into affinity levels. 
See reset_mpidr in sys_regs.c I know, but how KVM assigns MPIDRs today is not really important to KVM userspace. KVM userspace shouldn't depend on a KVM algorithm, as it could change. userspace is going to determine the VCPU topology, then it also needs control over the MPIDR values, otherwise it becomes quite messy trying to get the mapping right. If we are going to control MPIDR, shall we assign MPIDR with vcpu_id or map topology hierarchy into affinity levels or any other link schema ? We can assign them to whatever we want, as long as they're unique and as long as Aff0 is assigned per the GIC requirements, e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when pinning VCPUs to PCPUs we should ensure that MPIDRs with matching Aff3,Aff2,Aff1 fields should actually be peers with respect to the GIC. Still not clear why vCPU's MPIDR need to match pPCPU's GIC affinity. Maybe I should read spec for GICv3. Look at how IPIs are efficiently sent to "peers", where the definition of a peer is that only Aff0 differs in its MPIDR. But, gicv3's optimizations can only handle 16 peers. If we want pinned VCPUs to have the same performance as PCPUS, then we should maintain this Aff0 limit. Yes I see. I think *virt_cpu_mp_affinity* in qemu has limit on the clustersz. It groups every 16 vCPUs into a cluster and then mapped into the first two affinity levels. Thanks. Ying. Thanks, drew We shouldn't try to encode topology in the MPIDR in any way, so we might as well simply increment a counter to assign them, which could possibly be the same as the VCPU ID. Hmm, then we can leave it as it is. Thanks, drew . .
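The "groups every 16 vCPUs into a cluster" behavior discussed above can be sketched as follows. This mirrors the shape of QEMU's arm_cpu_mp_affinity() (where ARM_AFF1_SHIFT is 8), though it is a simplified standalone model, not the upstream code: consecutive vcpu ids within a cluster share Aff1 and differ only in Aff0, so they are "peers" for GICv3's IPI fast path, which can address at most 16 targets (Aff0 0..0xf) at once.

```c
#include <assert.h>
#include <stdint.h>

#define AFF1_SHIFT 8  /* matches ARM_AFF1_SHIFT in target/arm */

/* Pack a vcpu index into MPIDR-style affinity fields: every
 * `clustersz` consecutive ids (16 for GICv3) form one Aff1 cluster,
 * with Aff0 identifying the vcpu within the cluster. No topology is
 * encoded beyond this GIC-imposed grouping. */
static uint64_t mp_affinity(unsigned idx, unsigned clustersz)
{
    uint64_t aff1 = idx / clustersz;
    uint64_t aff0 = idx % clustersz;
    return (aff1 << AFF1_SHIFT) | aff0;
}
```

So with clustersz = 16, vcpus 0-15 land in cluster 0 (MPIDR 0x0-0xf) and vcpu 16 starts cluster 1 at 0x100.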
Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
On 10/15/2020 3:59 PM, Andrew Jones wrote: On Thu, Oct 15, 2020 at 10:07:16AM +0800, Ying Fang wrote: On 10/14/2020 2:08 AM, Andrew Jones wrote: On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote: Cc valentin -Original Message- From: Qemu-devel [mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org] On Behalf Of Ying Fang Sent: Thursday, September 17, 2020 11:20 AM To: qemu-devel@nongnu.org Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang; Chenzhendong (alex); shannon.zha...@gmail.com; qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying; imamm...@redhat.com Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core system. So cpu topology description is helpful to provide guest with the right view. Cpu cache information may also have slight impact on the sched domain, and even userspace software may check the cpu cache information to do some optimizations. Thus this patch series is posted to provide cpu and cache topology support for arm. To make the cpu topology consistent with MPIDR, an vcpu ioctl For aarch64, the cpu topology don't depends on the MPDIR. See https://patchwork.kernel.org/patch/11744387/ The topology should not be inferred from the MPIDR Aff fields, MPIDR is abused by ARM OEM manufactures. It is only used as a identifer for a specific cpu, not representation of the topology. Right, which is why I stated topology should not be inferred from it. but MPIDR is the CPU identifier. When describing a topology with ACPI or DT the CPU elements in the topology description must map to actual CPUs. MPIDR is that mapping link. KVM currently determines what the MPIDR of a VCPU is. If KVM KVM currently assigns MPIDR with vcpu->vcpu_id which mapped into affinity levels. See reset_mpidr in sys_regs.c I know, but how KVM assigns MPIDRs today is not really important to KVM userspace. 
KVM userspace shouldn't depend on a KVM algorithm, as it could change. userspace is going to determine the VCPU topology, then it also needs control over the MPIDR values, otherwise it becomes quite messy trying to get the mapping right. If we are going to control MPIDR, shall we assign MPIDR with vcpu_id or map topology hierarchy into affinity levels or any other link schema ? We can assign them to whatever we want, as long as they're unique and as long as Aff0 is assigned per the GIC requirements, e.g. GICv3 requires that Aff0 be from 0 to 0xf. Also, when pinning VCPUs to PCPUs we should ensure that MPIDRs with matching Aff3,Aff2,Aff1 fields should actually be peers with respect to the GIC. Still not clear why vCPU's MPIDR need to match pPCPU's GIC affinity. Maybe I should read spec for GICv3. We shouldn't try to encode topology in the MPIDR in any way, so we might as well simply increment a counter to assign them, which could possibly be the same as the VCPU ID. Hmm, then we can leave it as it is. Thanks, drew .
Re: [RFC PATCH v2 0/8] block-backend: Introduce I/O hang
On 10/10/2020 10:27 AM, cenjiahui wrote: Hi Kevin, Could you please spend some time reviewing and commenting on this patch series. Thanks, Jiahui Cen This feature is confirmed effective in a cloud storage environment since it can help to improve availability without pausing the entire guest. Hope it won't be lost on the thread. Any comments or reviews are welcome. On 2020/9/30 17:45, Jiahui Cen wrote: A VM in the cloud environment may use a virtual disk as its backend storage, and there are usually filesystems on the virtual block device. When the backend storage is temporarily down, any I/O issued to the virtual block device will cause an error. For example, an error in an ext4 filesystem can make the filesystem read-only. However, cloud backend storage can often be recovered quickly. For example, an IP-SAN may go down due to a network failure and come back online soon after the network recovers. The error in the filesystem, though, may not be recovered without a device reattach or a system restart. So an I/O rehandle mechanism is needed to implement self-healing. This patch series proposes a feature called I/O hang. It can rehandle AIOs that fail with EIO without sending the error back to the guest. From the guest's point of view, it simply looks like the I/O is hanging rather than returning an error. The guest gets back to running smoothly once I/O is recovered, with this feature enabled. v1->v2: * Rebase to fix compile problems. * Fix incorrect removal from the rehandle list. * Provide a rehandle pause interface. 
Jiahui Cen (8): block-backend: introduce I/O rehandle info block-backend: rehandle block aios when EIO block-backend: add I/O hang timeout block-backend: add I/O rehandle pause/unpause block-backend: enable I/O hang when timeout is set virtio-blk: pause I/O hang when resetting qemu-option: add I/O hang timeout option qapi: add I/O hang and I/O hang timeout qapi event block/block-backend.c | 300 + blockdev.c | 11 ++ hw/block/virtio-blk.c | 8 + include/sysemu/block-backend.h | 5 + qapi/block-core.json | 26 +++ 5 files changed, 350 insertions(+) .
Re: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
On 10/14/2020 2:08 AM, Andrew Jones wrote: On Tue, Oct 13, 2020 at 12:11:20PM +, Zengtao (B) wrote: Cc valentin -Original Message- From: Qemu-devel [mailto:qemu-devel-bounces+prime.zeng=hisilicon@nongnu.org] On Behalf Of Ying Fang Sent: Thursday, September 17, 2020 11:20 AM To: qemu-devel@nongnu.org Cc: peter.mayd...@linaro.org; drjo...@redhat.com; Zhanghailiang; Chenzhendong (alex); shannon.zha...@gmail.com; qemu-...@nongnu.org; alistair.fran...@wdc.com; fangying; imamm...@redhat.com Subject: [RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core system. So cpu topology description is helpful to provide guest with the right view. Cpu cache information may also have slight impact on the sched domain, and even userspace software may check the cpu cache information to do some optimizations. Thus this patch series is posted to provide cpu and cache topology support for arm. To make the cpu topology consistent with MPIDR, an vcpu ioctl For aarch64, the cpu topology don't depends on the MPDIR. See https://patchwork.kernel.org/patch/11744387/ The topology should not be inferred from the MPIDR Aff fields, MPIDR is abused by ARM OEM manufactures. It is only used as a identifer for a specific cpu, not representation of the topology. but MPIDR is the CPU identifier. When describing a topology with ACPI or DT the CPU elements in the topology description must map to actual CPUs. MPIDR is that mapping link. KVM currently determines what the MPIDR of a VCPU is. If KVM KVM currently assigns MPIDR with vcpu->vcpu_id which mapped into affinity levels. See reset_mpidr in sys_regs.c userspace is going to determine the VCPU topology, then it also needs control over the MPIDR values, otherwise it becomes quite messy trying to get the mapping right. 
If we are going to control MPIDR, shall we assign MPIDR with vcpu_id or map topology hierarchy into affinity levels or any other link schema ? Thanks, drew . Thanks Ying.
[RFC PATCH 7/7] qapi: add I/O hang and I/O hang timeout qapi event
Sometimes hypervisor management tools like libvirt may need to monitor I/O hang events. Let's report I/O hang and I/O hang timeout event via qapi. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 3 +++ qapi/block-core.json | 26 ++ 2 files changed, 29 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 95b2d6a679..5dc5b11bcc 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2540,6 +2540,7 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status) /* Case when I/O Hang is recovered */ blk->is_iohang_timeout = false; blk->iohang_time = 0; +qapi_event_send_block_io_hang(false); } break; case BLOCK_IO_HANG_STATUS_HANG: @@ -2547,12 +2548,14 @@ static bool blk_iohang_handle(BlockBackend *blk, int new_status) /* Case when I/O hang is first triggered */ blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; need_rehandle = true; +qapi_event_send_block_io_hang(true); } else { if (!blk->is_iohang_timeout) { now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000; if (now >= (blk->iohang_time + blk->iohang_timeout)) { /* Case when I/O hang is timeout */ blk->is_iohang_timeout = true; +qapi_event_send_block_io_hang_timeout(true); } else { /* Case when I/O hang is continued */ need_rehandle = true; diff --git a/qapi/block-core.json b/qapi/block-core.json index 3c16f1e11d..7bdf75c6d7 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -5535,3 +5535,29 @@ { 'command': 'blockdev-snapshot-delete-internal-sync', 'data': { 'device': 'str', '*id': 'str', '*name': 'str'}, 'returns': 'SnapshotInfo' } + +## +# @BLOCK_IO_HANG: +# +# Emitted when device I/O hang trigger event begin or end +# +# @set: true if I/O hang begin; false if I/O hang end. +# +# Since: 5.2 +# +## +{ 'event': 'BLOCK_IO_HANG', + 'data': { 'set': 'bool' }} + +## +# @BLOCK_IO_HANG_TIMEOUT: +# +# Emitted when device I/O hang timeout event set or clear +# +# @set: true if set; false if clear. 
+# +# Since: 5.2 +# +## +{ 'event': 'BLOCK_IO_HANG_TIMEOUT', + 'data': { 'set': 'bool' }} -- 2.23.0
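With the two event definitions above, a management client such as libvirt listening on the QMP socket would see events shaped roughly as below (timestamp values are illustrative; the surrounding envelope is QMP's standard asynchronous event format):

```json
{"timestamp": {"seconds": 1601441100, "microseconds": 421014},
 "event": "BLOCK_IO_HANG", "data": {"set": true}}
{"timestamp": {"seconds": 1601441160, "microseconds": 102333},
 "event": "BLOCK_IO_HANG_TIMEOUT", "data": {"set": true}}
```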
[RFC PATCH 2/7] block-backend: rehandle block aios when EIO
When a backend device temporarily does not respond, e.g. a network disk goes down due to a network fault, any I/O to the corresponding virtual block device in the VM returns an I/O error, often EIO. If the hypervisor returns the error to the VM, the filesystem on that block device may stop working as usual. To avoid this unavailability, we can store the failed AIOs and resend them later. If the error is temporary, the retries can succeed and the AIOs can be successfully completed. Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- block/block-backend.c | 89 +++ 1 file changed, 89 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index bf104a7cf5..90f1ca5753 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -365,6 +365,12 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) notifier_list_init(>remove_bs_notifiers); notifier_list_init(>insert_bs_notifiers); +/* for rehandle */ +blk->reinfo.enable = false; +blk->reinfo.ts = NULL; +atomic_set(>reinfo.in_flight, 0); +QTAILQ_INIT(>reinfo.re_aios); + QLIST_INIT(>aio_notifiers); QTAILQ_INSERT_TAIL(_backends, blk, link); @@ -1425,8 +1431,16 @@ static const AIOCBInfo blk_aio_em_aiocb_info = { .get_aio_context= blk_aio_em_aiocb_get_aio_context, }; +static void blk_rehandle_timer_cb(void *opaque); +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb); + static void blk_aio_complete(BlkAioEmAIOCB *acb) { +if (acb->rwco.blk->reinfo.enable) { +blk_rehandle_aio_complete(acb); +return; +} + if (acb->has_returned) { acb->common.cb(acb->common.opaque, acb->rwco.ret); blk_dec_in_flight(acb->rwco.blk); @@ -1459,6 +1473,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes, .ret= NOT_DONE, }; acb->bytes = bytes; +acb->co_entry = co_entry; acb->has_returned = false; co = qemu_coroutine_create(co_entry, acb); @@ -2054,6 +2069,20 @@ static int blk_do_set_aio_context(BlockBackend *blk, AioContext 
*new_context, throttle_group_attach_aio_context(tgm, new_context); bdrv_drained_end(bs); } + +if (blk->reinfo.enable) { +if (blk->reinfo.ts) { +timer_del(blk->reinfo.ts); +timer_free(blk->reinfo.ts); +} +blk->reinfo.ts = aio_timer_new(new_context, QEMU_CLOCK_REALTIME, + SCALE_MS, blk_rehandle_timer_cb, + blk); +if (atomic_read(>reinfo.in_flight)) { +timer_mod(blk->reinfo.ts, + qemu_clock_get_ms(QEMU_CLOCK_REALTIME)); +} +} } blk->ctx = new_context; @@ -2405,6 +2434,66 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter) } } +static void blk_rehandle_insert_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ +assert(blk->reinfo.enable); + +atomic_inc(>reinfo.in_flight); +QTAILQ_INSERT_TAIL(>reinfo.re_aios, acb, list); +timer_mod(blk->reinfo.ts, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + + blk->reinfo.timer_interval_ms); +} + +static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb) +{ +QTAILQ_REMOVE(>reinfo.re_aios, acb, list); +atomic_dec(>reinfo.in_flight); +} + +static void blk_rehandle_timer_cb(void *opaque) +{ +BlockBackend *blk = opaque; +BlockBackendRehandleInfo *reinfo = >reinfo; +BlkAioEmAIOCB *acb, *tmp; +Coroutine *co; + +aio_context_acquire(blk_get_aio_context(blk)); +QTAILQ_FOREACH_SAFE(acb, >re_aios, list, tmp) { +if (acb->rwco.ret == NOT_DONE) { +continue; +} + +blk_inc_in_flight(acb->rwco.blk); +acb->rwco.ret = NOT_DONE; +acb->has_returned = false; + +co = qemu_coroutine_create(acb->co_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +blk_rehandle_remove_aiocb(acb->rwco.blk, acb); +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} +} +aio_context_release(blk_get_aio_context(blk)); +} + +static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb) +{ +if (acb->has_returned) { +blk_dec_in_flight(acb->rwco.blk); +if (acb->rwco.ret == -EIO) { +blk_rehandle_insert_aiocb(acb->rwco.blk, acb); +return; +} + 
+acb->common.cb(acb->common.opaque, acb->rwco.ret); +qemu_aio_unref(acb); +} +} + void blk_register_buf(BlockBackend *blk, void *host, size_t size) { bdrv_register_buf(blk_bs(blk), host, size); -- 2.23.0
[RFC PATCH 5/7] virtio-blk: disable I/O hang when resetting
All AIOs including the hanging AIOs need to be drained when resetting virtio-blk. So it is necessary to disable I/O hang before resetting and enable I/O hang again after resetting if I/O hang is enabled. Signed-off-by: Ying Fang Signed-off-by: Jiahui Cen --- hw/block/virtio-blk.c | 8 1 file changed, 8 insertions(+) diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 2204ba149e..11837a54f5 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -892,6 +892,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) AioContext *ctx; VirtIOBlockReq *req; +if (blk_iohang_is_enabled(s->blk)) { +blk_rehandle_disable(s->blk); +} + ctx = blk_get_aio_context(s->blk); aio_context_acquire(ctx); blk_drain(s->blk); @@ -909,6 +913,10 @@ static void virtio_blk_reset(VirtIODevice *vdev) assert(!s->dataplane_started); blk_set_enable_write_cache(s->blk, s->original_wce); + +if (blk_iohang_is_enabled(s->blk)) { +blk_rehandle_enable(s->blk); +} } /* coalesce internal state, copy to pci i/o region 0 -- 2.23.0
[RFC PATCH 1/7] block-backend: introduce I/O rehandle info
The I/O hang feature is realized based on a rehandle mechanism. Each block backend will have a list to store hanging block AIOs, and a timer to regularly resend these aios. In order to issue the AIOs again, each block AIOs also need to store its coroutine entry. Signed-off-by: Jiahui Cen Signed-off-by: Ying Fang --- block/block-backend.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 24dd0670d1..bf104a7cf5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -35,6 +35,18 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb); +/* block backend rehandle timer interval 5s */ +#define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000 + +typedef struct BlockBackendRehandleInfo { +bool enable; +QEMUTimer *ts; +unsigned timer_interval_ms; + +unsigned int in_flight; +QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios; +} BlockBackendRehandleInfo; + typedef struct BlockBackendAioNotifier { void (*attached_aio_context)(AioContext *new_context, void *opaque); void (*detach_aio_context)(void *opaque); @@ -95,6 +107,8 @@ struct BlockBackend { * Accessed with atomic ops. */ unsigned int in_flight; + +BlockBackendRehandleInfo reinfo; }; typedef struct BlockBackendAIOCB { @@ -350,6 +364,7 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) qemu_co_queue_init(>queued_requests); notifier_list_init(>remove_bs_notifiers); notifier_list_init(>insert_bs_notifiers); + QLIST_INIT(>aio_notifiers); QTAILQ_INSERT_TAIL(_backends, blk, link); @@ -1392,6 +1407,10 @@ typedef struct BlkAioEmAIOCB { BlkRwCo rwco; int bytes; bool has_returned; + +/* for rehandle */ +CoroutineEntry *co_entry; +QTAILQ_ENTRY(BlkAioEmAIOCB) list; } BlkAioEmAIOCB; static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_) -- 2.23.0
[RFC PATCH 3/7] block-backend: add I/O hang timeout
Not all errors can be fixed, so it is better to add a rehandle timeout to I/O hang.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 block/block-backend.c          | 99 ++++++++++++++++++++++++++++++++-
 include/sysemu/block-backend.h |  2 +
 2 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 90f1ca5753..d0b2b59f55 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -38,6 +38,11 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000
 
+enum BlockIOHangStatus {
+    BLOCK_IO_HANG_STATUS_NORMAL = 0,
+    BLOCK_IO_HANG_STATUS_HANG,
+};
+
 typedef struct BlockBackendRehandleInfo {
     bool enable;
     QEMUTimer *ts;
@@ -109,6 +114,11 @@ struct BlockBackend {
     unsigned int in_flight;
 
     BlockBackendRehandleInfo reinfo;
+
+    int64_t iohang_timeout; /* The I/O hang timeout value in sec. */
+    int64_t iohang_time;    /* The I/O hang start time */
+    bool is_iohang_timeout;
+    int iohang_status;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -2480,20 +2490,107 @@ static void blk_rehandle_timer_cb(void *opaque)
     aio_context_release(blk_get_aio_context(blk));
 }
 
+static bool blk_iohang_handle(BlockBackend *blk, int new_status)
+{
+    int64_t now;
+    int old_status = blk->iohang_status;
+    bool need_rehandle = false;
+
+    switch (new_status) {
+    case BLOCK_IO_HANG_STATUS_NORMAL:
+        if (old_status == BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O hang is recovered */
+            blk->is_iohang_timeout = false;
+            blk->iohang_time = 0;
+        }
+        break;
+    case BLOCK_IO_HANG_STATUS_HANG:
+        if (old_status != BLOCK_IO_HANG_STATUS_HANG) {
+            /* Case when I/O hang is first triggered */
+            blk->iohang_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+            need_rehandle = true;
+        } else {
+            if (!blk->is_iohang_timeout) {
+                now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
+                if (now >= (blk->iohang_time + blk->iohang_timeout)) {
+                    /* Case when I/O hang has timed out */
+                    blk->is_iohang_timeout = true;
+                } else {
+                    /* Case when I/O hang is continued */
+                    need_rehandle = true;
+                }
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    blk->iohang_status = new_status;
+    return need_rehandle;
+}
+
+static bool blk_rehandle_aio(BlkAioEmAIOCB *acb, bool *has_timeout)
+{
+    bool need_rehandle = false;
+
+    /* Rehandle aio which returns EIO before hang timeout */
+    if (acb->rwco.ret == -EIO) {
+        if (acb->rwco.blk->is_iohang_timeout) {
+            /* I/O hang has timed out and not recovered */
+            *has_timeout = true;
+        } else {
+            need_rehandle = blk_iohang_handle(acb->rwco.blk,
+                                              BLOCK_IO_HANG_STATUS_HANG);
+            /* I/O hang timeout first trigger */
+            if (acb->rwco.blk->is_iohang_timeout) {
+                *has_timeout = true;
+            }
+        }
+    }
+
+    return need_rehandle;
+}
+
 static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
 {
+    bool has_timeout = false;
+    bool need_rehandle = false;
+
     if (acb->has_returned) {
         blk_dec_in_flight(acb->rwco.blk);
-        if (acb->rwco.ret == -EIO) {
+        need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+        if (need_rehandle) {
             blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
             return;
         }
 
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
+
+        /* I/O hang returns to normal status */
+        if (!has_timeout) {
+            blk_iohang_handle(acb->rwco.blk, BLOCK_IO_HANG_STATUS_NORMAL);
+        }
+
         qemu_aio_unref(acb);
     }
 }
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
+{
+    if (!blk) {
+        return;
+    }
+
+    blk->is_iohang_timeout = false;
+    blk->iohang_time = 0;
+    blk->iohang_timeout = 0;
+    blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
+    if (iohang_timeout > 0) {
+        blk->iohang_timeout = iohang_timeout;
+    }
+}
+
 void blk_register_buf(BlockBackend *blk, void *host, size_t size)
 {
     bdrv_register_buf(blk_bs(blk), host, size);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8203d7f6f9..bfebe3a960 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -268,4 +268,6 @@ const BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
+void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout);
+
 #endif
-- 
2.23.0
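The hang/normal transition logic in blk_iohang_handle() above can be exercised as a standalone state machine. This is a minimal sketch, not QEMU code: the struct is a pared-down stand-in for the relevant BlockBackend fields, and the clock is passed in explicitly so the transitions are testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { IO_HANG_NORMAL = 0, IO_HANG_HANG };

typedef struct {
    int64_t hang_timeout;   /* seconds; 0 means the feature is off */
    int64_t hang_start;     /* seconds; when the current hang began */
    bool    timed_out;
    int     status;
} IOHangState;

/* Returns true when the caller should requeue (rehandle) the failed AIO */
static bool iohang_handle(IOHangState *s, int new_status, int64_t now)
{
    bool need_rehandle = false;

    switch (new_status) {
    case IO_HANG_NORMAL:
        if (s->status == IO_HANG_HANG) {      /* hang recovered */
            s->timed_out = false;
            s->hang_start = 0;
        }
        break;
    case IO_HANG_HANG:
        if (s->status != IO_HANG_HANG) {      /* hang first triggered */
            s->hang_start = now;
            need_rehandle = true;
        } else if (!s->timed_out) {
            if (now >= s->hang_start + s->hang_timeout) {
                s->timed_out = true;          /* give up, report EIO */
            } else {
                need_rehandle = true;         /* keep retrying */
            }
        }
        break;
    }
    s->status = new_status;
    return need_rehandle;
}
```

A failed AIO first triggers a hang and is rehandled until the timeout elapses; a successful completion drops the state back to normal and clears the timeout flag.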
[RFC PATCH 6/7] qemu-option: add I/O hang timeout option
The I/O hang timeout should differ under different situations. So it is better to provide an option for the user to set the I/O hang timeout for each block device.

Signed-off-by: Jiahui Cen
Signed-off-by: Ying Fang
---
 blockdev.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 7f2561081e..ff8cdcd497 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -500,6 +500,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
     BlockdevDetectZeroesOptions detect_zeroes =
         BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF;
     const char *throttling_group = NULL;
+    int64_t iohang_timeout = 0;
 
     /* Check common options by copying from bs_opts to opts, all other options
      * stay in bs_opts for processing by bdrv_open(). */
@@ -622,6 +623,12 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 
     bs->detect_zeroes = detect_zeroes;
 
+    /* init timeout value for I/O Hang */
+    iohang_timeout = qemu_opt_get_number(opts, "iohang-timeout", 0);
+    if (iohang_timeout > 0) {
+        blk_iohang_init(blk, iohang_timeout);
+    }
+
     block_acct_setup(blk_get_stats(blk), account_invalid, account_failed);
 
     if (!parse_stats_intervals(blk_get_stats(blk), interval_list, errp)) {
@@ -3786,6 +3793,10 @@ QemuOptsList qemu_common_drive_opts = {
             .type = QEMU_OPT_BOOL,
             .help = "whether to account for failed I/O operations "
                     "in the statistics",
+        },{
+            .name = "iohang-timeout",
+            .type = QEMU_OPT_NUMBER,
+            .help = "timeout value for I/O Hang",
         },
         { /* end of list */ }
     },
-- 
2.23.0
[RFC PATCH 0/7] block-backend: Introduce I/O hang
A VM in a cloud environment may use a virtual disk as its backend storage, and there are usually filesystems on the virtual block device. When the backend storage is temporarily down, any I/O issued to the virtual block device will return an error. For example, an error in an ext4 filesystem can make the filesystem read-only. However, cloud backend storage is often recovered quickly: an IP-SAN may go down due to a network failure and come back online soon after the network recovers. Yet the error in the filesystem may not be recovered without a device reattach or a system restart. So an I/O rehandle mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It rehandles AIOs that fail with EIO instead of sending the error back to the guest. From the guest's perspective it just looks like an I/O is hanging and has not returned. With this feature enabled, the guest resumes running smoothly once the I/O recovers.

Ying Fang (7):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O hang drain when disabled
  virtio-blk: disable I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event

 block/block-backend.c          | 285 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 5 files changed, 335 insertions(+)
-- 
2.23.0
[RFC PATCH 4/7] block-backend: add I/O hang drain when disabled
To disable I/O hang, all hanging AIOs need to be drained. A rehandle status field is introduced so that the rehandle mechanism does not rehandle failed AIOs while I/O hang is being disabled.

Signed-off-by: Ying Fang
Signed-off-by: Jiahui Cen
---
 block/block-backend.c          | 85 ++++++++++++++++++++++++++++++--
 include/sysemu/block-backend.h |  3 ++
 2 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d0b2b59f55..95b2d6a679 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -37,6 +37,9 @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
 
 /* block backend rehandle timer interval 5s */
 #define BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL 5000
+#define BLOCK_BACKEND_REHANDLE_NORMAL          1
+#define BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED 2
+#define BLOCK_BACKEND_REHANDLE_DRAINED         3
 
 enum BlockIOHangStatus {
     BLOCK_IO_HANG_STATUS_NORMAL = 0,
@@ -50,6 +53,8 @@ typedef struct BlockBackendRehandleInfo {
 
     unsigned int in_flight;
     QTAILQ_HEAD(, BlkAioEmAIOCB) re_aios;
+
+    int status;
 } BlockBackendRehandleInfo;
 
 typedef struct BlockBackendAioNotifier {
@@ -471,6 +476,8 @@ static void blk_delete(BlockBackend *blk)
     assert(!blk->refcnt);
     assert(!blk->name);
     assert(!blk->dev);
+    assert(atomic_read(&blk->reinfo.in_flight) == 0);
+    blk_rehandle_disable(blk);
     if (blk->public.throttle_group_member.throttle_state) {
         blk_io_limits_disable(blk);
     }
@@ -2460,6 +2467,37 @@ static void blk_rehandle_remove_aiocb(BlockBackend *blk, BlkAioEmAIOCB *acb)
     atomic_dec(&blk->reinfo.in_flight);
 }
 
+static void blk_rehandle_drain(BlockBackend *blk)
+{
+    if (blk_bs(blk)) {
+        bdrv_drained_begin(blk_bs(blk));
+        BDRV_POLL_WHILE(blk_bs(blk),
+                        atomic_read(&blk->reinfo.in_flight) > 0);
+        bdrv_drained_end(blk_bs(blk));
+    }
+}
+
+static bool blk_rehandle_is_paused(BlockBackend *blk)
+{
+    return blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED ||
+           blk->reinfo.status == BLOCK_BACKEND_REHANDLE_DRAINED;
+}
+
+static void blk_rehandle_pause(BlockBackend *blk)
+{
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    if (!reinfo->enable || reinfo->status == BLOCK_BACKEND_REHANDLE_DRAINED) {
+        aio_context_release(blk_get_aio_context(blk));
+        return;
+    }
+
+    reinfo->status = BLOCK_BACKEND_REHANDLE_DRAIN_REQUESTED;
+    blk_rehandle_drain(blk);
+    reinfo->status = BLOCK_BACKEND_REHANDLE_DRAINED;
+    aio_context_release(blk_get_aio_context(blk));
+}
+
 static void blk_rehandle_timer_cb(void *opaque)
 {
     BlockBackend *blk = opaque;
@@ -2559,10 +2597,12 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
     if (acb->has_returned) {
         blk_dec_in_flight(acb->rwco.blk);
-        need_rehandle = blk_rehandle_aio(acb, &has_timeout);
-        if (need_rehandle) {
-            blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
-            return;
+        if (!blk_rehandle_is_paused(acb->rwco.blk)) {
+            need_rehandle = blk_rehandle_aio(acb, &has_timeout);
+            if (need_rehandle) {
+                blk_rehandle_insert_aiocb(acb->rwco.blk, acb);
+                return;
+            }
         }
 
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
@@ -2576,6 +2616,42 @@ static void blk_rehandle_aio_complete(BlkAioEmAIOCB *acb)
     }
 }
 
+void blk_rehandle_enable(BlockBackend *blk)
+{
+    BlockBackendRehandleInfo *reinfo = &blk->reinfo;
+
+    aio_context_acquire(blk_get_aio_context(blk));
+    if (reinfo->enable) {
+        aio_context_release(blk_get_aio_context(blk));
+        return;
+    }
+
+    reinfo->ts = aio_timer_new(blk_get_aio_context(blk), QEMU_CLOCK_REALTIME,
+                               SCALE_MS, blk_rehandle_timer_cb, blk);
+    reinfo->timer_interval_ms = BLOCK_BACKEND_REHANDLE_TIMER_INTERVAL;
+    reinfo->status = BLOCK_BACKEND_REHANDLE_NORMAL;
+    reinfo->enable = true;
+    aio_context_release(blk_get_aio_context(blk));
+}
+
+void blk_rehandle_disable(BlockBackend *blk)
+{
+    if (!blk->reinfo.enable) {
+        return;
+    }
+
+    blk_rehandle_pause(blk);
+    timer_del(blk->reinfo.ts);
+    timer_free(blk->reinfo.ts);
+    blk->reinfo.ts = NULL;
+    blk->reinfo.enable = false;
+}
+
+bool blk_iohang_is_enabled(BlockBackend *blk)
+{
+    return blk->iohang_timeout != 0;
+}
+
 void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
 {
     if (!blk) {
@@ -2588,6 +2664,7 @@ void blk_iohang_init(BlockBackend *blk, int64_t iohang_timeout)
     blk->iohang_status = BLOCK_IO_HANG_STATUS_NORMAL;
     if (iohang_timeout > 0) {
         blk->iohang_timeout = iohang_timeout;
+        blk_rehandle_enable(blk);
     }
 }
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index bfebe3
Re: [RFC PATCH 07/12] hw/acpi/aml-build: add processor hierarchy node structure
On 9/17/2020 4:27 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:28AM +0800, Ying Fang wrote: Add the processor hierarchy node structures to build ACPI information for CPU topology. Three helpers are introduced: (1) build_socket_hierarchy for socket description structure (2) build_processor_hierarchy for processor description structure (3) build_smt_hierarchy for thread (logic processor) description structure Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 37 + include/hw/acpi/aml-build.h | 7 +++ 2 files changed, 44 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index f6fbc9b95d..13eb6e1345 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} I see you took this from 
https://patchwork.ozlabs.org/project/qemu-devel/patch/20180704124923.32483-6-drjo...@redhat.com/ (even though you neglected to mention that) I've tweaked my implementation of it slightly per Igor's comments for the refresh. See build_processor_hierarchy_node() in https://github.com/rhdrjones/qemu/commit/439b38d67ca1f2cbfa5b9892a822b651ebd05c11 Ok, I will sync with your work and test it. + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0);/* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, add private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x0e, 4);/* Processor is a thread */ +build_append_int_noprefix(tbl, parent , 4); /* parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Num of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index d27da03d64..ff4c6a38f3 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); Why add build_socket_hierarchy() and build_smt_hierarchy() ? To distinguish between socket, core and thread topology level, build_socket_hierarchy and build_smt_hierarchy are introduced. They will make the code logical in built_pptt much more much straightforward I think. 
+ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0 Thanks, drew .
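For reference, the 20-byte Type 0 node layout that the helpers above append byte-by-byte can be sketched as a flat little-endian encoder. This is only an illustration of the ACPI 6.3 5.2.29.1 structure layout, not QEMU's GArray-based implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode an ACPI 6.3 processor hierarchy node (Type 0) with no private
 * resources into buf; returns the number of bytes written (always 20).
 */
static size_t encode_processor_node(uint8_t *buf, uint32_t flags,
                                    uint32_t parent, uint32_t acpi_id)
{
    size_t n = 0;
    int i;

    buf[n++] = 0;                       /* Type 0 - processor */
    buf[n++] = 20;                      /* Length, no private resources */
    buf[n++] = 0;                       /* Reserved */
    buf[n++] = 0;
    for (i = 0; i < 4; i++) buf[n++] = (flags   >> (8 * i)) & 0xff;
    for (i = 0; i < 4; i++) buf[n++] = (parent  >> (8 * i)) & 0xff;
    for (i = 0; i < 4; i++) buf[n++] = (acpi_id >> (8 * i)) & 0xff;
    for (i = 0; i < 4; i++) buf[n++] = 0;   /* Number of private resources */
    return n;
}
```

Flags bit 0 set marks a physical package, which is exactly what build_socket_hierarchy() hard-codes.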
Re: [RFC PATCH 09/12] target/arm/cpu: Add CPU cache description for arm
On 9/17/2020 4:39 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:30AM +0800, Ying Fang wrote: Add the CPUCacheInfo structure to hold CPU cache information for ARM CPUs. A classic three-level cache topology is used here. The default cache capacity is given and userspace can override these values. Doesn't TCG already have some sort of fake cache hierarchy? If so, then we shouldn't be adding another one; we should simply be describing the one we already have. For KVM, we shouldn't describe anything other than what is actually on the host. TCG may have some sort of fake cache hierarchy via CCSIDR. Yes, agreed. Cache capacity should be the same as the host's, otherwise it may have a bad impact on guest performance. We can do that by querying the host and making cache capacity configurable from userspace. Dario Faggioli is going to give a talk about it at KVM Forum [1]. [1] https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse?iframe=no=100%=yes=no Thanks. drew .
Re: [RFC PATCH 06/12] hw/arm/virt-acpi-build: distinguish possible and present cpus
On 9/17/2020 4:20 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:27AM +0800, Ying Fang wrote: When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) I approached this in a different way in the refresh, so this patch was dropped, but the refresh is completely untested, so something similar may still be necessary. Nice work, I'll open it and take a look. Thanks. Thanks, drew .
Re: [RFC PATCH 04/12] device_tree: add qemu_fdt_add_path
On 9/17/2020 4:12 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:25AM +0800, Ying Fang wrote: From: Andrew Jones qemu_fdt_add_path works like qemu_fdt_add_subnode, except it also recursively adds any missing parent nodes. Cc: Peter Crosthwaite Cc: Alexander Graf Signed-off-by: Andrew Jones --- device_tree.c| 24 include/sysemu/device_tree.h | 1 + 2 files changed, 25 insertions(+) diff --git a/device_tree.c b/device_tree.c index b335dae707..1854be3a02 100644 --- a/device_tree.c +++ b/device_tree.c @@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) return retval; } +int qemu_fdt_add_path(void *fdt, const char *path) +{ +char *parent; +int offset; + +offset = fdt_path_offset(fdt, path); +if (offset < 0 && offset != -FDT_ERR_NOTFOUND) { +error_report("%s Couldn't find node %s: %s", __func__, path, + fdt_strerror(offset)); +exit(1); +} + +if (offset != -FDT_ERR_NOTFOUND) { +return offset; +} + +parent = g_strdup(path); +strrchr(parent, '/')[0] = '\0'; +qemu_fdt_add_path(fdt, parent); +g_free(parent); + +return qemu_fdt_add_subnode(fdt, path); +} Igor didn't like the recursion when I posted this before so I changed it when doing the refresh[*] that I gave to Salil Mehta. Salil also works for Huawei, are you guys not working together? [*] https://github.com/rhdrjones/qemu/commits/virt-cpu-topology-refresh Thanks for the sync. I'll look into it. I did not know about the refresh and the effort Salil Mehta has made on this. We are not in the same dept and work for different projects. Thanks Ying. 
Thanks, drew + void qemu_fdt_dumpdtb(void *fdt, int size) { const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb"); diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h index 982c89345f..15fb98af98 100644 --- a/include/sysemu/device_tree.h +++ b/include/sysemu/device_tree.h @@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path); uint32_t qemu_fdt_alloc_phandle(void *fdt); int qemu_fdt_nop_node(void *fdt, const char *node_path); int qemu_fdt_add_subnode(void *fdt, const char *name); +int qemu_fdt_add_path(void *fdt, const char *path); #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \ do { \ -- 2.23.0 .
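On the recursion question raised above: the same ancestor-creation order can be obtained iteratively by materialising successive prefixes of the path. The sketch below covers just the string handling; path_prefix() is a hypothetical helper, not part of the patch, and a real non-recursive qemu_fdt_add_path() would call fdt_path_offset()/fdt_add_subnode() on each prefix in turn.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the prefix of `path` up to and including its i-th component
 * (1-based) into `out`. Returns 1 on success, 0 when i exceeds the
 * number of components. "/cpus/cpu-map", i=1 yields "/cpus".
 */
static int path_prefix(const char *path, int i, char *out)
{
    const char *p = path;

    while (i-- > 0) {
        p = strchr(p + 1, '/');
        if (!p) {
            if (i > 0) {
                return 0;       /* fewer components than requested */
            }
            strcpy(out, path);  /* last component: whole path */
            return 1;
        }
    }
    memcpy(out, path, p - path);
    out[p - path] = '\0';
    return 1;
}
```

Walking i = 1, 2, 3, ... creates parents before children, which is the invariant the recursive version guarantees.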
Re: [RFC PATCH 03/12] target/arm/kvm32: make MPIDR consistent with CPU Topology
On 9/17/2020 4:07 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:24AM +0800, Ying Fang wrote: MPIDR helps to provide an additional PE identification in a multiprocessor system. This patch adds support for setting MPIDR from userspace, so that MPIDR is consistent with CPU topology configured. Signed-off-by: Ying Fang --- target/arm/kvm32.c | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c index 0af46b41c8..85694dc8bf 100644 --- a/target/arm/kvm32.c +++ b/target/arm/kvm32.c This file no longer exists in mainline. Please rebase the whole series. Thanks, it is gone. Will rebase it. Thanks, drew .
Re: [RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology
On 9/17/2020 6:59 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 09:53:35AM +0200, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:23AM +0800, Ying Fang wrote: MPIDR helps to provide an additional PE identification in a multiprocessor system. This patch adds support for setting MPIDR from userspace, so that MPIDR is consistent with CPU topology configured. Signed-off-by: Ying Fang --- target/arm/kvm64.c | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index ef1e960285..fcce261a10 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs) #define ARM_CPU_ID_MPIDR 3, 0, 0, 0, 5 +static int kvm_arm_set_mp_affinity(CPUState *cs) +{ +uint64_t mpidr; +ARMCPU *cpu = ARM_CPU(cs); + +if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) { +/* Make MPIDR consistent with CPU topology */ +MachineState *ms = MACHINE(qdev_get_machine()); + +mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT; We should query KVM first to determine if it wants guests to see their PEs as threads or not. If not, and ms->smp.threads is > 1, then that's an error. And, in any case, if ms->smp.threads == 1, then we shouldn't waste aff0 on it, as that could reduce IPI broadcast performance. +mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores) +& 0xff) << ARM_AFF1_SHIFT; +mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads) +& 0xff) << ARM_AFF2_SHIFT; Also, as pointed out in the KVM thread, we should not be attempting to describe topology with the MPIDR at all. Alexandru pointed out [*] as evidence for that. However, we do need to consider the limits on Aff0 imposed by the GIC. See hw/arm/virt.c:virt_cpu_mp_affinity() for how we currently do it for TCG. We should do something similar for KVM guests when we're taking full control of the MPIDR. 
[*] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?id=3102bc0e6ac7 Thanks, drew Thanks for your information on MPIDR. As described in [*], MPIDR cannot be trusted as the actual topology. After applying: arm64: topology: Stop using MPIDR for topology information Can we just use topology information from ACPI or fdt as topology and ignore MPIDR ? + +/* Override mp affinity when KVM is in use */ +cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; + +/* Bit 31 is RES1 indicates the ARMv7 Multiprocessing Extensions */ +mpidr |= (1ULL << 31); +return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, ); +} else { +/* + * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its + * own idea about MPIDR assignment, so we override our defaults with + * what we get from KVM. + */ +int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), ); +if (ret) { +error_report("failed to set MPIDR"); We don't need this error, kvm_get_one_reg() has trace support already. Anyway, the wording is wrong since it says 'set' instead of 'get'. +return ret; +} +cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; +return ret; +} +} + int kvm_arch_init_vcpu(CPUState *cs) { int ret; -uint64_t mpidr; ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = >env; @@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs) } } -/* - * When KVM is in use, PSCI is emulated in-kernel and not by qemu. - * Currently KVM has its own idea about MPIDR assignment, so we - * override our defaults with what we get from KVM. - */ -ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), ); +ret = kvm_arm_set_mp_affinity(cs); if (ret) { return ret; } -cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; kvm_arm_init_debug(cs); -- 2.23.0 Thanks, drew .
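The Aff0/Aff1/Aff2 arithmetic under discussion can be checked in isolation. A sketch, assuming the ARM_AFFx_SHIFT values from target/arm; note that, per the review above, MPIDR should not actually be trusted as a topology description — this only illustrates the packing the patch computes:

```c
#include <assert.h>
#include <stdint.h>

#define ARM_AFF0_SHIFT 0
#define ARM_AFF1_SHIFT 8
#define ARM_AFF2_SHIFT 16

/*
 * Pack a linear vcpu index into MPIDR affinity fields given a
 * cores-per-socket / threads-per-core topology, mirroring the
 * arithmetic in kvm_arm_set_mp_affinity().
 */
static uint64_t topo_mpidr(int vcpu_id, int cores, int threads)
{
    uint64_t mpidr;

    mpidr  = (uint64_t)(vcpu_id % threads) << ARM_AFF0_SHIFT;
    mpidr |= (uint64_t)((vcpu_id / threads % cores) & 0xff) << ARM_AFF1_SHIFT;
    mpidr |= (uint64_t)((vcpu_id / (cores * threads)) & 0xff) << ARM_AFF2_SHIFT;
    return mpidr | (1ULL << 31);    /* bit 31 is RES1 (MP extensions) */
}
```

With sockets:cores:threads = 2:4:2, vcpu 13 lands on socket 1, core 2, thread 1.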
Re: [RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology
On 9/17/2020 3:53 PM, Andrew Jones wrote: On Thu, Sep 17, 2020 at 11:20:23AM +0800, Ying Fang wrote: MPIDR helps to provide an additional PE identification in a multiprocessor system. This patch adds support for setting MPIDR from userspace, so that MPIDR is consistent with CPU topology configured. Signed-off-by: Ying Fang --- target/arm/kvm64.c | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index ef1e960285..fcce261a10 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs) #define ARM_CPU_ID_MPIDR 3, 0, 0, 0, 5 +static int kvm_arm_set_mp_affinity(CPUState *cs) +{ +uint64_t mpidr; +ARMCPU *cpu = ARM_CPU(cs); + +if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) { +/* Make MPIDR consistent with CPU topology */ +MachineState *ms = MACHINE(qdev_get_machine()); + +mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT; We should query KVM first to determine if it wants guests to see their PEs as threads or not. If not, and ms->smp.threads is > 1, then that's an error. And, in any case, if ms->smp.threads == 1, then we shouldn't waste aff0 on it, as that could reduce IPI broadcast performance. Yes, good catch. Should check against smp.threads before filling the MPIDR value. +mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores) +& 0xff) << ARM_AFF1_SHIFT; +mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads) +& 0xff) << ARM_AFF2_SHIFT; + +/* Override mp affinity when KVM is in use */ +cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; + +/* Bit 31 is RES1 indicates the ARMv7 Multiprocessing Extensions */ +mpidr |= (1ULL << 31); +return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, ); +} else { +/* + * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its + * own idea about MPIDR assignment, so we override our defaults with + * what we get from KVM. 
+ */ +int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), ); +if (ret) { +error_report("failed to set MPIDR"); We don't need this error, kvm_get_one_reg() has trace support already. Anyway, the wording is wrong since it says 'set' instead of 'get'. Yes, my careless, I will fix it. +return ret; +} +cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; +return ret; +} +} + int kvm_arch_init_vcpu(CPUState *cs) { int ret; -uint64_t mpidr; ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = >env; @@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs) } } -/* - * When KVM is in use, PSCI is emulated in-kernel and not by qemu. - * Currently KVM has its own idea about MPIDR assignment, so we - * override our defaults with what we get from KVM. - */ -ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), ); +ret = kvm_arm_set_mp_affinity(cs); if (ret) { return ret; } -cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; kvm_arm_init_debug(cs); -- 2.23.0 Thanks, drew .
[RFC PATCH 12/12] hw/arm/virt-acpi-build: Enable CPU cache topology
A helper struct AcpiCacheOffset is introduced to describe the offset of three level caches. The cache hierarchy is built according to ACPI spec v6.3 5.2.29.2. Let's enable CPU cache topology now. Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 19 +- hw/arm/virt-acpi-build.c| 52 - include/hw/acpi/acpi-defs.h | 6 + include/hw/acpi/aml-build.h | 7 ++--- 4 files changed, 68 insertions(+), 16 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 123eb032cd..f8d74f3f10 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1783,27 +1783,32 @@ void build_cache_hierarchy(GArray *tbl, /* * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) */ -void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +void build_socket_hierarchy(GArray *tbl, uint32_t parent, +uint32_t offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 24); /* Length, no private resources */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 1, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset, 4); /* Private resources */ } -void build_processor_hierarchy(GArray *tbl, uint32_t flags, - uint32_t parent, uint32_t id) +void build_processor_hierarchy(GArray *tbl, uint32_t flags, uint32_t parent, + AcpiCacheOffset offset, uint32_t id) { build_append_byte(tbl, 0); /* Type 0 - processor */ -build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_byte(tbl, 32); /* Length, no private resources */ build_append_int_noprefix(tbl, 0, 2); /* Reserved */ build_append_int_noprefix(tbl, flags, 4); 
/* Flags */ build_append_int_noprefix(tbl, parent, 4); /* Parent */ build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ -build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, 3, 4); /* Number of private resources */ +build_append_int_noprefix(tbl, offset.l1d_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l1i_offset, 4);/* Private resources */ +build_append_int_noprefix(tbl, offset.l2_offset, 4); /* Private resources */ } void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index b5aa3d3c83..375fb9e24f 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -594,29 +594,69 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } -static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +static inline void arm_acpi_cache_info(CPUCacheInfo *cpu_cache, + AcpiCacheInfo *acpi_cache) { +acpi_cache->size = cpu_cache->size; +acpi_cache->sets = cpu_cache->sets; +acpi_cache->associativity = cpu_cache->associativity; +acpi_cache->attributes = cpu_cache->attributes; +acpi_cache->line_size = cpu_cache->line_size; +} + +static void build_pptt(GArray *table_data, BIOSLinker *linker, + VirtMachineState *vms) +{ +MachineState *ms = MACHINE(vms); int pptt_start = table_data->len; int uid = 0, cpus = 0, socket; unsigned int smp_cores = ms->smp.cores; unsigned int smp_threads = ms->smp.threads; +AcpiCacheOffset offset; +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpus)); +AcpiCacheInfo cache_info; acpi_data_push(table_data, sizeof(AcpiTableHeader)); for (socket = 0; cpus < ms->possible_cpus->len; socket++) { -uint32_t socket_offset = table_data->len - pptt_start; +uint32_t l3_offset = table_data->len - pptt_start; +uint32_t socket_offset; int core; -build_socket_hierarchy(table_data, 0, socket); +/* L3 cache type 
structure */ +arm_acpi_cache_info(cpu->caches.l3_cache, _info); +build_cache_hierarchy(table_data, 0, _info); + +socket_offset = table_data->len - pptt_start; +build_socket_hierarchy(table_data, 0, l3_offset, socket); for (core = 0; core < smp_cores; core++) { uint32_t core_offset = table_data->len
[RFC PATCH 11/12] hw/acpi/aml-build: build ACPI CPU cache topology information
To build cache information, An AcpiCacheInfo structure is defined to hold the Type 1 cache structure according to ACPI spec v6.3 5.2.29.2. A helper function build_cache_hierarchy is introduced to encode the cache information. Signed-off-by: Ying Fang --- hw/acpi/aml-build.c | 26 ++ include/hw/acpi/acpi-defs.h | 8 include/hw/acpi/aml-build.h | 3 +++ 3 files changed, 37 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 13eb6e1345..123eb032cd 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1754,6 +1754,32 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* ACPI 6.3: 5.29.2 Cache type structure (Type 1) */ +static void build_cache_head(GArray *tbl, uint32_t next_level) +{ +build_append_byte(tbl, 1); +build_append_byte(tbl, 24); +build_append_int_noprefix(tbl, 0, 2); +build_append_int_noprefix(tbl, 0x7f, 4); +build_append_int_noprefix(tbl, next_level, 4); +} + +static void build_cache_tail(GArray *tbl, AcpiCacheInfo *cache_info) +{ +build_append_int_noprefix(tbl, cache_info->size, 4); +build_append_int_noprefix(tbl, cache_info->sets, 4); +build_append_byte(tbl, cache_info->associativity); +build_append_byte(tbl, cache_info->attributes); +build_append_int_noprefix(tbl, cache_info->line_size, 2); +} + +void build_cache_hierarchy(GArray *tbl, + uint32_t next_level, AcpiCacheInfo *cache_info) +{ +build_cache_head(tbl, next_level); +build_cache_tail(tbl, cache_info); +} + /* * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) */ diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h index 38a42f409a..3df38ab449 100644 --- a/include/hw/acpi/acpi-defs.h +++ b/include/hw/acpi/acpi-defs.h @@ -618,4 +618,12 @@ struct AcpiIortRC { } QEMU_PACKED; typedef struct AcpiIortRC AcpiIortRC; +typedef struct AcpiCacheInfo { +uint32_t size; +uint32_t sets; +uint8_t associativity; +uint8_t attributes; +uint16_t line_size; +} AcpiCacheInfo; + #endif 
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index ff4c6a38f3..ced1ae6a83 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -435,6 +435,9 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_cache_hierarchy(GArray *tbl, + uint32_t next_level, AcpiCacheInfo *cache_info); + void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); void build_processor_hierarchy(GArray *tbl, uint32_t flags, -- 2.23.0
[RFC PATCH 10/12] hw/arm/virt: add fdt cache information
Support devicetree CPU cache information descriptions Signed-off-by: Ying Fang --- hw/arm/virt.c | 91 +++ 1 file changed, 91 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 71f7dbb317..74b748ae35 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -343,6 +343,89 @@ static void fdt_add_timer_nodes(const VirtMachineState *vms) GIC_FDT_IRQ_TYPE_PPI, ARCH_TIMER_NS_EL2_IRQ, irqflags); } +static void fdt_add_l3cache_nodes(const VirtMachineState *vms) +{ +int i; +const MachineState *ms = MACHINE(vms); +ARMCPU *cpu = ARM_CPU(first_cpu); +unsigned int smp_cores = ms->smp.cores; +unsigned int sockets = ms->smp.max_cpus / smp_cores; + +for (i = 0; i < sockets; i++) { +char *nodename = g_strdup_printf("/cpus/l3-cache%d", i); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_string(vms->fdt, nodename, "cache-unified", "true"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-level", 3); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", + cpu->caches.l3_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l3_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l3_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +} + +static void fdt_add_l2cache_nodes(const VirtMachineState *vms) +{ +int i, j; +const MachineState *ms = MACHINE(vms); +unsigned int smp_cores = ms->smp.cores; +signed int sockets = ms->smp.max_cpus / smp_cores; +ARMCPU *cpu = ARM_CPU(first_cpu); + +for (i = 0; i < sockets; i++) { +char *next_path = g_strdup_printf("/cpus/l3-cache%d", i); +for (j = 0; j < smp_cores; j++) { +char *nodename = g_strdup_printf("/cpus/l2-cache%d", + i * smp_cores + j); +qemu_fdt_add_subnode(vms->fdt, nodename); +qemu_fdt_setprop_string(vms->fdt, nodename, "compatible", "cache"); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-size", + 
cpu->caches.l2_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-line-size", + cpu->caches.l2_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "cache-sets", + cpu->caches.l2_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, + "next-level-cache", next_path); +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); +g_free(nodename); +} +g_free(next_path); +} +} + +static void fdt_add_l1cache_prop(const VirtMachineState *vms, +char *nodename, int cpu_index) +{ + +ARMCPU *cpu = ARM_CPU(qemu_get_cpu(cpu_index)); +CPUCaches caches = cpu->caches; + +char *cachename = g_strdup_printf("/cpus/l2-cache%d", cpu_index); + +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-size", + caches.l1d_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-line-size", + caches.l1d_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "d-cache-sets", + caches.l1d_cache->sets); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-size", + caches.l1i_cache->size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-line-size", + caches.l1i_cache->line_size); +qemu_fdt_setprop_cell(vms->fdt, nodename, "i-cache-sets", + caches.l1i_cache->sets); +qemu_fdt_setprop_phandle(vms->fdt, nodename, "next-level-cache", + cachename); +g_free(cachename); +} + static void fdt_add_cpu_nodes(const VirtMachineState *vms) { int cpu; @@ -378,6 +461,11 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) qemu_fdt_setprop_cell(vms->fdt, "/cpus", "#address-cells", addr_cells); qemu_fdt_setprop_cell(vms->fdt, "/cpus", &qu
[RFC PATCH 08/12] hw/arm/virt-acpi-build: add PPTT table
Add the Processor Properties Topology Table (PPTT) to present CPU topology information to the guest. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 42 1 file changed, 42 insertions(+) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index f1d574b5d3..b5aa3d3c83 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -594,6 +594,42 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) "SRAT", table_data->len - srat_start, 3, NULL, NULL); } +static void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms) +{ +int pptt_start = table_data->len; +int uid = 0, cpus = 0, socket; +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; + +acpi_data_push(table_data, sizeof(AcpiTableHeader)); + +for (socket = 0; cpus < ms->possible_cpus->len; socket++) { +uint32_t socket_offset = table_data->len - pptt_start; +int core; + +build_socket_hierarchy(table_data, 0, socket); + +for (core = 0; core < smp_cores; core++) { +uint32_t core_offset = table_data->len - pptt_start; +int thread; + +if (smp_threads <= 1) { +build_processor_hierarchy(table_data, 2, socket_offset, uid++); + } else { +build_processor_hierarchy(table_data, 0, socket_offset, core); +for (thread = 0; thread < smp_threads; thread++) { +build_smt_hierarchy(table_data, core_offset, uid++); +} + } +} +cpus += smp_cores * smp_threads; +} + +build_header(linker, table_data, + (void *)(table_data->data + pptt_start), "PPTT", + table_data->len - pptt_start, 2, NULL, NULL); +} + /* GTDT */ static void build_gtdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) @@ -834,6 +870,7 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) unsigned dsdt, xsdt; GArray *tables_blob = tables->table_data; MachineState *ms = MACHINE(vms); +bool cpu_topology_enabled = !vmc->ignore_cpu_topology; table_offsets = g_array_new(false, true /* clear */, sizeof(uint32_t)); @@ 
-853,6 +890,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables) acpi_add_table(table_offsets, tables_blob); build_madt(tables_blob, tables->linker, vms); +if (cpu_topology_enabled) { +acpi_add_table(table_offsets, tables_blob); +build_pptt(tables_blob, tables->linker, ms); +} + acpi_add_table(table_offsets, tables_blob); build_gtdt(tables_blob, tables->linker, vms); -- 2.23.0
[RFC PATCH 00/12] hw/arm/virt: Introduce cpu and cache topology support
An accurate cpu topology may help improve the cpu scheduler's decision making when dealing with multi-core systems, so a cpu topology description is helpful to provide the guest with the right view. Cpu cache information may also have a slight impact on the sched domains, and userspace software may even check the cpu cache information to do some optimizations. Thus this patch series is posted to provide cpu and cache topology support for arm. To make the cpu topology consistent with MPIDR, a vcpu ioctl KVM_ARM_SET_MP_AFFINITY is introduced so that userspace can set MPIDR according to the specified topology [1]. To describe the cpu topology, both fdt and ACPI are supported. To describe the cpu cache information, a default cache hierarchy is given and can be made configurable later. The cpu topology is built according to the processor hierarchy node structure, and the cpu cache information is built according to the cache type structure. This patch series is partially based on the patches posted by Andrew Jones years ago [2]; I jumped in on it since some OS vendor cooperative partners are eager for it. Thanks for Andrew's contribution. Please feel free to reply if there is anything improper. 
[1] https://patchwork.kernel.org/cover/11781317 [2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20180704124923.32483-1-drjo...@redhat.com Andrew Jones (2): device_tree: add qemu_fdt_add_path hw/arm/virt: DT: add cpu-map Ying Fang (10): linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY target/arm/kvm64: make MPIDR consistent with CPU Topology target/arm/kvm32: make MPIDR consistent with CPU Topology hw/arm/virt-acpi-build: distinguish possible and present cpus hw/acpi/aml-build: add processor hierarchy node structure hw/arm/virt-acpi-build: add PPTT table target/arm/cpu: Add CPU cache description for arm hw/arm/virt: add fdt cache information hw/acpi/aml-build: build ACPI CPU cache topology information hw/arm/virt-acpi-build: Enable CPU cache topology device_tree.c| 24 +++ hw/acpi/aml-build.c | 68 +++ hw/arm/virt-acpi-build.c | 99 +-- hw/arm/virt.c| 128 ++- include/hw/acpi/acpi-defs.h | 14 include/hw/acpi/aml-build.h | 11 +++ include/hw/arm/virt.h| 1 + include/sysemu/device_tree.h | 1 + linux-headers/linux/kvm.h| 3 + target/arm/cpu.c | 42 target/arm/cpu.h | 27 target/arm/kvm32.c | 46 ++--- target/arm/kvm64.c | 46 ++--- 13 files changed, 488 insertions(+), 22 deletions(-) -- 2.23.0
[RFC PATCH 07/12] hw/acpi/aml-build: add processor hierarchy node structure
Add the processor hierarchy node structures to build ACPI information for CPU topology. Three helpers are introduced: (1) build_socket_hierarchy for socket description structure (2) build_processor_hierarchy for processor description structure (3) build_smt_hierarchy for thread (logic processor) description structure Signed-off-by: Ying Fang Signed-off-by: Henglong Fan --- hw/acpi/aml-build.c | 37 + include/hw/acpi/aml-build.h | 7 +++ 2 files changed, 44 insertions(+) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index f6fbc9b95d..13eb6e1345 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -1754,6 +1754,43 @@ void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms) table_data->len - slit_start, 1, NULL, NULL); } +/* + * ACPI 6.3: 5.2.29.1 Processor hierarchy node structure (Type 0) + */ +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 1, 4); /* Flags: Physical package */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0); /* Type 0 - processor */ +build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, flags, 4); /* Flags */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id) +{ +build_append_byte(tbl, 0);/* Type 0 - processor */ 
+build_append_byte(tbl, 20); /* Length, no private resources */ +build_append_int_noprefix(tbl, 0, 2); /* Reserved */ +build_append_int_noprefix(tbl, 0x0e, 4);/* Processor is a thread */ +build_append_int_noprefix(tbl, parent, 4); /* Parent */ +build_append_int_noprefix(tbl, id, 4); /* ACPI processor ID */ +build_append_int_noprefix(tbl, 0, 4); /* Number of private resources */ +} + /* build rev1/rev3/rev5.1 FADT */ void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id) diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h index d27da03d64..ff4c6a38f3 100644 --- a/include/hw/acpi/aml-build.h +++ b/include/hw/acpi/aml-build.h @@ -435,6 +435,13 @@ void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base, void build_slit(GArray *table_data, BIOSLinker *linker, MachineState *ms); +void build_socket_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + +void build_processor_hierarchy(GArray *tbl, uint32_t flags, + uint32_t parent, uint32_t id); + +void build_smt_hierarchy(GArray *tbl, uint32_t parent, uint32_t id); + void build_fadt(GArray *tbl, BIOSLinker *linker, const AcpiFadtData *f, const char *oem_id, const char *oem_table_id); -- 2.23.0
[RFC PATCH 06/12] hw/arm/virt-acpi-build: distinguish possible and present cpus
When building ACPI tables regarding CPUs we should always build them for the number of possible CPUs, not the number of present CPUs. We then ensure only the present CPUs are enabled. Signed-off-by: Andrew Jones Signed-off-by: Ying Fang --- hw/arm/virt-acpi-build.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 9efd7a3881..f1d574b5d3 100644 --- a/hw/arm/virt-acpi-build.c +++ b/hw/arm/virt-acpi-build.c @@ -56,14 +56,18 @@ #define ARM_SPI_BASE 32 -static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus) +static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState *vms) { uint16_t i; +CPUArchIdList *possible_cpus = MACHINE(vms)->possible_cpus; -for (i = 0; i < smp_cpus; i++) { +for (i = 0; i < possible_cpus->len; i++) { Aml *dev = aml_device("C%.03X", i); aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0007"))); aml_append(dev, aml_name_decl("_UID", aml_int(i))); +if (possible_cpus->cpus[i].cpu == NULL) { +aml_append(dev, aml_name_decl("_STA", aml_int(0))); +} aml_append(scope, dev); } } @@ -635,6 +639,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) const int *irqmap = vms->irqmap; AcpiMadtGenericDistributor *gicd; AcpiMadtGenericMsiFrame *gic_msi; +int possible_cpus = MACHINE(vms)->possible_cpus->len; int i; acpi_data_push(table_data, sizeof(AcpiMultipleApicTable)); @@ -645,7 +650,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicd->base_address = cpu_to_le64(memmap[VIRT_GIC_DIST].base); gicd->version = vms->gic_version; -for (i = 0; i < vms->smp_cpus; i++) { +for (i = 0; i < possible_cpus; i++) { AcpiMadtGenericCpuInterface *gicc = acpi_data_push(table_data, sizeof(*gicc)); ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(i)); @@ -660,7 +665,9 @@ build_madt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) gicc->cpu_interface_number = cpu_to_le32(i); gicc->arm_mpidr = cpu_to_le64(armcpu->mp_affinity); gicc->uid = 
cpu_to_le32(i); -gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +if (i < vms->smp_cpus) { +gicc->flags = cpu_to_le32(ACPI_MADT_GICC_ENABLED); +} if (arm_feature(&armcpu->env, ARM_FEATURE_PMU)) { gicc->performance_interrupt = cpu_to_le32(PPI(VIRTUAL_PMU_IRQ)); @@ -764,7 +771,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms) * the RTC ACPI device at all when using UEFI. */ scope = aml_scope("\\_SB"); -acpi_dsdt_add_cpus(scope, vms->smp_cpus); +acpi_dsdt_add_cpus(scope, vms); acpi_dsdt_add_uart(scope, &memmap[VIRT_UART], (irqmap[VIRT_UART] + ARM_SPI_BASE)); if (vmc->acpi_expose_flash) { -- 2.23.0
[RFC PATCH 09/12] target/arm/cpu: Add CPU cache description for arm
Add the CPUCacheInfo structure to hold CPU cache information for ARM cpus. A classic three level cache topology is used here. The default cache capacity is given and userspace can overwrite these values. Signed-off-by: Ying Fang --- target/arm/cpu.c | 42 ++ target/arm/cpu.h | 27 +++ 2 files changed, 69 insertions(+) diff --git a/target/arm/cpu.c b/target/arm/cpu.c index c179e0752d..efa8e1974a 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -27,6 +27,7 @@ #include "qapi/visitor.h" #include "cpu.h" #include "internals.h" +#include "qemu/units.h" #include "exec/exec-all.h" #include "hw/qdev-properties.h" #if !defined(CONFIG_USER_ONLY) @@ -998,6 +999,45 @@ uint64_t arm_cpu_mp_affinity(int idx, uint8_t clustersz) return (Aff1 << ARM_AFF1_SHIFT) | Aff0; } +static CPUCaches default_cache_info = { +.l1d_cache = &(CPUCacheInfo) { +.type = DATA_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x02, +}, +.l1i_cache = &(CPUCacheInfo) { +.type = INSTRUCTION_CACHE, +.level = 1, +.size = 64 * KiB, +.line_size = 64, +.associativity = 4, +.sets = 256, +.attributes = 0x04, +}, +.l2_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 2, +.size = 512 * KiB, +.line_size = 64, +.associativity = 8, +.sets = 1024, +.attributes = 0x0a, +}, +.l3_cache = &(CPUCacheInfo) { +.type = UNIFIED_CACHE, +.level = 3, +.size = 65536 * KiB, +.line_size = 64, +.associativity = 15, +.sets = 2048, +.attributes = 0x0a, +}, +}; + static void cpreg_hashtable_data_destroy(gpointer data) { /* @@ -1835,6 +1875,8 @@ static void arm_cpu_realizefn(DeviceState *dev, Error **errp) } } +cpu->caches = default_cache_info; + qemu_init_vcpu(cs); cpu_reset(cs); diff --git a/target/arm/cpu.h b/target/arm/cpu.h index a1c7d8ebae..e9e3817e20 100644 --- a/target/arm/cpu.h +++ b/target/arm/cpu.h @@ -745,6 +745,30 @@ typedef enum ARMPSCIState { typedef struct ARMISARegisters ARMISARegisters; +/* Cache information type */ +enum CacheType { +DATA_CACHE, 
+INSTRUCTION_CACHE, +UNIFIED_CACHE +}; + +typedef struct CPUCacheInfo { +enum CacheType type; /* Cache Type*/ +uint8_t level; +uint32_t size;/* Size in bytes */ +uint16_t line_size; /* Line size in bytes */ +uint8_t associativity;/* Cache associativity */ +uint32_t sets;/* Number of sets */ +uint8_t attributes; /* Cache attributest */ +} CPUCacheInfo; + +typedef struct CPUCaches { +CPUCacheInfo *l1d_cache; +CPUCacheInfo *l1i_cache; +CPUCacheInfo *l2_cache; +CPUCacheInfo *l3_cache; +} CPUCaches; + /** * ARMCPU: * @env: #CPUARMState @@ -986,6 +1010,9 @@ struct ARMCPU { /* Generic timer counter frequency, in Hz */ uint64_t gt_cntfrq_hz; + +/* CPU cache information */ +CPUCaches caches; }; unsigned int gt_cntfrq_period_ns(ARMCPU *cpu); -- 2.23.0
[RFC PATCH 01/12] linux headers: Update linux header with KVM_ARM_SET_MP_AFFINITY
Signed-off-by: Ying Fang --- linux-headers/linux/kvm.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index a28c366737..461a2302e7 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_PPC_SECURE_GUEST 181 #define KVM_CAP_HALT_POLL 182 #define KVM_CAP_ASYNC_PF_INT 183 +#define KVM_CAP_ARM_MP_AFFINITY 187 #ifdef KVM_CAP_IRQ_ROUTING @@ -1470,6 +1471,8 @@ struct kvm_s390_ucas_mapping { #define KVM_S390_SET_CMMA_BITS _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log) /* Memory Encryption Commands */ #define KVM_MEMORY_ENCRYPT_OP _IOWR(KVMIO, 0xba, unsigned long) +/* Available with KVM_CAP_ARM_MP_AFFINITY */ +#define KVM_ARM_SET_MP_AFFINITY_IOWR(KVMIO, 0xbb, unsigned long) struct kvm_enc_region { __u64 addr; -- 2.23.0
[RFC PATCH 04/12] device_tree: add qemu_fdt_add_path
From: Andrew Jones qemu_fdt_add_path works like qemu_fdt_add_subnode, except it also recursively adds any missing parent nodes. Cc: Peter Crosthwaite Cc: Alexander Graf Signed-off-by: Andrew Jones --- device_tree.c| 24 include/sysemu/device_tree.h | 1 + 2 files changed, 25 insertions(+) diff --git a/device_tree.c b/device_tree.c index b335dae707..1854be3a02 100644 --- a/device_tree.c +++ b/device_tree.c @@ -524,6 +524,30 @@ int qemu_fdt_add_subnode(void *fdt, const char *name) return retval; } +int qemu_fdt_add_path(void *fdt, const char *path) +{ +char *parent; +int offset; + +offset = fdt_path_offset(fdt, path); +if (offset < 0 && offset != -FDT_ERR_NOTFOUND) { +error_report("%s Couldn't find node %s: %s", __func__, path, + fdt_strerror(offset)); +exit(1); +} + +if (offset != -FDT_ERR_NOTFOUND) { +return offset; +} + +parent = g_strdup(path); +strrchr(parent, '/')[0] = '\0'; +qemu_fdt_add_path(fdt, parent); +g_free(parent); + +return qemu_fdt_add_subnode(fdt, path); +} + void qemu_fdt_dumpdtb(void *fdt, int size) { const char *dumpdtb = qemu_opt_get(qemu_get_machine_opts(), "dumpdtb"); diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h index 982c89345f..15fb98af98 100644 --- a/include/sysemu/device_tree.h +++ b/include/sysemu/device_tree.h @@ -104,6 +104,7 @@ uint32_t qemu_fdt_get_phandle(void *fdt, const char *path); uint32_t qemu_fdt_alloc_phandle(void *fdt); int qemu_fdt_nop_node(void *fdt, const char *node_path); int qemu_fdt_add_subnode(void *fdt, const char *name); +int qemu_fdt_add_path(void *fdt, const char *path); #define qemu_fdt_setprop_cells(fdt, node_path, property, ...) \ do { \ -- 2.23.0
[RFC PATCH 05/12] hw/arm/virt: DT: add cpu-map
From: Andrew Jones Support devicetree CPU topology descriptions. Signed-off-by: Andrew Jones --- hw/arm/virt.c | 37 - include/hw/arm/virt.h | 1 + 2 files changed, 37 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index acf9bfbece..71f7dbb317 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -348,7 +348,10 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) int cpu; int addr_cells = 1; const MachineState *ms = MACHINE(vms); - +VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms); +unsigned int smp_cores = ms->smp.cores; +unsigned int smp_threads = ms->smp.threads; +bool cpu_topology_enabled = !vmc->ignore_cpu_topology; /* * From Documentation/devicetree/bindings/arm/cpus.txt * On ARM v8 64-bit systems value should be set to 2, @@ -404,8 +407,37 @@ static void fdt_add_cpu_nodes(const VirtMachineState *vms) ms->possible_cpus->cpus[cs->cpu_index].props.node_id); } +qemu_fdt_setprop_cell(vms->fdt, nodename, "phandle", + qemu_fdt_alloc_phandle(vms->fdt)); + g_free(nodename); } +if (cpu_topology_enabled) { +/* Add vcpu topology by fdt node cpu-map. 
*/ +qemu_fdt_add_subnode(vms->fdt, "/cpus/cpu-map"); + +for (cpu = vms->smp_cpus - 1; cpu >= 0; cpu--) { +char *cpu_path = g_strdup_printf("/cpus/cpu@%d", cpu); +char *map_path; + +if (smp_threads > 1) { +map_path = g_strdup_printf( + "/cpus/cpu-map/%s%d/%s%d/%s%d", + "cluster", cpu / (smp_cores * smp_threads), + "core", (cpu / smp_threads) % smp_cores, + "thread", cpu % smp_threads); +} else { +map_path = g_strdup_printf( + "/cpus/cpu-map/%s%d/%s%d", + "cluster", cpu / smp_cores, + "core", cpu % smp_cores); +} +qemu_fdt_add_path(vms->fdt, map_path); +qemu_fdt_setprop_phandle(vms->fdt, map_path, "cpu", cpu_path); +g_free(map_path); +g_free(cpu_path); +} +} } static void fdt_add_its_gic_node(VirtMachineState *vms) @@ -2553,8 +2585,11 @@ DEFINE_VIRT_MACHINE_AS_LATEST(5, 2) static void virt_machine_5_1_options(MachineClass *mc) { +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc)); + virt_machine_5_2_options(mc); compat_props_add(mc->compat_props, hw_compat_5_1, hw_compat_5_1_len); +vmc->ignore_cpu_topology = true; } DEFINE_VIRT_MACHINE(5, 1) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index dff67e1bef..d37c6b7858 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -119,6 +119,7 @@ typedef struct { MachineClass parent; bool disallow_affinity_adjustment; bool no_its; +bool ignore_cpu_topology; bool no_pmu; bool claim_edge_triggered_timers; bool smbios_old_sys_ver; -- 2.23.0
[RFC PATCH 02/12] target/arm/kvm64: make MPIDR consistent with CPU Topology
MPIDR helps to provide an additional PE identification in a multiprocessor system. This patch adds support for setting MPIDR from userspace, so that MPIDR is consistent with the configured CPU topology. Signed-off-by: Ying Fang --- target/arm/kvm64.c | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index ef1e960285..fcce261a10 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -757,10 +757,46 @@ static int kvm_arm_sve_set_vls(CPUState *cs) #define ARM_CPU_ID_MPIDR 3, 0, 0, 0, 5 +static int kvm_arm_set_mp_affinity(CPUState *cs) +{ +uint64_t mpidr; +ARMCPU *cpu = ARM_CPU(cs); + +if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) { +/* Make MPIDR consistent with CPU topology */ +MachineState *ms = MACHINE(qdev_get_machine()); + +mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT; +mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores) +& 0xff) << ARM_AFF1_SHIFT; +mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads) +& 0xff) << ARM_AFF2_SHIFT; + +/* Override mp affinity when KVM is in use */ +cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; + +/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */ +mpidr |= (1ULL << 31); +return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr); +} else { +/* + * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its + * own idea about MPIDR assignment, so we override our defaults with + * what we get from KVM. + */ +int ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr); +if (ret) { +error_report("failed to get MPIDR"); +return ret; +} +cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; +return ret; +} +} + int kvm_arch_init_vcpu(CPUState *cs) { int ret; -uint64_t mpidr; ARMCPU *cpu = ARM_CPU(cs); CPUARMState *env = &cpu->env; @@ -814,16 +850,10 @@ int kvm_arch_init_vcpu(CPUState *cs) } } -/* - * When KVM is in use, PSCI is emulated in-kernel and not by qemu. 
- * Currently KVM has its own idea about MPIDR assignment, so we - * override our defaults with what we get from KVM. - */ -ret = kvm_get_one_reg(cs, ARM64_SYS_REG(ARM_CPU_ID_MPIDR), &mpidr); +ret = kvm_arm_set_mp_affinity(cs); if (ret) { return ret; } -cpu->mp_affinity = mpidr & ARM64_AFFINITY_MASK; kvm_arm_init_debug(cs); -- 2.23.0
[RFC PATCH 03/12] target/arm/kvm32: make MPIDR consistent with CPU Topology
MPIDR helps to provide an additional PE identification in a multiprocessor system. This patch adds support for setting MPIDR from userspace, so that MPIDR is consistent with the configured CPU topology. Signed-off-by: Ying Fang --- target/arm/kvm32.c | 46 ++ 1 file changed, 38 insertions(+), 8 deletions(-) diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c index 0af46b41c8..85694dc8bf 100644 --- a/target/arm/kvm32.c +++ b/target/arm/kvm32.c @@ -201,11 +201,47 @@ int kvm_arm_cpreg_level(uint64_t regidx) #define ARM_CPU_ID_MPIDR 0, 0, 0, 5 +static int kvm_arm_set_mp_affinity(CPUState *cs) +{ +uint32_t mpidr; +ARMCPU *cpu = ARM_CPU(cs); + +if (kvm_check_extension(kvm_state, KVM_CAP_ARM_MP_AFFINITY)) { +/* Make MPIDR consistent with CPU topology */ +MachineState *ms = MACHINE(qdev_get_machine()); + +mpidr = (kvm_arch_vcpu_id(cs) % ms->smp.threads) << ARM_AFF0_SHIFT; +mpidr |= ((kvm_arch_vcpu_id(cs) / ms->smp.threads % ms->smp.cores) +& 0xff) << ARM_AFF1_SHIFT; +mpidr |= (kvm_arch_vcpu_id(cs) / (ms->smp.cores * ms->smp.threads) + & 0xff) << ARM_AFF2_SHIFT; + +/* Override mp affinity when KVM is in use */ +cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; + +/* Bit 31 is RES1 and indicates the ARMv7 Multiprocessing Extensions */ +mpidr |= (1ULL << 31); +return kvm_vcpu_ioctl(cs, KVM_ARM_SET_MP_AFFINITY, &mpidr); +} else { +/* + * When KVM_CAP_ARM_MP_AFFINITY is not supported, it means KVM has its + * own idea about MPIDR assignment, so we override our defaults with + * what we get from KVM. + */ +int ret = kvm_get_one_reg(cs, ARM_CP15_REG32(ARM_CPU_ID_MPIDR), &mpidr); +if (ret) { +error_report("failed to get MPIDR"); +return ret; +} +cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; +return ret; +} +} + int kvm_arch_init_vcpu(CPUState *cs) { int ret; uint64_t v; -uint32_t mpidr; struct kvm_one_reg r; ARMCPU *cpu = ARM_CPU(cs); @@ -244,16 +280,10 @@ int kvm_arch_init_vcpu(CPUState *cs) return -EINVAL; } -/* - * When KVM is in use, PSCI is emulated in-kernel and not by qemu. 
- * Currently KVM has its own idea about MPIDR assignment, so we - * override our defaults with what we get from KVM. - */ -ret = kvm_get_one_reg(cs, ARM_CP15_REG32(ARM_CPU_ID_MPIDR), &mpidr); +ret = kvm_arm_set_mp_affinity(cs); if (ret) { return ret; } -cpu->mp_affinity = mpidr & ARM32_AFFINITY_MASK; /* Check whether userspace can specify guest syndrome value */ kvm_arm_init_serror_injection(cs); -- 2.23.0
Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters
On 8/7/2020 4:13 PM, Kevin Wolf wrote: Am 07.08.2020 um 09:42 hat Ying Fang geschrieben: On 8/6/2020 5:13 PM, Kevin Wolf wrote: Am 05.08.2020 um 04:38 hat Ying Fang geschrieben: From: fangying When qemu or qemu-nbd process uses a qcow2 image and configured with 'cache = none', it will write to the qcow2 image with a cache to cache L2 tables, however the process will not use L2 tables without explicitly calling the flush command or closing the mirror flash into the disk. Which may cause the disk data inconsistent with the written data for a long time. If an abnormal process exit occurs here, the issued written data will be lost. Therefore, in order to keep data consistency we need to flush the changes to the L2 entry to the disk in time for the newly allocated cluster. Signed-off-by: Ying Fang If you want to have data safely written to the disk after each write request, you need to use cache=writethrough/directsync (in other words, aliases that are equivalent to setting -device ...,write-cache=off). Note that this will have a major impact on write performance. cache=none means bypassing the kernel page cache (O_DIRECT), but not flushing after each write request. Well, IIUC, cache=none does not guarantee data safety and we should not expect that. Then this patch can be ignored. Indeed, cache=none is a writeback cache mode with all of the consequences. In practice, this is normally good enough because the guest OS will send flush requests when needed (e.g. because a guest application called fsync()), but if the guest doesn't do this, it may suffer data loss. This behaviour is comparable to a volatile disk cache on real hard disks and is a good default, but sometimes you need a writethrough cache mode at the cost of a performance penalty. The late reply, thanks for your detailed explanation on the 'cache' option, having more understanding for it now. Kevin .
Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters
On 8/6/2020 5:13 PM, Kevin Wolf wrote: Am 05.08.2020 um 04:38 hat Ying Fang geschrieben: From: fangying When qemu or qemu-nbd process uses a qcow2 image and configured with 'cache = none', it will write to the qcow2 image with a cache to cache L2 tables, however the process will not use L2 tables without explicitly calling the flush command or closing the mirror flash into the disk. Which may cause the disk data inconsistent with the written data for a long time. If an abnormal process exit occurs here, the issued written data will be lost. Therefore, in order to keep data consistency we need to flush the changes to the L2 entry to the disk in time for the newly allocated cluster. Signed-off-by: Ying Fang If you want to have data safely written to the disk after each write request, you need to use cache=writethrough/directsync (in other words, aliases that are equivalent to setting -device ...,write-cache=off). Note that this will have a major impact on write performance. cache=none means bypassing the kernel page cache (O_DIRECT), but not flushing after each write request. Well, IIUC, cache=none does not guarantee data safety and we should not expect that. Then this patch can be ignored. Thanks. Kevin .
Re: [PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters
On 8/5/2020 10:43 AM, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20200805023826.184-1-fangyi...@huawei.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. I see some error message which says ** No space left on device ** However I do not know what is wrong with this build test. Could you give me some help here? Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 error: copy-fd: write returned No space left on device fatal: failed to copy file to '/var/tmp/patchew-tester-tmp-wtnwtuq5/src/.git/objects/pack/pack-518a8ad92e3ce11d2627a7221e2d360b337cb27d.pack': No space left on device fatal: The remote end hung up unexpectedly Traceback (most recent call last): File "patchew-tester/src/patchew-cli", line 521, in test_one git_clone_repo(clone, r["repo"], r["head"], logf, True) File "patchew-tester/src/patchew-cli", line 53, in git_clone_repo subprocess.check_call(clone_cmd, stderr=logf, stdout=logf) File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['git', 'clone', '-q', '/home/patchew/.cache/patchew-git-cache/httpsgithubcompatchewprojectqemu-3c8cf5a9c21ff8782164d1def7f44bd888713384', '/var/tmp/patchew-tester-tmp-wtnwtuq5/src']' returned non-zero exit status 128. The full log is available at http://patchew.org/logs/20200805023826.184-1-fangyi...@huawei.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-de...@redhat.com
Re: [PATCH v5 0/2] add new options to set smbios type 4 fields
On 8/6/2020 2:01 PM, Michael S. Tsirkin wrote: On Thu, Aug 06, 2020 at 11:56:32AM +0800, Ying Fang wrote: From: fangying Hi, this patchset was previously posted by my teammate Heyi Guo several months ago, however we missed the merge window. It is reposted here to bring it to completion. Thanks. Thanks, I will tag it for after the release. Pls ping me after the release to make sure I don't drop it by mistake. Yes, I will do that. Hope it won't be missed this time. Thanks. Patch description: Common VM users sometimes care about CPU speed, so we add two new options to allow VM vendors to present CPU speed to their users. Normally this information can be fetched from the host smbios. Strictly speaking, the "max speed" and "current speed" in type 4 are not really the max speed and current speed of the processor, for "max speed" identifies a capability of the system, and "current speed" identifies the processor's speed at boot (see the smbios spec), but some applications do not tell the difference. Changelog: v4 -> v5: - Rebase patch for latest upstream v3 -> v4: - Fix the default value when not specifying "-smbios type=4" option; it would be 0 instead of 2000 in previous versions - Use uint64_t type to check value overflow - Add test case to check smbios type 4 CPU speed - v4 https://patchwork.kernel.org/cover/11444635/ v2 -> v3: - Refine comments per Igor's suggestion. v1 -> v2: - change "_" in option names to "-" - check if option value is too large to fit in SMBIOS type 4 speed fields. Cc: "Michael S. Tsirkin" Cc: Igor Mammedov Ying Fang (2): hw/smbios: add options for type 4 max-speed and current-speed tests/bios-tables-test: add smbios cpu speed test hw/smbios/smbios.c | 36 ++ qemu-options.hx | 2 +- tests/bios-tables-test.c | 42 3 files changed, 75 insertions(+), 5 deletions(-) -- 2.23.0
[PATCH v5 1/2] hw/smbios: add options for type 4 max-speed and current-speed
Common VM users sometimes care about CPU speed, so we add two new options to allow VM vendors to present CPU speed to their users. Normally this information can be fetched from the host smbios. Strictly speaking, the "max speed" and "current speed" in type 4 are not really the max speed and current speed of the processor, for "max speed" identifies a capability of the system, and "current speed" identifies the processor's speed at boot (see the smbios spec), but some applications do not tell the difference. Reviewed-by: Igor Mammedov Signed-off-by: Ying Fang Signed-off-by: Heyi Guo --- hw/smbios/smbios.c | 36 qemu-options.hx | 2 +- 2 files changed, 33 insertions(+), 5 deletions(-) diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c index 11d476c4a2..53181a58eb 100644 --- a/hw/smbios/smbios.c +++ b/hw/smbios/smbios.c @@ -93,9 +93,21 @@ static struct { const char *manufacturer, *version, *serial, *asset, *sku; } type3; +/* + * SVVP requires max_speed and current_speed to be set and not being + * 0 which counts as unknown (SMBIOS 3.1.0/Table 21). Set the + * default value to 2000MHz as we did before. 
+ */ +#define DEFAULT_CPU_SPEED 2000 + static struct { const char *sock_pfx, *manufacturer, *version, *serial, *asset, *part; -} type4; +uint64_t max_speed; +uint64_t current_speed; +} type4 = { +.max_speed = DEFAULT_CPU_SPEED, +.current_speed = DEFAULT_CPU_SPEED +}; static struct { size_t nvalues; @@ -273,6 +285,14 @@ static const QemuOptDesc qemu_smbios_type4_opts[] = { .name = "version", .type = QEMU_OPT_STRING, .help = "version number", +},{ +.name = "max-speed", +.type = QEMU_OPT_NUMBER, +.help = "max speed in MHz", +},{ +.name = "current-speed", +.type = QEMU_OPT_NUMBER, +.help = "speed at system boot in MHz", },{ .name = "serial", .type = QEMU_OPT_STRING, @@ -587,9 +607,8 @@ static void smbios_build_type_4_table(MachineState *ms, unsigned instance) SMBIOS_TABLE_SET_STR(4, processor_version_str, type4.version); t->voltage = 0; t->external_clock = cpu_to_le16(0); /* Unknown */ -/* SVVP requires max_speed and current_speed to not be unknown. */ -t->max_speed = cpu_to_le16(2000); /* 2000 MHz */ -t->current_speed = cpu_to_le16(2000); /* 2000 MHz */ +t->max_speed = cpu_to_le16(type4.max_speed); +t->current_speed = cpu_to_le16(type4.current_speed); t->status = 0x41; /* Socket populated, CPU enabled */ t->processor_upgrade = 0x01; /* Other */ t->l1_cache_handle = cpu_to_le16(0xFFFF); /* N/A */ @@ -1130,6 +1149,15 @@ void smbios_entry_add(QemuOpts *opts, Error **errp) save_opt(&type4.serial, opts, "serial"); save_opt(&type4.asset, opts, "asset"); save_opt(&type4.part, opts, "part"); +type4.max_speed = qemu_opt_get_number(opts, "max-speed", + DEFAULT_CPU_SPEED); +type4.current_speed = qemu_opt_get_number(opts, "current-speed", + DEFAULT_CPU_SPEED); +if (type4.max_speed > UINT16_MAX || +type4.current_speed > UINT16_MAX) { +error_setg(errp, "SMBIOS CPU speed is too large (> %d)", + UINT16_MAX); +} return; case 11: qemu_opts_validate(opts, qemu_smbios_type11_opts, errp); diff --git a/qemu-options.hx b/qemu-options.hx index ea0638e92d..50b068423c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2073,7 +2073,7 
@@ DEF("smbios", HAS_ARG, QEMU_OPTION_smbios, " [,sku=str]\n" "specify SMBIOS type 3 fields\n" "-smbios type=4[,sock_pfx=str][,manufacturer=str][,version=str][,serial=str]\n" -" [,asset=str][,part=str]\n" +" [,asset=str][,part=str][,max-speed=%d][,current-speed=%d]\n" "specify SMBIOS type 4 fields\n" "-smbios type=17[,loc_pfx=str][,bank=str][,manufacturer=str][,serial=str]\n" " [,asset=str][,part=str][,speed=%d]\n" -- 2.23.0
[PATCH v5 0/2] add new options to set smbios type 4 fields
From: fangying Hi, this patchset was previously posted by my teammate Heyi Guo several months ago, however we missed the merge window. It is reposted here to bring it to completion. Thanks. Patch description: Common VM users sometimes care about CPU speed, so we add two new options to allow VM vendors to present CPU speed to their users. Normally this information can be fetched from the host smbios. Strictly speaking, the "max speed" and "current speed" in type 4 are not really the max speed and current speed of the processor, for "max speed" identifies a capability of the system, and "current speed" identifies the processor's speed at boot (see the smbios spec), but some applications do not tell the difference. Changelog: v4 -> v5: - Rebase patch for latest upstream v3 -> v4: - Fix the default value when not specifying "-smbios type=4" option; it would be 0 instead of 2000 in previous versions - Use uint64_t type to check value overflow - Add test case to check smbios type 4 CPU speed - v4 https://patchwork.kernel.org/cover/11444635/ v2 -> v3: - Refine comments per Igor's suggestion. v1 -> v2: - change "_" in option names to "-" - check if option value is too large to fit in SMBIOS type 4 speed fields. Cc: "Michael S. Tsirkin" Cc: Igor Mammedov Ying Fang (2): hw/smbios: add options for type 4 max-speed and current-speed tests/bios-tables-test: add smbios cpu speed test hw/smbios/smbios.c | 36 ++ qemu-options.hx | 2 +- tests/bios-tables-test.c | 42 3 files changed, 75 insertions(+), 5 deletions(-) -- 2.23.0
[PATCH v5 2/2] tests/bios-tables-test: add smbios cpu speed test
Add an smbios type 4 CPU speed check since we added new options to set smbios type 4 "max speed" and "current speed". The default value should be 2000 when no option is specified, just as the old version did. We add the test case to one machine of each architecture, though it doesn't really run on the aarch64 platform yet, since the smbios test can't run on a UEFI-only platform. Signed-off-by: Ying Fang Signed-off-by: Heyi Guo --- tests/bios-tables-test.c | 42 1 file changed, 42 insertions(+) diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index a356ac3489..6bd165021b 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -37,6 +37,8 @@ typedef struct { GArray *tables; uint32_t smbios_ep_addr; struct smbios_21_entry_point smbios_ep_table; +uint16_t smbios_cpu_max_speed; +uint16_t smbios_cpu_curr_speed; uint8_t *required_struct_types; int required_struct_types_len; QTestState *qts; @@ -516,6 +518,31 @@ static inline bool smbios_single_instance(uint8_t type) } } +static bool smbios_cpu_test(test_data *data, uint32_t addr) +{ +uint16_t expect_speed[2]; +uint16_t real; +int offset[2]; +int i; + +/* Check CPU speed for backward compatibility */ +offset[0] = offsetof(struct smbios_type_4, max_speed); +offset[1] = offsetof(struct smbios_type_4, current_speed); +expect_speed[0] = data->smbios_cpu_max_speed ?: 2000; +expect_speed[1] = data->smbios_cpu_curr_speed ?: 2000; + +for (i = 0; i < 2; i++) { +real = qtest_readw(data->qts, addr + offset[i]); +if (real != expect_speed[i]) { +fprintf(stderr, "Unexpected SMBIOS CPU speed: real %u expect %u\n", +real, expect_speed[i]); +return false; +} +} + +return true; +} + static void test_smbios_structs(test_data *data) { DECLARE_BITMAP(struct_bitmap, SMBIOS_MAX_TYPE+1) = { 0 }; @@ -538,6 +565,10 @@ static void test_smbios_structs(test_data *data) } set_bit(type, struct_bitmap); +if (type == 4) { +g_assert(smbios_cpu_test(data, addr)); +} + /* seek to end of unformatted string area of this struct ("\0\0") */ prv = crt = 1; while (prv || crt) { @@ -673,6 +704,11 @@ static void test_acpi_q35_tcg(void) data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types); test_acpi_one(NULL, &data); free_test_data(&data); + +data.smbios_cpu_max_speed = 3000; +data.smbios_cpu_curr_speed = 2600; +test_acpi_one("-smbios type=4,max-speed=3000,current-speed=2600", &data); +free_test_data(&data); } static void test_acpi_q35_tcg_bridge(void) @@ -885,6 +921,12 @@ static void test_acpi_virt_tcg(void) test_acpi_one("-cpu cortex-a57", &data); free_test_data(&data); + +data.smbios_cpu_max_speed = 2900; +data.smbios_cpu_curr_speed = 2700; +test_acpi_one("-cpu cortex-a57 " + "-smbios type=4,max-speed=2900,current-speed=2700", &data); +free_test_data(&data); } int main(int argc, char *argv[]) -- 2.23.0
[PATCH] qcow2: flush qcow2 l2 meta for new allocated clusters
From: fangying When a qemu or qemu-nbd process uses a qcow2 image configured with 'cache=none', it writes to the qcow2 image through an in-memory cache of L2 tables; however, the cached L2 table changes are not flushed to the disk unless a flush command is explicitly issued or the image is closed. This may leave the on-disk data inconsistent with the written data for a long time. If an abnormal process exit occurs here, the written data that was already issued will be lost. Therefore, in order to keep data consistent we need to flush the changes to the L2 entry to the disk in time for the newly allocated cluster. Signed-off-by: Ying Fang diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c index 7444b9c..ab6e812 100644 --- a/block/qcow2-cache.c +++ b/block/qcow2-cache.c @@ -266,6 +266,22 @@ int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c) return result; } +#define L2_ENTRIES_PER_SECTOR 64 +int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c, + void *table, int index, int num) +{ +int ret; +int i = qcow2_cache_get_table_idx(c, table); +int start_sector = index / L2_ENTRIES_PER_SECTOR; +int end_sector = (index + num - 1) / L2_ENTRIES_PER_SECTOR; +int nr_sectors = end_sector - start_sector + 1; +ret = bdrv_pwrite(bs->file, + c->entries[i].offset + start_sector * BDRV_SECTOR_SIZE, + table + start_sector * BDRV_SECTOR_SIZE, + nr_sectors * BDRV_SECTOR_SIZE); +return ret; +} + int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c, Qcow2Cache *dependency) { diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index a677ba9..ae49a83 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -998,6 +998,9 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m) } +ret = qcow2_cache_l2_write_entry(bs, s->l2_table_cache, l2_slice, + l2_index, m->nb_clusters); + qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); /* diff --git a/block/qcow2.h b/block/qcow2.h index 7ce2c23..168ab59 100644 --- a/block/qcow2.h +++ b/block/qcow2.h 
@@ -748,6 +748,8 @@ int qcow2_cache_destroy(Qcow2Cache *c); void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table); int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c); int qcow2_cache_write(BlockDriverState *bs, Qcow2Cache *c); +int qcow2_cache_l2_write_entry(BlockDriverState *bs, Qcow2Cache *c, + void *table, int index, int num); int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c, Qcow2Cache *dependency); void qcow2_cache_depends_on_flush(Qcow2Cache *c); -- 1.8.3.1
Re: [PATCH v3] target/arm/cpu: adjust virtual time for arm cpu
On 6/10/2020 3:40 PM, Andrew Jones wrote: On Wed, Jun 10, 2020 at 09:32:06AM +0800, Ying Fang wrote: On 6/8/2020 8:49 PM, Andrew Jones wrote: On Mon, Jun 08, 2020 at 08:12:43PM +0800, Ying Fang wrote: From: fangying Virtual time adjustment was implemented for virt-5.0 machine type, but the cpu property was enabled only for host-passthrough and max cpu model. Let's add it for arm cpu which has the generic timer feature enabled. Suggested-by: Andrew Jones This isn't true. I did suggest the way to arrange the code, after Peter suggested to move the kvm_arm_add_vcpu_properties() call to arm_cpu_post_init(), but I didn't suggest making this change in general, which is what this tag means. In fact, I've argued that it's pretty I'm quite sorry for adding it here. No problem. pointless to do this, since KVM users should be using '-cpu host' or '-cpu max' anyway. Since I don't need credit for the code arranging, As discussed in thread [1], there is a situation where a 'custom' cpu mode is needed for us to keep instruction set compatibility so that migration can be done, just like x86 does. I understand the motivation. But, as I've said, KVM doesn't work that way. And we are planning to add support for it if nobody is currently doing that. Great! I'm looking forward to seeing the KVM patches. Especially since, without the KVM patches, the 'custom' CPU model isn't a custom CPU model, it's just a misleading way to use host passthrough. Indeed, I'm a bit opposed to allowing anything other than '-cpu host' and '-cpu max' (with features explicitly enabled/disabled, e.g. -cpu host,pmu=off) to work until KVM actually works with CPU models. Otherwise, how do we know the difference between a model that actually works and one that is just misleadingly named? Yes you are right here. My colleague zhanghailiang and me are now working on it. We will post the patch set soon. Thanks, drew Thanks. 
Ying [1]: https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00022.html please just drop the tag. Peter can maybe do that on merge though. Also, despite not agreeing that we need this change today, as there's nothing wrong with it and it looks good to me Reviewed-by: Andrew Jones Thanks, drew Signed-off-by: Ying Fang --- v3: - set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties v2: - move kvm_arm_add_vcpu_properties into arm_cpu_post_init v1: - initial commit - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 32bec156f2..5b7a36b5d7 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) { qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property); } + +if (kvm_enabled()) { +kvm_arm_add_vcpu_properties(obj); +} } static void arm_cpu_finalizefn(Object *obj) @@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { cortex_a15_initfn(obj); @@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { aarch64_add_sve_properties(obj); } -kvm_arm_add_vcpu_properties(obj); arm_cpu_post_init(obj); } diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index cbc5c3868f..778cecc2e6 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { uint64_t t; uint32_t u; diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 4bdbe6dcac..eef3bbd1cc 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp) /* KVM VCPU properties should be prefixed with "kvm-". 
*/ void kvm_arm_add_vcpu_properties(Object *obj) { -if (!kvm_enabled()) { -return; -} +ARMCPU *cpu = ARM_CPU(obj); +CPUARMState *env = &cpu->env; -ARM_CPU(obj)->kvm_adjvtime = true; -object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, - kvm_no_adjvtime_set); -object_property_set_description(obj, "kvm-no-adjvtime", -"Set on to disable the adjustment of " -"the virtual counter. VM stopped time " -"will be counted."); +if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) { +cpu->kvm_adjvtime = true; +object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, +
Re: [PATCH v3] target/arm/cpu: adjust virtual time for arm cpu
On 6/8/2020 8:49 PM, Andrew Jones wrote: On Mon, Jun 08, 2020 at 08:12:43PM +0800, Ying Fang wrote: From: fangying Virtual time adjustment was implemented for virt-5.0 machine type, but the cpu property was enabled only for host-passthrough and max cpu model. Let's add it for arm cpu which has the generic timer feature enabled. Suggested-by: Andrew Jones This isn't true. I did suggest the way to arrange the code, after Peter suggested to move the kvm_arm_add_vcpu_properties() call to arm_cpu_post_init(), but I didn't suggest making this change in general, which is what this tag means. In fact, I've argued that it's pretty I'm quite sorry for adding it here. pointless to do this, since KVM users should be using '-cpu host' or '-cpu max' anyway. Since I don't need credit for the code arranging, As discussed in thread [1], there is a situation where a 'custom' cpu mode is needed for us to keep instruction set compatibility so that migration can be done, just like x86 does. And we are planning to add support for it if nobody is currently doing that. Thanks. Ying [1]: https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg00022.html please just drop the tag. Peter can maybe do that on merge though. 
Also, despite not agreeing that we need this change today, as there's nothing wrong with it and it looks good to me Reviewed-by: Andrew Jones Thanks, drew Signed-off-by: Ying Fang --- v3: - set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties v2: - move kvm_arm_add_vcpu_properties into arm_cpu_post_init v1: - initial commit - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 32bec156f2..5b7a36b5d7 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) { qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property); } + +if (kvm_enabled()) { +kvm_arm_add_vcpu_properties(obj); +} } static void arm_cpu_finalizefn(Object *obj) @@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { cortex_a15_initfn(obj); @@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { aarch64_add_sve_properties(obj); } -kvm_arm_add_vcpu_properties(obj); arm_cpu_post_init(obj); } diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index cbc5c3868f..778cecc2e6 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { uint64_t t; uint32_t u; diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 4bdbe6dcac..eef3bbd1cc 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp) /* KVM VCPU properties should be prefixed with "kvm-". 
*/ void kvm_arm_add_vcpu_properties(Object *obj) { -if (!kvm_enabled()) { -return; -} +ARMCPU *cpu = ARM_CPU(obj); +CPUARMState *env = &cpu->env; -ARM_CPU(obj)->kvm_adjvtime = true; -object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, - kvm_no_adjvtime_set); -object_property_set_description(obj, "kvm-no-adjvtime", -"Set on to disable the adjustment of " -"the virtual counter. VM stopped time " -"will be counted."); +if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) { +cpu->kvm_adjvtime = true; +object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, + kvm_no_adjvtime_set); +object_property_set_description(obj, "kvm-no-adjvtime", +"Set on to disable the adjustment of " +"the virtual counter. VM stopped time " +"will be counted."); +} } bool kvm_arm_pmu_supported(CPUState *cpu) -- 2.23.0
[PATCH v3] target/arm/cpu: adjust virtual time for arm cpu
From: fangying Virtual time adjustment was implemented for virt-5.0 machine type, but the cpu property was enabled only for host-passthrough and max cpu model. Let's add it for arm cpu which has the generic timer feature enabled. Suggested-by: Andrew Jones Signed-off-by: Ying Fang --- v3: - set kvm-no-adjvtime property in kvm_arm_add_vcpu_properties v2: - move kvm_arm_add_vcpu_properties into arm_cpu_post_init v1: - initial commit - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg08518.html diff --git a/target/arm/cpu.c b/target/arm/cpu.c index 32bec156f2..5b7a36b5d7 100644 --- a/target/arm/cpu.c +++ b/target/arm/cpu.c @@ -1245,6 +1245,10 @@ void arm_cpu_post_init(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_GENERIC_TIMER)) { qdev_property_add_static(DEVICE(cpu), &arm_cpu_gt_cntfrq_property); } + +if (kvm_enabled()) { +kvm_arm_add_vcpu_properties(obj); +} } static void arm_cpu_finalizefn(Object *obj) @@ -2029,7 +2033,6 @@ static void arm_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { cortex_a15_initfn(obj); @@ -2183,7 +2186,6 @@ static void arm_host_initfn(Object *obj) if (arm_feature(&cpu->env, ARM_FEATURE_AARCH64)) { aarch64_add_sve_properties(obj); } -kvm_arm_add_vcpu_properties(obj); arm_cpu_post_init(obj); } diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index cbc5c3868f..778cecc2e6 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -592,7 +592,6 @@ static void aarch64_max_initfn(Object *obj) if (kvm_enabled()) { kvm_arm_set_cpu_features_from_host(cpu); -kvm_arm_add_vcpu_properties(obj); } else { uint64_t t; uint32_t u; diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 4bdbe6dcac..eef3bbd1cc 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -194,17 +194,18 @@ static void kvm_no_adjvtime_set(Object *obj, bool value, Error **errp) /* KVM VCPU properties should be prefixed with "kvm-". 
*/ void kvm_arm_add_vcpu_properties(Object *obj) { -if (!kvm_enabled()) { -return; -} +ARMCPU *cpu = ARM_CPU(obj); +CPUARMState *env = &cpu->env; -ARM_CPU(obj)->kvm_adjvtime = true; -object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, - kvm_no_adjvtime_set); -object_property_set_description(obj, "kvm-no-adjvtime", -"Set on to disable the adjustment of " -"the virtual counter. VM stopped time " -"will be counted."); +if (arm_feature(env, ARM_FEATURE_GENERIC_TIMER)) { +cpu->kvm_adjvtime = true; +object_property_add_bool(obj, "kvm-no-adjvtime", kvm_no_adjvtime_get, + kvm_no_adjvtime_set); +object_property_set_description(obj, "kvm-no-adjvtime", +"Set on to disable the adjustment of " +"the virtual counter. VM stopped time " +"will be counted."); +} } bool kvm_arm_pmu_supported(CPUState *cpu) -- 2.23.0
Re: Forward migration broken down since virt-4.2 machine type
ping On 6/4/2020 4:51 PM, Ying Fang wrote: Hi Richard, Recently we have been doing some tests on forward migration based on the arm virt machine, and we found the patch below breaks forward migration compatibility from virt-4.2 to virt-5.0 and later machine types. The offending commit, found by git bisect, is commit f9506e162c33e87b609549157dd8431fcc732085 target/arm: Remove ARM_FEATURE_VFP* QEMU may crash on the destination host while loading cpu state. Here go my questions, since I am not familiar with the VFP feature. 1: Should we keep the forward migration compatibility here? 2: If so, how can we fix it? Below is the crash stack: Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault. [Switching to LWP 712330] armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 391 if (s->cpu->env.v7m.faultmask[secure]) { #0 armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 #1 0xaaae6f766510 in arm_v7m_mmu_idx_for_secstate_and_priv (env=0xaaae73456780, secstate=false, priv=true) at qemu/target/arm/m_helper.c:2711 #2 0xaaae6f7163f0 in arm_mmu_idx_el (env=env@entry=0xaaae73456780, el=el@entry=1) at qemu/target/arm/helper.c:12386 #3 0xaaae6f717000 in rebuild_hflags_internal (env=0xaaae73456780) at qemu/target/arm/helper.c:12611 #4 arm_rebuild_hflags (env=env@entry=0xaaae73456780) at qemu/target/arm/helper.c:12624 #5 0xaaae6f722940 in cpu_post_load (opaque=0xaaae7344ceb0, version_id=<optimized out>) at qemu/target/arm/machine.c:767 #6 0xaaae6f9e0e78 in vmstate_load_state (f=f@entry=0xaaae73020260, vmsd=0xaaae6fe93178 <vmstate_arm_cpu>, opaque=0xaaae7344ceb0, version_id=22) at migration/vmstate.c:168 #7 0xaaae6f9d9858 in vmstate_load (f=f@entry=0xaaae73020260, se=se@entry=0xaaae7302f750) at migration/savevm.c:885 #8 0xaaae6f9dab90 in qemu_loadvm_section_start_full (f=f@entry=0xaaae73020260, mis=0xaaae72fb88a0) at migration/savevm.c:2302 #9 0xaaae6f9dd248 in qemu_loadvm_state_main (f=f@entry=0xaaae73020260, mis=mis@entry=0xaaae72fb88a0) at migration/savevm.c:2486 #10 0xaaae6f9de3bc in qemu_loadvm_state (f=0xaaae73020260) at migration/savevm.c:2560 #11 0xaaae6f9d489c in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:461 #12 0xaaae6fb59850 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115 #13 0xfffdd6c16030 in ?? () from target:/usr/lib64/libc.so.6 #0 armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 (gdb) p s $4 = (NVICState *) 0x0 Thanks. Ying
Re: About the kvm-no-adjvtime CPU property
On 6/3/2020 4:53 PM, Andrew Jones wrote: On Tue, Jun 02, 2020 at 03:47:22PM +0800, Ying Fang wrote: On 2020/6/1 20:29, Andrew Jones wrote: On Mon, Jun 01, 2020 at 08:07:31PM +0800, Ying Fang wrote: On 2020/6/1 16:07, Andrew Jones wrote: On Sat, May 30, 2020 at 04:56:26PM +0800, Ying Fang wrote: About the kvm-no-adjvtime CPU property Hi Andrew, To adjust virtual time, a new kvm cpu property kvm-no-adjvtime was introduced to 5.0 virt machine types. However the cpu property was enabled only for host-passthrough and max cpu model. As for other cpu models like cortex-a57, cortex-a53, cortex-a72, kvm-adjvtime is not enabled by default, which means the virtual time cannot be adjusted for them. Here, for example, if a VM is configured with kvm enabled: cortex-a72 We cannot adjust virtual time even if the 5.0 virt machine is used. So I'd like to add it to other cpu models, do you have any suggestions here? Hi Fang, The cpu feature only requires kvm. If a cpu model may be used with kvm, then the feature can be allowed to be used with the model. What I find interesting is that the cpu model is being used with kvm instead of 'host' or 'max'. Can you explain the reasons for that? Yes, the cpu model is indeed used with kvm. There is a situation where the host cpu model is Cortex-A72 and a 'custom' cpu mode is used to keep the instruction set compatible between the source and destination host machine when doing live migration. So the host physical machine cpu model is Cortex-A72 but host-passthrough model is mode used here. I mean host-passthrough model is 'not' used here. Sorry to make it confusing. I guessed as much. Are the source and destination hosts used in the migration identical? If so, then the guest can use cpu 'host' and disable cpu features that should not be exposed (e.g. -cpu host,pmu=off). If the source and destination hosts are not identical, then I'm curious what those exact differences are. 
With the way AArch64 KVM works today, even using the Cortex-A72 cpu model should require identical hosts when migrating. Or, at least both hosts must be compatible with Cortex-A72 and any difference in ID registers must be somehow getting hidden from the guest. Yes, you are right. We have an AArch64 server with a cpu based on Cortex-A72 and some extra instruction set added. The source host has a cpu based on V1 and the destination host a cpu based on V2, and they are compatible with Cortex-A72. We want to use a 'custom' cpu mode here to make it possible to do live migration between them. This is the scenario where the 'host' cpu model is not used since a 'custom' cpu model Cortex-A72 is used here. What you've described here is indeed the reason to use CPU models. I.e. enabling the migration from a host of one type to another by creating a guest that only enables the features contained in both hosts (as well as maintaining all state that describes the CPU type, e.g. MIDR). Unfortunately, unless your KVM has patches that aren't upstream, then that doesn't work on AArch64 KVM (more on that below). It may appear to be working for you, because your guest kernel and userspace don't mind the slight differences exposed to it between the hosts, or those differences are limited to explicitly disabled features. If that's the case, then I would guess that using '-cpu host' and disabling the same features would "work" as well. Yes, upstream KVM currently does not support it. We are planning to add support for the aarch64 platform since we have the situation where it is needed for our hardware. @Marc Zyngier, is there anyone working on this? Here are some more details on why the Cortex-A72 CPU model doesn't matter with upstream KVM. First, upstream AArch64 KVM doesn't support CPU models, and it doesn't even have a Cortex-A72 preferred target. For Cortex-A72 it will use "KVM_ARM_TARGET_GENERIC_V8", which is the same thing 'host' would do when running on a Cortex-A72. 
Second, if V2 of the Cortex-A72-based CPU you're using changed the revision of the MIDR, or any other state that gets passed directly through to the guest like the MIDR, then that state will change on migration. If a guest looks before migration and again after migration, then it could get confused. A guest kernel may only look once on boot and then not notice, but anything exposed to userspace is extra risky, as userspace may check more frequently. Yes, just as explained here. In short, without KVM patches that aren't upstream, then it's risky to migrate between machines with V1 and V2 of these CPUs. And, it doesn't help to use the Cortex-A72 CPU model. Thanks for your detailed introduction. Thanks, drew However the kvm-adjvtime feature is also needed. So I think we should move kvm_arm_add_vcpu_properties to arm_cpu_post_init, instead of limiting it to the 'host' and 'max' cpu models[1]. 1: https://lists.gnu.org/archive/html/qemu-de
Forward migration broken down since virt-4.2 machine type
Hi Richard, Recently we have been doing some tests on forward migration based on the arm virt machine, and we found the patch below breaks forward migration compatibility from virt-4.2 to virt-5.0 and later machine types. The offending commit, found by git bisect, is commit f9506e162c33e87b609549157dd8431fcc732085 target/arm: Remove ARM_FEATURE_VFP* QEMU may crash on the destination host while loading cpu state. Here go my questions, since I am not familiar with the VFP feature. 1: Should we keep the forward migration compatibility here? 2: If so, how can we fix it? Below is the crash stack: Thread 1 "qemu-system-aar" received signal SIGSEGV, Segmentation fault. [Switching to LWP 712330] armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 391 if (s->cpu->env.v7m.faultmask[secure]) { #0 armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 #1 0xaaae6f766510 in arm_v7m_mmu_idx_for_secstate_and_priv (env=0xaaae73456780, secstate=false, priv=true) at qemu/target/arm/m_helper.c:2711 #2 0xaaae6f7163f0 in arm_mmu_idx_el (env=env@entry=0xaaae73456780, el=el@entry=1) at qemu/target/arm/helper.c:12386 #3 0xaaae6f717000 in rebuild_hflags_internal (env=0xaaae73456780) at qemu/target/arm/helper.c:12611 #4 arm_rebuild_hflags (env=env@entry=0xaaae73456780) at qemu/target/arm/helper.c:12624 #5 0xaaae6f722940 in cpu_post_load (opaque=0xaaae7344ceb0, version_id=<optimized out>) at qemu/target/arm/machine.c:767 #6 0xaaae6f9e0e78 in vmstate_load_state (f=f@entry=0xaaae73020260, vmsd=0xaaae6fe93178 <vmstate_arm_cpu>, opaque=0xaaae7344ceb0, version_id=22) at migration/vmstate.c:168 #7 0xaaae6f9d9858 in vmstate_load (f=f@entry=0xaaae73020260, se=se@entry=0xaaae7302f750) at migration/savevm.c:885 #8 0xaaae6f9dab90 in qemu_loadvm_section_start_full (f=f@entry=0xaaae73020260, mis=0xaaae72fb88a0) at migration/savevm.c:2302 #9 0xaaae6f9dd248 in qemu_loadvm_state_main (f=f@entry=0xaaae73020260, mis=mis@entry=0xaaae72fb88a0) at migration/savevm.c:2486 #10 0xaaae6f9de3bc in qemu_loadvm_state (f=0xaaae73020260) at migration/savevm.c:2560 #11 0xaaae6f9d489c in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:461 #12 0xaaae6fb59850 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:115 #13 0xfffdd6c16030 in ?? () from target:/usr/lib64/libc.so.6 #0 armv7m_nvic_neg_prio_requested (opaque=0x0, secure=secure@entry=false) at qemu/hw/intc/armv7m_nvic.c:391 (gdb) p s $4 = (NVICState *) 0x0 Thanks. Ying