Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package

2024-09-25 Thread Zhao Liu
On Fri, Sep 20, 2024 at 01:08:27PM +0200, Igor Mammedov wrote:
> Date: Fri, 20 Sep 2024 13:08:27 +0200
> From: Igor Mammedov 
> Subject: Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for
>  logical processors in the physical package
> 
> On Fri, 20 Sep 2024 02:29:46 +0800
> Zhao Liu  wrote:
> 
> > Hi Chuang and Igor,
> > 
> > Sorry for the late reply.
> > 
> > On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> > > Date: Wed, 18 Sep 2024 21:18:15 +0800
> > > From: Chuang Xu 
> > > Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
> > >  processors in the physical package
> > > 
> > > When QEMU is started with:
> > > -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> > > -smp 180,sockets=2,dies=1,cores=45,threads=2
> > > 
> > > Executing "cpuid -1 -l 1 -r" in the guest yields a value of 90 for
> > > CPUID.01H.EBX[23:16], while the expected value is 128. Executing
> > > "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
> > > CPUID.04H.EAX[31:26], as expected.
> > > 
> > > Since (1+CPUID.04H.EAX[31:26]) is rounded up to the nearest power-of-2
> > > integer, we'd better round up CPUID.01H.EBX[23:16] to the nearest
> > > power-of-2 integer too. Otherwise we may encounter unexpected results
> > > in the guest.
> > > 
> > > For example, when QEMU is started with the CLI above and xtopology is
> > > disabled, guest kernel 5.15.120 uses
> > > CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to calculate
> > > threads-per-core in detect_ht(). The guest then gets "90/(1+63)=1" as
> > > the result, even though threads-per-core should actually be 2.
> > > 
> > > So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer to
> > > avoid this unexpected result.
> > > 
> > > Signed-off-by: Guixiong Wei 
> > > Signed-off-by: Yipeng Yin 
> > > Signed-off-by: Chuang Xu 
> > > ---
> > >  target/i386/cpu.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 4c2e6f3a71..3710ae5283 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > >          }
> > >          *edx = env->features[FEAT_1_EDX];
> > >          if (threads_per_pkg > 1) {
> > > -            *ebx |= threads_per_pkg << 16;
> > > +            *ebx |= pow2ceil(threads_per_pkg) << 16;
> > 
> > Yes, the fix is right.
> > 
> > About the "Maximum number of addressable IDs", the commit 88dd4ca06c83
> > ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
> > introduced the new way to calculate.
> > 
> > pow2ceil() works for the current SMP topology, but may be wrong for a
> > hybrid topology, for the reasons I listed in that commit message:
> > 
> > > The nearest power-of-2 integer can be calculated by pow2ceil() or by
> > > using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).  
> > 
> > > But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> > > are associated with APIC ID. For example, in linux kernel, the field
> > > "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> > > another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> > > matched with actual core numbers and it's calculated by:
> > > "(1 << (pkg_offset - core_offset)) - 1".  
> > 
> > Calculating with APIC ID offsets is the hardware's approach, so I used
> > APIC ID offsets instead of pow2ceil() and replaced all pow2ceil() cases.
> 
> Well, hybrid case needs some more explanation then.
> 
> 'pow2ceil(threads_per_pkg) << 16' does exactly what the SDM says for
> CPUID.01H.EBX[23:16].
> 
> Can you point to a spec that confirms that above is wrong and
> explain in more details how hybrid case is supposed to work
> and where it's documented?
>  

This is mainly about the meaning of "addressable ID". There's a spec
"Intel 64 Architecture Processor Topology Enumeration" [1].

In the section 1.5.3, it mentions the &q

Re: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical processors in the physical package

2024-09-19 Thread Zhao Liu
Hi Chuang and Igor,

Sorry for the late reply.

On Wed, Sep 18, 2024 at 09:18:15PM +0800, Chuang Xu wrote:
> Date: Wed, 18 Sep 2024 21:18:15 +0800
> From: Chuang Xu 
> Subject: [PATCH v3] i386/cpu: fixup number of addressable IDs for logical
>  processors in the physical package
> 
> When QEMU is started with:
> -cpu host,migratable=on,host-cache-info=on,l3-cache=off
> -smp 180,sockets=2,dies=1,cores=45,threads=2
> 
> Executing "cpuid -1 -l 1 -r" in the guest yields a value of 90 for
> CPUID.01H.EBX[23:16], while the expected value is 128. Executing
> "cpuid -1 -l 4 -r" in the guest yields a value of 63 for
> CPUID.04H.EAX[31:26], as expected.
> 
> Since (1+CPUID.04H.EAX[31:26]) is rounded up to the nearest power-of-2
> integer, we'd better round up CPUID.01H.EBX[23:16] to the nearest
> power-of-2 integer too. Otherwise we may encounter unexpected results in
> the guest.
> 
> For example, when QEMU is started with the CLI above and xtopology is
> disabled, guest kernel 5.15.120 uses
> CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to calculate
> threads-per-core in detect_ht(). The guest then gets "90/(1+63)=1" as the
> result, even though threads-per-core should actually be 2.
> 
> So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer to
> avoid this unexpected result.
> 
> Signed-off-by: Guixiong Wei 
> Signed-off-by: Yipeng Yin 
> Signed-off-by: Chuang Xu 
> ---
>  target/i386/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 4c2e6f3a71..3710ae5283 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6417,7 +6417,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>          }
>          *edx = env->features[FEAT_1_EDX];
>          if (threads_per_pkg > 1) {
> -            *ebx |= threads_per_pkg << 16;
> +            *ebx |= pow2ceil(threads_per_pkg) << 16;

Yes, the fix is right.

About the "Maximum number of addressable IDs", the commit 88dd4ca06c83
("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]")
introduced the new way to calculate.

pow2ceil() works for the current SMP topology, but may be wrong for a
hybrid topology, for the reasons I listed in that commit message:

> The nearest power-of-2 integer can be calculated by pow2ceil() or by
> using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).

> But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
> are associated with APIC ID. For example, in linux kernel, the field
> "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
> another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
> matched with actual core numbers and it's calculated by:
> "(1 << (pkg_offset - core_offset)) - 1".

Calculating with APIC ID offsets is the hardware's approach, so I used
APIC ID offsets instead of pow2ceil() and replaced all pow2ceil() cases.

Hi Igor, do you agree? :-)

Best Regards,
Zhao




[RFC v2 05/12] hw/core/machine: Introduce custom CPU topology with max limitations

2024-09-18 Thread Zhao Liu
Custom topology allows the user to create the CPU topology entirely via
-device from the CLI.

Once custom topology is enabled, the machine skips the default CPU
creation and expects the user to build the CPU topology tree from the CLI.

With custom topology, any CPU topology, whether symmetric or hybrid
(aka, heterogeneous), can be created naturally.

However, custom topology still needs to be constrained, because
possible_cpus[] requires some preliminary topology information for
initialization. That information is the max limitation (the new max
parameters in -smp), and custom topology is subject to it.

Max limitations are necessary because creating custom topology before
initializing possible_cpus[] would compromise future hotplug scalability.

Max limitations are placed in -smp even though a custom topology can be
hybrid: from an implementation perspective, any hybrid topology can be
considered a subset of a complete SMP structure. Therefore, using max
limitations to constrain a hybrid topology is semantically consistent.

Introduce custom CPU topology related properties in MachineClass. At the
same time, add and parse max parameters from -smp, and store the max
limitations in CPUSlot.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS               |   1 +
 hw/core/machine-smp.c     |   2 +
 hw/core/machine.c         |  33 +++
 hw/core/meson.build       |   2 +-
 hw/cpu/cpu-slot.c         | 118 ++
 include/hw/boards.h       |   2 +
 include/hw/cpu/cpu-slot.h |   9 +++
 qapi/machine.json         |  22 ++-
 stubs/machine-stubs.c     |  21 +++
 stubs/meson.build         |   1 +
 10 files changed, 209 insertions(+), 2 deletions(-)
 create mode 100644 stubs/machine-stubs.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4608c3c6db8c..5ea739f12857 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1901,6 +1901,7 @@ F: include/hw/cpu/die.h
 F: include/hw/cpu/module.h
 F: include/hw/cpu/socket.h
 F: include/sysemu/numa.h
+F: stubs/machine-stubs.c
 F: tests/functional/test_cpu_queries.py
 F: tests/functional/test_empty_cpu_model.py
 F: tests/unit/test-smp-parse.c
diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 9a281946762f..d3be4352267d 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -259,6 +259,8 @@ void machine_parse_smp_config(MachineState *ms,
                    mc->name, mc->max_cpus);
         return;
     }
+
+    machine_parse_custom_topo_config(ms, config, errp);
 }
 
 static bool machine_check_topo_support(MachineState *ms,
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7b4ac5ac52b2..dedabd75c825 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -966,6 +966,30 @@ static void machine_set_smp_cache(Object *obj, Visitor *v, const char *name,
     qapi_free_SmpCachePropertiesList(caches);
 }
 
+static bool machine_get_custom_topo(Object *obj, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    if (!ms->topo) {
+        error_setg(errp, "machine doesn't support custom topology");
+        return false;
+    }
+
+    return ms->topo->custom_topo_enabled;
+}
+
+static void machine_set_custom_topo(Object *obj, bool value, Error **errp)
+{
+    MachineState *ms = MACHINE(obj);
+
+    if (!ms->topo) {
+        error_setg(errp, "machine doesn't support custom topology");
+        return;
+    }
+
+    ms->topo->custom_topo_enabled = value;
+}
+
 static void machine_get_boot(Object *obj, Visitor *v, const char *name,
                              void *opaque, Error **errp)
 {
@@ -1240,6 +1264,15 @@ static void machine_initfn(Object *obj)
     }
 
     ms->topo = NULL;
+    if (mc->smp_props.topo_tree_supported &&
+        mc->smp_props.custom_topo_supported) {
+        object_property_add_bool(obj, "custom-topo",
+                                 machine_get_custom_topo,
+                                 machine_set_custom_topo);
+        object_property_set_description(obj, "custom-topo",
+                                        "Set on/off to enable/disable "
+                                        "user custom CPU topology tree");
+    }
 
     machine_copy_boot_config(ms, &(BootConfiguration){ 0 });
 }
diff --git a/hw/core/meson.build b/hw/core/meson.build
index a3d9bab9f42a..f70d6104a00d 100644
--- a/hw/core/meson.build
+++ b/hw/core/meson.build
@@ -13,7 +13,6 @@ hwcore_ss.add(files(
 ))
 
 common_ss.add(files('cpu-common.c'))
-common_ss.add(files('machine-smp.c'))
 system_ss.add(when: 'CONFIG_FITLOADER', if_true: files('loader-fit.c'))
 system_ss.add(when: 'CONFIG_GENERIC_LOADER', if_true: files('generic-loader.c'))
 system_ss.add(when: 'CONFIG_GUEST_LOADER', if_true: files('guest-loader.c'))
@@ -33,6 +32,7 @@ system_ss.add(files(
   'loader.c',
   'machine-hmp-cmds.c',
   'machine-qmp-c

[RFC v2 06/12] hw/cpu: Constrain CPU topology tree with max_limit

2024-09-18 Thread Zhao Liu
Apply max_limit to the CPU topology and prevent the number of topology
devices from exceeding the max limitation configured by the user.

Additionally, ensure that the number of CPUs created from the CLI via
custom topology is at least smp.cpus. This guarantees that a custom
topology always has CPUs.

Signed-off-by: Zhao Liu 
---
 hw/core/machine.c         |  4 
 hw/cpu/cpu-slot.c         | 32 
 include/hw/cpu/cpu-slot.h |  1 +
 include/hw/qdev-core.h    |  5 +
 4 files changed, 42 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index dedabd75c825..54fca9eb7265 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1684,6 +1684,10 @@ void machine_run_board_post_init(MachineState *machine, Error **errp)
 {
     MachineClass *machine_class = MACHINE_GET_CLASS(machine);
 
+    if (!machine_validate_topo_tree(machine, errp)) {
+        return;
+    }
+
     if (machine_class->post_init) {
         machine_class->post_init(machine);
     }
diff --git a/hw/cpu/cpu-slot.c b/hw/cpu/cpu-slot.c
index 2d16a2729501..f2b9c412926f 100644
--- a/hw/cpu/cpu-slot.c
+++ b/hw/cpu/cpu-slot.c
@@ -47,6 +47,7 @@ static void cpu_slot_device_realize(DeviceListener *listener,
 {
     CPUSlot *slot = container_of(listener, CPUSlot, listener);
     CPUTopoState *topo;
+    int max_children;
 
     if (!object_dynamic_cast(OBJECT(dev), TYPE_CPU_TOPO)) {
         return;
@@ -54,6 +55,13 @@ static void cpu_slot_device_realize(DeviceListener *listener,
 
     topo = CPU_TOPO(dev);
     cpu_slot_add_topo_info(slot, topo);
+
+    if (dev->parent_bus) {
+        max_children = slot->stat.entries[GET_CPU_TOPO_LEVEL(topo)].max_limit;
+        if (dev->parent_bus->num_children == max_children) {
+            qbus_mark_full(dev->parent_bus);
+        }
+    }
 }
 
 static void cpu_slot_del_topo_info(CPUSlot *slot, CPUTopoState *topo)
@@ -79,6 +87,10 @@ static void cpu_slot_device_unrealize(DeviceListener *listener,
 
     topo = CPU_TOPO(dev);
     cpu_slot_del_topo_info(slot, topo);
+
+    if (dev->parent_bus) {
+        qbus_mask_full(dev->parent_bus);
+    }
 }
 
 DeviceListener cpu_slot_device_listener = {
@@ -443,3 +455,23 @@ bool machine_parse_custom_topo_config(MachineState *ms,
 
     return true;
 }
+
+bool machine_validate_topo_tree(MachineState *ms, Error **errp)
+{
+    int cpus;
+
+    if (!ms->topo || !ms->topo->custom_topo_enabled) {
+        return true;
+    }
+
+    cpus = ms->topo->stat.entries[CPU_TOPOLOGY_LEVEL_THREAD].total_instances;
+    if (cpus < ms->smp.cpus) {
+        error_setg(errp, "machine requires at least %d online CPUs, "
+                   "but currently only %d CPUs",
+                   ms->smp.cpus, cpus);
+        return false;
+    }
+
+    /* TODO: Add checks for other levels to honor more -smp parameters. */
+    return true;
+}
diff --git a/include/hw/cpu/cpu-slot.h b/include/hw/cpu/cpu-slot.h
index 8d7e35aa1851..f56a0b08dca4 100644
--- a/include/hw/cpu/cpu-slot.h
+++ b/include/hw/cpu/cpu-slot.h
@@ -84,5 +84,6 @@ int get_max_topo_by_level(const MachineState *ms, CpuTopologyLevel level);
 bool machine_parse_custom_topo_config(MachineState *ms,
                                       const SMPConfiguration *config,
                                       Error **errp);
+bool machine_validate_topo_tree(MachineState *ms, Error **errp);
 
 #endif /* CPU_SLOT_H */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index ddcaa329e3ec..3f2117e08774 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1063,6 +1063,11 @@ static inline void qbus_mark_full(BusState *bus)
     bus->full = true;
 }
 
+static inline void qbus_mask_full(BusState *bus)
+{
+    bus->full = false;
+}
+
 void device_listener_register(DeviceListener *listener);
 void device_listener_unregister(DeviceListener *listener);
 
-- 
2.34.1




[RFC v2 03/12] system/vl: Create CPU topology devices from CLI early

2024-09-18 Thread Zhao Liu
Custom topology will allow the user to build the CPU topology entirely
from the CLI, replacing the machine's default CPU creation process
(*_init_cpus() in MachineClass.init()).

During machine initialization, the steps that follow CPU creation may
depend on the CPUs already existing.

To address such dependencies, create the CPU topology devices (including
CPU devices) from the CLI earlier, so that the later part of machine
initialization can be split off to run after qemu_add_cli_devices_early().

Signed-off-by: Zhao Liu 
---
 system/vl.c | 55 +++--
 1 file changed, 36 insertions(+), 19 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index c40364e2f091..8540454aa1c2 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1211,8 +1211,9 @@ static int device_help_func(void *opaque, QemuOpts *opts, Error **errp)
 static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
     DeviceState *dev;
+    long *category = opaque;
 
-    dev = qdev_device_add(opts, NULL, errp);
+    dev = qdev_device_add(opts, category, errp);
     if (!dev && *errp) {
         error_report_err(*errp);
         return -1;
@@ -2623,6 +2624,36 @@ static void qemu_init_displays(void)
     }
 }
 
+static void qemu_add_devices(long *category)
+{
+    DeviceOption *opt;
+
+    qemu_opts_foreach(qemu_find_opts("device"),
+                      device_init_func, category, &error_fatal);
+    QTAILQ_FOREACH(opt, &device_opts, next) {
+        DeviceState *dev;
+        loc_push_restore(&opt->loc);
+        /*
+         * TODO Eventually we should call qmp_device_add() here to make sure it
+         * behaves the same, but QMP still has to accept incorrectly typed
+         * options until libvirt is fixed and we want to be strict on the CLI
+         * from the start, so call qdev_device_add_from_qdict() directly for
+         * now.
+         */
+        dev = qdev_device_add_from_qdict(opt->opts, category,
+                                         true, &error_fatal);
+        object_unref(OBJECT(dev));
+        loc_pop(&opt->loc);
+    }
+}
+
+static void qemu_add_cli_devices_early(void)
+{
+    long category = DEVICE_CATEGORY_CPU_DEF;
+
+    qemu_add_devices(&category);
+}
+
 static void qemu_init_board(void)
 {
     /* process plugin before CPUs are created, but once -smp has been parsed */
@@ -2631,6 +2662,9 @@ static void qemu_init_board(void)
     /* From here on we enter MACHINE_PHASE_INITIALIZED.  */
     machine_run_board_init(current_machine, mem_path, &error_fatal);
 
+    /* Create CPU topology device if any. */
+    qemu_add_cli_devices_early();
+
     drive_check_orphaned();
 
     realtime_init();
@@ -2638,8 +2672,6 @@ static void qemu_init_board(void)
 
 static void qemu_create_cli_devices(void)
 {
-    DeviceOption *opt;
-
     soundhw_init();
 
     qemu_opts_foreach(qemu_find_opts("fw_cfg"),
@@ -2653,22 +2685,7 @@ static void qemu_create_cli_devices(void)
 
     /* init generic devices */
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
-    qemu_opts_foreach(qemu_find_opts("device"),
-                      device_init_func, NULL, &error_fatal);
-    QTAILQ_FOREACH(opt, &device_opts, next) {
-        DeviceState *dev;
-        loc_push_restore(&opt->loc);
-        /*
-         * TODO Eventually we should call qmp_device_add() here to make sure it
-         * behaves the same, but QMP still has to accept incorrectly typed
-         * options until libvirt is fixed and we want to be strict on the CLI
-         * from the start, so call qdev_device_add_from_qdict() directly for
-         * now.
-         */
-        dev = qdev_device_add_from_qdict(opt->opts, NULL, true, &error_fatal);
-        object_unref(OBJECT(dev));
-        loc_pop(&opt->loc);
-    }
+    qemu_add_devices(NULL);
     rom_reset_order_override();
 }
 
-- 
2.34.1




[RFC v2 08/12] hw/i386: Use get_max_topo_by_level() to get topology information

2024-09-18 Thread Zhao Liu
To honor the custom topology case and generate correct APIC IDs for
hybrid CPU topology, use get_max_topo_by_level() to get topology
information instead of accessing MachineState.smp directly.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86-common.c | 19 +--
 hw/i386/x86.c        | 20 +---
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index 75d4b2f3d43a..58591e015569 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -202,11 +202,15 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 
 static void x86_fixup_topo_ids(MachineState *ms, X86CPU *cpu)
 {
+    int max_modules, max_dies;
+
+    max_modules = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_MODULE);
+    max_dies = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_DIE);
     /*
      * die-id was optional in QEMU 4.0 and older, so keep it optional
      * if there's only one die per socket.
      */
-    if (cpu->module_id < 0 && ms->smp.modules == 1) {
+    if (cpu->module_id < 0 && max_modules == 1) {
         cpu->module_id = 0;
     }
 
@@ -214,7 +218,7 @@ static void x86_fixup_topo_ids(MachineState *ms, X86CPU *cpu)
      * module-id was optional in QEMU 9.0 and older, so keep it optional
      * if there's only one module per die.
      */
-    if (cpu->die_id < 0 && ms->smp.dies == 1) {
+    if (cpu->die_id < 0 && max_dies == 1) {
         cpu->die_id = 0;
     }
 }
@@ -393,6 +397,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
     MachineState *ms = MACHINE(hotplug_dev);
     X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
     X86CPUTopoInfo topo_info;
+    int max_modules, max_dies;
 
     if (!object_dynamic_cast(OBJECT(cpu), ms->cpu_type)) {
         error_setg(errp, "Invalid CPU type, expected cpu type: '%s'",
@@ -413,13 +418,15 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
     init_topo_info(&topo_info, x86ms);
 
-    if (ms->smp.modules > 1) {
-        env->nr_modules = ms->smp.modules;
+    max_modules = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_MODULE);
+    if (max_modules > 1) {
+        env->nr_modules = max_modules;
         set_bit(CPU_TOPOLOGY_LEVEL_MODULE, env->avail_cpu_topo);
     }
 
-    if (ms->smp.dies > 1) {
-        env->nr_dies = ms->smp.dies;
+    max_dies = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_DIE);
+    if (max_dies > 1) {
+        env->nr_dies = max_dies;
         set_bit(CPU_TOPOLOGY_LEVEL_DIE, env->avail_cpu_topo);
     }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index cdf7b81ad0e3..55904b545d84 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -44,16 +44,20 @@ void init_topo_info(X86CPUTopoInfo *topo_info,
 {
     MachineState *ms = MACHINE(x86ms);
 
-    topo_info->dies_per_pkg = ms->smp.dies;
+    topo_info->dies_per_pkg =
+        get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_DIE);
     /*
      * Though smp.modules means the number of modules in one cluster,
      * i386 doesn't support cluster level so that the smp.clusters
      * always defaults to 1, therefore using smp.modules directly is
      * fine here.
      */
-    topo_info->modules_per_die = ms->smp.modules;
-    topo_info->cores_per_module = ms->smp.cores;
-    topo_info->threads_per_core = ms->smp.threads;
+    topo_info->modules_per_die =
+        get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_MODULE);
+    topo_info->cores_per_module =
+        get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_CORE);
+    topo_info->threads_per_core =
+        get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_THREAD);
 }
 
 /*
@@ -103,7 +107,7 @@ static const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
     X86MachineState *x86ms = X86_MACHINE(ms);
     unsigned int max_cpus = ms->smp.max_cpus;
     X86CPUTopoInfo topo_info;
-    int i;
+    int i, max_dies, max_modules;
 
     if (ms->possible_cpus) {
         /*
@@ -120,6 +124,8 @@ static const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
 
     init_topo_info(&topo_info, x86ms);
 
+    max_dies = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_DIE);
+    max_modules = get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_MODULE);
     for (i = 0; i < ms->possible_cpus->len; i++) {
         X86CPUTopoIDs topo_ids;
 
@@ -131,11 +137,11 @@ static const CPUArchIdList *x86_possible_cpu_arch_ids(MachineState *ms)
                                  &topo_info, &topo_ids);
         ms->possible_cpus->cpus[i].props.has_socket_id = true;
         ms->possible_cpus->cpus[i].props.socket_id = topo_ids.pkg_id;
-        if (ms->smp.dies > 1) {
+        if (max_dies > 1) {
             ms->possible_cpus->cpus[i].props.has_die_id = true;
             ms->possible_cpus->cpus[i].props.die_id = topo_ids.die_id;
         }
-        if (ms->smp.modules > 1) {
+        if (max_modules > 1) {
             ms->possible_cpus->cpus[i].props.has_module_id = true;
             ms->possible_cpus->cpus[i].props.module_id = topo_ids.module_id;
         }
-- 
2.34.1




[RFC v2 09/12] i386: Introduce x86 CPU core abstractions

2024-09-18 Thread Zhao Liu
Abstract 3 core types for i386: common core, Intel Core (P-core) and
Intel Atom (E-core). This is in preparation for creating the hybrid
topology from the CLI.

Signed-off-by: Zhao Liu 
---
 target/i386/core.c      | 56 +
 target/i386/core.h      | 53 ++
 target/i386/meson.build |  1 +
 3 files changed, 110 insertions(+)
 create mode 100644 target/i386/core.c
 create mode 100644 target/i386/core.h

diff --git a/target/i386/core.c b/target/i386/core.c
new file mode 100644
index ..d76186a6a070
--- /dev/null
+++ b/target/i386/core.c
@@ -0,0 +1,56 @@
+/*
+ * x86 CPU core
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "core.h"
+
+static void x86_common_core_class_init(ObjectClass *oc, void *data)
+{
+    X86CPUCoreClass *cc = X86_CPU_CORE_CLASS(oc);
+
+    cc->core_type = COMMON_CORE;
+}
+
+static void x86_intel_atom_class_init(ObjectClass *oc, void *data)
+{
+    X86CPUCoreClass *cc = X86_CPU_CORE_CLASS(oc);
+
+    cc->core_type = INTEL_ATOM;
+}
+
+static void x86_intel_core_class_init(ObjectClass *oc, void *data)
+{
+    X86CPUCoreClass *cc = X86_CPU_CORE_CLASS(oc);
+
+    cc->core_type = INTEL_CORE;
+}
+
+static const TypeInfo x86_cpu_core_infos[] = {
+    {
+        .name = TYPE_X86_CPU_CORE,
+        .parent = TYPE_CPU_CORE,
+        .class_size = sizeof(X86CPUCoreClass),
+        .class_init = x86_common_core_class_init,
+        .instance_size = sizeof(X86CPUCore),
+    },
+    {
+        .parent = TYPE_X86_CPU_CORE,
+        .name = X86_CPU_CORE_TYPE_NAME("intel-atom"),
+        .class_init = x86_intel_atom_class_init,
+    },
+    {
+        .parent = TYPE_X86_CPU_CORE,
+        .name = X86_CPU_CORE_TYPE_NAME("intel-core"),
+        .class_init = x86_intel_core_class_init,
+    },
+};
+
+DEFINE_TYPES(x86_cpu_core_infos)
diff --git a/target/i386/core.h b/target/i386/core.h
new file mode 100644
index ..b942153b2c0d
--- /dev/null
+++ b/target/i386/core.h
@@ -0,0 +1,53 @@
+/*
+ * x86 CPU core header
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "hw/cpu/core.h"
+#include "hw/cpu/cpu-topology.h"
+#include "qom/object.h"
+
+#ifndef I386_CORE_H
+#define I386_CORE_H
+
+#ifdef TARGET_X86_64
+#define TYPE_X86_PREFIX "x86-"
+#else
+#define TYPE_X86_PREFIX "i386-"
+#endif
+
+#define TYPE_X86_CPU_CORE TYPE_X86_PREFIX "core"
+
+OBJECT_DECLARE_TYPE(X86CPUCore, X86CPUCoreClass, X86_CPU_CORE)
+
+typedef enum {
+    COMMON_CORE = 0,
+    INTEL_ATOM,
+    INTEL_CORE,
+} X86CoreType;
+
+struct X86CPUCoreClass {
+    /*< private >*/
+    CPUTopoClass parent_class;
+
+    /*< public >*/
+    DeviceRealize parent_realize;
+    X86CoreType core_type;
+};
+
+struct X86CPUCore {
+    /*< private >*/
+    CPUCore parent_obj;
+
+    /*< public >*/
+};
+
+#define X86_CPU_CORE_TYPE_NAME(core_type_str) (TYPE_X86_PREFIX core_type_str)
+
+#endif /* I386_CORE_H */
diff --git a/target/i386/meson.build b/target/i386/meson.build
index 075117989b9d..80a32526d98b 100644
--- a/target/i386/meson.build
+++ b/target/i386/meson.build
@@ -18,6 +18,7 @@ i386_system_ss.add(files(
   'arch_memory_mapping.c',
   'machine.c',
   'monitor.c',
+  'core.c',
   'cpu-apic.c',
   'cpu-sysemu.c',
 ))
-- 
2.34.1




[RFC v2 12/12] i386: Support custom topology for microvm, pc-i440fx and pc-q35

2024-09-18 Thread Zhao Liu
With custom topology enabled, the user can configure a hybrid CPU
topology from the CLI.

For example, to create one Intel Core (P-core) with 2 threads and two
Intel Atom (E-core) cores with a single thread each for the PC machine:

-smp maxsockets=1,maxdies=1,maxmodules=2,maxcores=2,maxthreads=2 \
-machine pc,custom-topo=on \
-device cpu-socket,id=sock0 \
-device cpu-die,id=die0,bus=sock0 \
-device cpu-module,id=mod0,bus=die0 \
-device cpu-module,id=mod1,bus=die0 \
-device x86-intel-core,id=core0,bus=mod0 \
-device x86-intel-atom,id=core1,bus=mod1 \
-device x86-intel-atom,id=core2,bus=mod1 \
-device host-x86_64-cpu,id=cpu0,socket-id=0,die-id=0,module-id=0,core-id=0,thread-id=0 \
-device host-x86_64-cpu,id=cpu1,socket-id=0,die-id=0,module-id=0,core-id=0,thread-id=1 \
-device host-x86_64-cpu,id=cpu2,socket-id=0,die-id=0,module-id=1,core-id=0,thread-id=0 \
-device host-x86_64-cpu,id=cpu3,socket-id=0,die-id=0,module-id=1,core-id=1,thread-id=0

Signed-off-by: Zhao Liu 
---
 hw/i386/microvm.c    | 1 +
 hw/i386/pc_piix.c    | 1 +
 hw/i386/pc_q35.c     | 1 +
 hw/i386/x86-common.c | 6 ++
 4 files changed, 9 insertions(+)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index dc9b21a34230..bd03b6946e6c 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -671,6 +671,7 @@ static void microvm_class_init(ObjectClass *oc, void *data)
     mc->reset = microvm_machine_reset;
 
     mc->post_init = microvm_machine_state_post_init;
+    mc->smp_props.custom_topo_supported = true;
 
     /* hotplug (for cpu coldplug) */
     mc->get_hotplug_handler = microvm_get_hotplug_handler;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index c1db2f3129cf..9c696a226858 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -473,6 +473,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
     m->no_floppy = !module_object_class_by_name(TYPE_ISA_FDC);
     m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
     m->post_init = pc_post_init1;
+    m->smp_props.custom_topo_supported = true;
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_VMBUS_BRIDGE);
 
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 9ce3e65d7182..9241366ff351 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -356,6 +356,7 @@ static void pc_q35_machine_options(MachineClass *m)
     m->max_cpus = 4096;
     m->no_parallel = !module_object_class_by_name(TYPE_ISA_PARALLEL);
     m->post_init = pc_q35_post_init;
+    m->smp_props.custom_topo_supported = true;
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE);
     machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index 58591e015569..2995eed5d670 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -195,6 +195,12 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
     }
 
     possible_cpus = mc->possible_cpu_arch_ids(ms);
+
+    /* Leave user to add CPUs. */
+    if (ms->topo->custom_topo_enabled) {
+        return;
+    }
+
     for (i = 0; i < ms->smp.cpus; i++) {
         x86_cpu_new(x86ms, i, possible_cpus->cpus[i].arch_id, &error_fatal);
     }
-- 
2.34.1




[RFC v2 10/12] i386/cpu: Support Intel hybrid CPUID

2024-09-18 Thread Zhao Liu
For hybrid CPU topology, Intel exposes the following CPUID information [1]:
1. CPUID.07H.0H:EDX.Hybrid[bit 15]. When it is set to 1, the processor
   is identified as a hybrid part.
2. The CPUID.1AH leaf, which reports the core type and native model ID
   in CPUID.1AH:EAX. Because the native model ID is currently of no use
   to software, there is no need to emulate it.

For the hybrid-related CPUIDs, especially CPUID.07H.0H:EDX.Hybrid[bit 15],
there is no need to expose the feature in feature_word_info[] for the
user to set directly, because hybrid features depend on the specific
core type, and this information needs to be gathered together with the
hybrid CPU topology.

[1]: SDM, vol.2, Ch.3, 3.2 Instructions (A-L), CPUID-CPU Identification

Co-Developed-by: Zhuocheng Ding 
Signed-off-by: Zhuocheng Ding 
Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 58 +++
 target/i386/cpu.h |  5 
 2 files changed, 63 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index fb54c2c100a0..2f0e7f3d5ad7 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -22,6 +22,7 @@
 #include "qemu/cutils.h"
 #include "qemu/qemu-print.h"
 #include "qemu/hw-version.h"
+#include "core.h"
 #include "cpu.h"
 #include "tcg/helper-tcg.h"
 #include "sysemu/hvf.h"
@@ -743,6 +744,10 @@ static CPUCacheInfo legacy_l3_cache = {
 #define INTEL_AMX_TMUL_MAX_K   0x10
 #define INTEL_AMX_TMUL_MAX_N   0x40
 
+/* CPUID Leaf 0x1A constants: */
+#define INTEL_HYBRID_TYPE_ATOM 0x20
+#define INTEL_HYBRID_TYPE_CORE 0x40
+
 void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
                               uint32_t vendor2, uint32_t vendor3)
 {
@@ -6580,6 +6585,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 *ecx |= CPUID_7_0_ECX_OSPKE;
             }
             *edx = env->features[FEAT_7_0_EDX]; /* Feature flags */
+
+            if (env->parent_core_type != COMMON_CORE &&
+                (IS_INTEL_CPU(env) || !cpu->vendor_cpuid_only)) {
+                *edx |= CPUID_7_0_EDX_HYBRID;
+            }
         } else if (count == 1) {
             *eax = env->features[FEAT_7_1_EAX];
             *edx = env->features[FEAT_7_1_EDX];
@@ -6800,6 +6810,31 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 }
 break;
 }
+case 0x1A:
+/* Hybrid Information Enumeration */
+*eax = 0;
+*ebx = 0;
+*ecx = 0;
+*edx = 0;
+if (env->parent_core_type != COMMON_CORE &&
+(IS_INTEL_CPU(env) || !cpu->vendor_cpuid_only)) {
+/*
+ * CPUID.1AH:EAX.[bits 23-0] indicates "native model ID of the
+ * core". Since this field currently is useless for software,
+ * no need to emulate.
+ */
+switch (env->parent_core_type) {
+case INTEL_ATOM:
+*eax = INTEL_HYBRID_TYPE_ATOM << 24;
+break;
+case INTEL_CORE:
+*eax = INTEL_HYBRID_TYPE_CORE << 24;
+break;
+default:
+g_assert_not_reached();
+}
+}
+break;
 case 0x1D: {
 /* AMX TILE, for now hardcoded for Sapphire Rapids*/
 *eax = 0;
@@ -7459,6 +7494,14 @@ void x86_cpu_expand_features(X86CPU *cpu, Error **errp)
 }
 }
 
+/*
+ * Intel CPU topology with hybrid cores support requires CPUID.1AH.
+ */
+if (env->parent_core_type != COMMON_CORE &&
+(IS_INTEL_CPU(env) || !cpu->vendor_cpuid_only)) {
+x86_cpu_adjust_level(cpu, &env->cpuid_min_level, 0x1A);
+}
+
 /*
  * Intel CPU topology with multi-dies support requires CPUID[0x1F].
  * For AMD Rome/Milan, cpuid level is 0x10, and guest OS should detect
@@ -7650,6 +7693,20 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 return;
 }
 
+/*
+ * TODO: Introduce parent_pre_realize to make sure topology device
+ * can realize first.
+ */
+if (dev->parent_bus && dev->parent_bus->parent) {
+DeviceState *parent = dev->parent_bus->parent;
+X86CPUCore *core =
+(X86CPUCore *)object_dynamic_cast(OBJECT(parent),
+  TYPE_X86_CPU_CORE);
+if (core) {
+env->parent_core_type = X86_CPU_CORE_GET_CLASS(core)->core_type;
+}
+}
+
 /*
  * Process Hyper-V enlightenments.
  * Note: this currently has to happen before the expansion of CPU features.
@@ -8048,6 +8105,7 @@ static void x86_cpu_initfn(Object *obj)
 CPUX86State *env = &cpu->env;
 
 x86_cpu_init_default_topo(cpu);
+env->parent_core_type = COMMON_CORE

[RFC v2 04/12] hw/core/machine: Split machine initialization around qemu_add_cli_devices_early()

2024-09-18 Thread Zhao Liu
Split machine initialization and machine_run_board_init() into two parts
around qemu_add_cli_devices_early(), allowing initialization to continue
after CPUs have been created from the CLI.

This enables the machine to place the initialization steps that depend
on CPUs in post_init().
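A minimal, self-contained sketch of the resulting boot order, assuming heavily simplified stand-ins for MachineState/MachineClass (the Machine and MachineOps types and every helper below are hypothetical, not QEMU code): init() runs first, CPUs are then created from the CLI, and only after that does the optional post_init() hook run and the machine count as initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for MachineState/MachineClass. */
typedef struct Machine Machine;

typedef struct MachineOps {
    void (*init)(Machine *m);
    void (*post_init)(Machine *m);   /* optional, may be NULL */
} MachineOps;

struct Machine {
    const MachineOps *ops;
    int nr_cpus;                 /* CPUs created so far */
    int cpus_seen_by_post_init;  /* what post_init() observed */
    int initialized;             /* stands in for PHASE_MACHINE_INITIALIZED */
};

/* Stand-in for qemu_add_cli_devices_early(): creates CPUs from the CLI. */
static void add_cli_devices_early(Machine *m, int cli_cpus)
{
    m->nr_cpus += cli_cpus;
}

static void run_board_init(Machine *m)
{
    m->ops->init(m);
}

static void run_board_post_init(Machine *m)
{
    if (m->ops->post_init) {
        m->ops->post_init(m);
    }
    /* Only now does the machine enter the "initialized" phase. */
    m->initialized = 1;
}

/* Boot sequence following the order described above. */
static void init_board(Machine *m, int cli_cpus)
{
    run_board_init(m);
    add_cli_devices_early(m, cli_cpus);
    run_board_post_init(m);
}

/* Example board: init() creates no CPUs; post_init() depends on them. */
static void demo_init(Machine *m) { (void)m; }
static void demo_post_init(Machine *m) { m->cpus_seen_by_post_init = m->nr_cpus; }
static const MachineOps demo_ops = { demo_init, demo_post_init };
```

With this ordering, anything placed in post_init() can rely on the CLI-created CPUs already existing.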

Signed-off-by: Zhao Liu 
---
 hw/core/machine.c   | 10 ++
 include/hw/boards.h |  2 ++
 system/vl.c |  4 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 076bd365197b..7b4ac5ac52b2 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1645,6 +1645,16 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
 
 accel_init_interfaces(ACCEL_GET_CLASS(machine->accelerator));
 machine_class->init(machine);
+}
+
+void machine_run_board_post_init(MachineState *machine, Error **errp)
+{
+MachineClass *machine_class = MACHINE_GET_CLASS(machine);
+
+if (machine_class->post_init) {
+machine_class->post_init(machine);
+}
+
 phase_advance(PHASE_MACHINE_INITIALIZED);
 }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a49677466ef6..9f706223e848 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -33,6 +33,7 @@ const char *machine_class_default_cpu_type(MachineClass *mc);
 
 void machine_add_audiodev_property(MachineClass *mc);
 void machine_run_board_init(MachineState *machine, const char *mem_path, Error **errp);
+void machine_run_board_post_init(MachineState *machine, Error **errp);
 bool machine_usb(MachineState *machine);
 int machine_phandle_start(MachineState *machine);
 bool machine_dump_guest_core(MachineState *machine);
@@ -271,6 +272,7 @@ struct MachineClass {
 const char *deprecation_reason;
 
 void (*init)(MachineState *state);
+void (*post_init)(MachineState *state);
 void (*reset)(MachineState *state, ShutdownCause reason);
 void (*wakeup)(MachineState *state);
 int (*kvm_type)(MachineState *machine, const char *arg);
diff --git a/system/vl.c b/system/vl.c
index 8540454aa1c2..00370f7a52aa 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2659,12 +2659,14 @@ static void qemu_init_board(void)
 /* process plugin before CPUs are created, but once -smp has been parsed */
 qemu_plugin_load_list(&plugin_list, &error_fatal);
 
-/* From here on we enter MACHINE_PHASE_INITIALIZED.  */
 machine_run_board_init(current_machine, mem_path, &error_fatal);
 
 /* Create CPU topology device if any. */
 qemu_add_cli_devices_early();
 
+/* From here on we enter MACHINE_PHASE_INITIALIZED.  */
+machine_run_board_post_init(current_machine, &error_fatal);
+
 drive_check_orphaned();
 
 realtime_init();
-- 
2.34.1




[RFC v2 11/12] i386/machine: Split machine initialization after CPU creation into post_init()

2024-09-18 Thread Zhao Liu
Custom topology will allow the machine to skip the default CPU creation
and accept CPUs created by the user from the CLI.

Therefore, for microvm, pc-i440fx and pc-q35, split machine
initialization at x86_cpus_init() and move the remaining part into
post_init(), which runs after CPUs have been created from the CLI.

This satisfies the CPU dependency of the initialization steps that
follow x86_cpus_init().
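One consequence of the split, visible in the pc_piix.c hunk below, is that values pc_init1() previously kept in local variables (the RAM region, pci_type) must now live in the machine state so the post-init half can still reach them. A hypothetical, simplified sketch of that pattern (PCMachine and both functions are stand-ins, not QEMU code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAM MemoryRegion. */
typedef struct MemRegion {
    const char *name;
} MemRegion;

/* Hypothetical stand-in for PCMachineState. */
typedef struct PCMachine {
    MemRegion *ram;                 /* plays the role of machine->ram */
    /* Former locals of the init function, now carried across the split;
     * the names mirror pre_config_ram/pci_type in the patch. */
    MemRegion *pre_config_ram;
    const char *pci_type;
    /* Results wired up by the post-init half (host bridge links). */
    const char *host_bridge_ram;
    const char *host_bridge_pci_type;
} PCMachine;

/* First half: runs before CPUs are created from the CLI. */
static void pc_init_half(PCMachine *pcms, const char *pci_type)
{
    pcms->pci_type = pci_type;
    pcms->pre_config_ram = pcms->ram;
    /* ... RAM split and x86_cpus_init() would happen here ... */
}

/* Second half: runs after CPUs exist; reads state saved by the first. */
static void pc_post_init_half(PCMachine *pcms)
{
    pcms->host_bridge_ram = pcms->pre_config_ram->name;
    pcms->host_bridge_pci_type = pcms->pci_type;
}
```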

Signed-off-by: Zhao Liu 
---
 hw/i386/microvm.c|  7 +++
 hw/i386/pc_piix.c| 40 +---
 hw/i386/pc_q35.c | 36 ++--
 include/hw/i386/pc.h |  3 +++
 4 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 49a897db50fc..dc9b21a34230 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -463,6 +463,11 @@ static void microvm_machine_state_init(MachineState *machine)
 microvm_memory_init(mms);
 
 x86_cpus_init(x86ms, CPU_VERSION_LATEST);
+}
+
+static void microvm_machine_state_post_init(MachineState *machine)
+{
+MicrovmMachineState *mms = MICROVM_MACHINE(machine);
 
 microvm_devices_init(mms);
 }
@@ -665,6 +670,8 @@ static void microvm_class_init(ObjectClass *oc, void *data)
 /* Machine class handlers */
 mc->reset = microvm_machine_reset;
 
+mc->post_init = microvm_machine_state_post_init;
+
 /* hotplug (for cpu coldplug) */
 mc->get_hotplug_handler = microvm_get_hotplug_handler;
 hc->pre_plug = microvm_device_pre_plug_cb;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2bf6865d405e..c1db2f3129cf 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -105,19 +105,9 @@ static void pc_init1(MachineState *machine, const char *pci_type)
 PCMachineState *pcms = PC_MACHINE(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(machine);
-MemoryRegion *system_memory = get_system_memory();
-MemoryRegion *system_io = get_system_io();
-Object *phb = NULL;
-ISABus *isa_bus;
-Object *piix4_pm = NULL;
-qemu_irq smi_irq;
-GSIState *gsi_state;
-MemoryRegion *ram_memory;
-MemoryRegion *pci_memory = NULL;
-MemoryRegion *rom_memory = system_memory;
 ram_addr_t lowmem;
-uint64_t hole64_size = 0;
 
+pcms->pci_type = pci_type;
 /*
  * Calculate ram split, for memory below and above 4G.  It's a bit
  * complicated for backward compatibility reasons ...
@@ -150,9 +140,9 @@ static void pc_init1(MachineState *machine, const char *pci_type)
  *qemu -M pc,max-ram-below-4g=4G -m 3968M  -> 3968M low (=4G-128M)
  */
 if (xen_enabled()) {
-xen_hvm_init_pc(pcms, &ram_memory);
+xen_hvm_init_pc(pcms, &pcms->pre_config_ram);
 } else {
-ram_memory = machine->ram;
+pcms->pre_config_ram = machine->ram;
 if (!pcms->max_ram_below_4g) {
 pcms->max_ram_below_4g = 0xe000; /* default: 3.5G */
 }
@@ -182,6 +172,23 @@ static void pc_init1(MachineState *machine, const char *pci_type)
 
 pc_machine_init_sgx_epc(pcms);
 x86_cpus_init(x86ms, pcmc->default_cpu_version);
+}
+
+static void pc_post_init1(MachineState *machine)
+{
+PCMachineState *pcms = PC_MACHINE(machine);
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+X86MachineState *x86ms = X86_MACHINE(machine);
+MemoryRegion *system_memory = get_system_memory();
+MemoryRegion *system_io = get_system_io();
+Object *phb = NULL;
+ISABus *isa_bus;
+Object *piix4_pm = NULL;
+qemu_irq smi_irq;
+GSIState *gsi_state;
+MemoryRegion *pci_memory = NULL;
+MemoryRegion *rom_memory = system_memory;
+uint64_t hole64_size = 0;
 
 if (kvm_enabled()) {
 kvmclock_create(pcmc->kvmclock_create_always);
@@ -195,7 +202,7 @@ static void pc_init1(MachineState *machine, const char *pci_type)
 phb = OBJECT(qdev_new(TYPE_I440FX_PCI_HOST_BRIDGE));
 object_property_add_child(OBJECT(machine), "i440fx", phb);
 object_property_set_link(phb, PCI_HOST_PROP_RAM_MEM,
- OBJECT(ram_memory), &error_fatal);
+ OBJECT(pcms->pre_config_ram), &error_fatal);
 object_property_set_link(phb, PCI_HOST_PROP_PCI_MEM,
  OBJECT(pci_memory), &error_fatal);
 object_property_set_link(phb, PCI_HOST_PROP_SYSTEM_MEM,
@@ -206,7 +213,7 @@ static void pc_init1(MachineState *machine, const char *pci_type)
  x86ms->below_4g_mem_size, &error_fatal);
 object_property_set_uint(phb, PCI_HOST_ABOVE_4G_MEM_SIZE,
  x86ms->above_4g_mem_size, &error_fatal);
-object_property_set_str(phb, I440FX_HOST_PROP_PCI_TYPE, pci_type,
+object_property_set_str(phb, I440FX_HOST_PROP_PCI_TYPE, pcms->pci_

[RFC v2 07/12] hw/core: Re-implement topology helpers to honor max limitations

2024-09-18 Thread Zhao Liu
In the custom topology case, valid and reliable topology information
can be obtained from the topology max limitations.

Therefore, re-implement machine_topo_get_cores_per_socket() and
machine_topo_get_threads_per_socket() to handle the custom topology
case. Further, use the wrapped helpers to set CPUState.nr_threads/
nr_cores, avoiding topology mismatches in custom topology scenarios.

Additionally, since test-smp-parse needs more stubs to compile with
cpu-slot.c, keep the old helpers for test-smp-parse's use for now. These
legacy helpers will be cleaned up once full compilation support is
added.
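The computation performed by the re-implemented helpers can be illustrated with a hypothetical, stand-alone sketch. It assumes the CpuTopologyLevel ordering (thread, core, module, cluster, die, socket) and that each max-limit entry holds the maximum number of instances of that level within its parent; the TopoLevel/TopoLimits types are illustrative only.

```c
#include <assert.h>

/* Hypothetical mirror of the CpuTopologyLevel ordering up to socket. */
typedef enum {
    LEVEL_THREAD,
    LEVEL_CORE,
    LEVEL_MODULE,
    LEVEL_CLUSTER,
    LEVEL_DIE,
    LEVEL_SOCKET,
    LEVEL_MAX,
} TopoLevel;

/* Hypothetical: max_limit[l] = max instances of level l per parent. */
typedef struct {
    unsigned int max_limit[LEVEL_MAX];
} TopoLimits;

/* Multiply the limits of every level from core up to (not including)
 * socket: cores * modules * clusters * dies. */
static unsigned int cores_per_socket(const TopoLimits *t)
{
    unsigned int cores = 1;

    for (int i = LEVEL_CORE; i < LEVEL_SOCKET; i++) {
        cores *= t->max_limit[i];
    }
    return cores;
}

/* Threads per socket = max threads per core * cores per socket. */
static unsigned int threads_per_socket(const TopoLimits *t)
{
    return t->max_limit[LEVEL_THREAD] * cores_per_socket(t);
}
```

For example, limits of 2 threads, 8 cores, 2 modules, 1 cluster and 1 die give 16 cores and 32 threads per socket, the same result the old -smp based formula would give for the equivalent homogeneous configuration.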

Signed-off-by: Zhao Liu 
---
 hw/core/machine-smp.c   |  8 +---
 hw/cpu/cpu-slot.c   | 18 ++
 include/hw/boards.h |  9 +++--
 include/hw/cpu/cpu-slot.h   |  2 ++
 system/cpus.c   |  2 +-
 tests/unit/test-smp-parse.c |  4 ++--
 6 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index d3be4352267d..2965b042fd92 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -376,14 +376,16 @@ bool machine_parse_smp_cache(MachineState *ms,
 return true;
 }
 
-unsigned int machine_topo_get_cores_per_socket(const MachineState *ms)
+unsigned int machine_topo_get_cores_per_socket_old(const MachineState *ms)
 {
+assert(!ms->topo);
 return ms->smp.cores * ms->smp.modules * ms->smp.clusters * ms->smp.dies;
 }
 
-unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
+unsigned int machine_topo_get_threads_per_socket_old(const MachineState *ms)
 {
-return ms->smp.threads * machine_topo_get_cores_per_socket(ms);
+assert(!ms->topo);
+return ms->smp.threads * machine_topo_get_cores_per_socket_old(ms);
 }
 
 CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
diff --git a/hw/cpu/cpu-slot.c b/hw/cpu/cpu-slot.c
index f2b9c412926f..8c0d55e835e2 100644
--- a/hw/cpu/cpu-slot.c
+++ b/hw/cpu/cpu-slot.c
@@ -204,6 +204,8 @@ static int get_smp_info_by_level(const CpuTopology *smp_info,
 return smp_info->cores;
 case CPU_TOPOLOGY_LEVEL_MODULE:
 return smp_info->modules;
+case CPU_TOPOLOGY_LEVEL_CLUSTER:
+return smp_info->clusters;
 case CPU_TOPOLOGY_LEVEL_DIE:
 return smp_info->dies;
 case CPU_TOPOLOGY_LEVEL_SOCKET:
@@ -356,6 +358,22 @@ int get_max_topo_by_level(const MachineState *ms, CpuTopologyLevel level)
 return ms->topo->stat.entries[level].max_limit;
 }
 
+unsigned int machine_topo_get_cores_per_socket(const MachineState *ms)
+{
+int cores = 1, i;
+
+for (i = CPU_TOPOLOGY_LEVEL_CORE; i < CPU_TOPOLOGY_LEVEL_SOCKET; i++) {
+cores *= get_max_topo_by_level(ms, i);
+}
+return cores;
+}
+
+unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
+{
+return get_max_topo_by_level(ms, CPU_TOPOLOGY_LEVEL_THREAD) *
+   machine_topo_get_cores_per_socket(ms);
+}
+
 bool machine_parse_custom_topo_config(MachineState *ms,
   const SMPConfiguration *config,
   Error **errp)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 6ef4ea322590..faf7859debdd 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -48,8 +48,13 @@ void machine_parse_smp_config(MachineState *ms,
 bool machine_parse_smp_cache(MachineState *ms,
  const SmpCachePropertiesList *caches,
  Error **errp);
-unsigned int machine_topo_get_cores_per_socket(const MachineState *ms);
-unsigned int machine_topo_get_threads_per_socket(const MachineState *ms);
+/*
+ * TODO: Drop these old helpers when cpu-slot.c could be compiled for
+ * test-smp-parse. Pls use machine_topo_get_cores_per_socket() and
+ * machine_topo_get_threads_per_socket() instead.
+ */
+unsigned int machine_topo_get_cores_per_socket_old(const MachineState *ms);
+unsigned int machine_topo_get_threads_per_socket_old(const MachineState *ms);
 CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
   CacheLevelAndType cache);
 void machine_memory_devices_init(MachineState *ms, hwaddr base, uint64_t size);
diff --git a/include/hw/cpu/cpu-slot.h b/include/hw/cpu/cpu-slot.h
index f56a0b08dca4..230309b67fe1 100644
--- a/include/hw/cpu/cpu-slot.h
+++ b/include/hw/cpu/cpu-slot.h
@@ -81,6 +81,8 @@ struct CPUSlot {
 void machine_plug_cpu_slot(MachineState *ms);
 bool machine_create_topo_tree(MachineState *ms, Error **errp);
 int get_max_topo_by_level(const MachineState *ms, CpuTopologyLevel level);
+unsigned int machine_topo_get_cores_per_socket(const MachineState *ms);
+unsigned int machine_topo_get_threads_per_socket(const MachineState *ms);
 bool machine_parse_custom_topo_config(MachineState *ms,
   const SMPConfiguration *config,
  

[RFC v2 02/12] qdev: Introduce new device category to cover basic topology device

2024-09-18 Thread Zhao Liu
Topology devices are used to define CPUs and need to be created and
realized earlier than the current qemu_create_cli_devices().

Use this new category to identify such special devices, which allows
them to be created earlier in a subsequent change.
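Device categories are kept as a per-class bitmap, so tagging topology devices with the new category amounts to one set_bit() call in class_init. A stand-alone sketch of the idea, using a trimmed-down, hypothetical copy of the enum and simple bit helpers in place of QEMU's bitops:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down copy of the DeviceCategory enum. */
typedef enum {
    CAT_MISC,
    CAT_CPU,
    CAT_CPU_DEF,   /* new: devices that define the CPU topology */
    CAT_MAX,
} Category;

/* Minimal bitmap helpers in the spirit of set_bit()/test_bit(). */
static void cat_set(unsigned long *map, Category c)
{
    *map |= 1UL << c;
}

static bool cat_test(unsigned long map, Category c)
{
    return (map >> c) & 1UL;
}

/* Category names as printed by "-device help", indexed by category. */
static const char *const cat_names[CAT_MAX + 1] = {
    [CAT_MISC]    = "Misc",
    [CAT_CPU]     = "CPU",
    [CAT_CPU_DEF] = "CPU Definition",
    [CAT_MAX]     = "Uncategorized",
};
```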

Signed-off-by: Zhao Liu 
---
 hw/cpu/cpu-topology.c  | 2 +-
 include/hw/qdev-core.h | 1 +
 system/qdev-monitor.c  | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/cpu/cpu-topology.c b/hw/cpu/cpu-topology.c
index 3e8982ff7e6c..ce3da844a7d8 100644
--- a/hw/cpu/cpu-topology.c
+++ b/hw/cpu/cpu-topology.c
@@ -164,7 +164,7 @@ static void cpu_topo_class_init(ObjectClass *oc, void *data)
 DeviceClass *dc = DEVICE_CLASS(oc);
 CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
 
-set_bit(DEVICE_CATEGORY_CPU, dc->categories);
+set_bit(DEVICE_CATEGORY_CPU_DEF, dc->categories);
 dc->realize = cpu_topo_realize;
 
 /*
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 77223b28c788..ddcaa329e3ec 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -86,6 +86,7 @@ typedef enum DeviceCategory {
 DEVICE_CATEGORY_SOUND,
 DEVICE_CATEGORY_MISC,
 DEVICE_CATEGORY_CPU,
+DEVICE_CATEGORY_CPU_DEF,
 DEVICE_CATEGORY_WATCHDOG,
 DEVICE_CATEGORY_MAX
 } DeviceCategory;
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index fe120353fedc..07863d4e650a 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -179,6 +179,7 @@ static void qdev_print_devinfos(bool show_no_user)
 [DEVICE_CATEGORY_SOUND]   = "Sound",
 [DEVICE_CATEGORY_MISC]= "Misc",
 [DEVICE_CATEGORY_CPU] = "CPU",
+[DEVICE_CATEGORY_CPU_DEF] = "CPU Definition",
 [DEVICE_CATEGORY_WATCHDOG]= "Watchdog",
 [DEVICE_CATEGORY_MAX] = "Uncategorized",
 };
-- 
2.34.1




[RFC v2 00/12] Introduce Hybrid CPU Topology via Custom Topology Tree

2024-09-18 Thread Zhao Liu
y()   
││ │
││ └───(*)Create CPU topology devices  
││including CPUs   
││  
│└(*)machine_run_board_post_init()
│  │
│  └───(*)machine_class->post_init(machine)   
│   │   
│   └───(*)Other initialization steps
│  with CPU dependencies
│   
└─── qemu_create_cli_devices()  


As shown in the figure above, "(*)" marks the new interfaces/hooks added
in this series:

  * (For machines that support custom topology) split the CPU-dependent
    initialization steps into machine_class->post_init().

    - For example, in the q35 machine, all the logic after x86_cpu_new()
      is placed in machine_class->post_init().

  * Between machine_class->init() and machine_class->post_init(),
    create CPU topology devices (including CPUs) early from the CLI.

This effectively replaces the default CPU creation (as well as topology
tree creation) in the original initialization process with
qemu_add_cli_devices_early().


3. Patch Summary


Patch 01-03: Create topology devices early from the CLI.
Patch 04,11: Separate the part following CPU creation from the machine
             initialization process into MachineClass.post_init().
Patch 05-08: Implement max parameters in -smp and use max limitations
             to initialize possible_cpus[].
Patch 09-10: Add Intel hybrid CPU support.
Patch 12:    Allow users to customize the topology tree for x86 machines.


4. Reference


[1]: [RFC 00/52] Introduce hybrid CPU topology
 
https://lore.kernel.org/qemu-devel/20230213095035.158240-1-zhao1@linux.intel.com/
[2]: [RFC v2 00/15] qom-topo: Abstract CPU Topology Level to Topology Device
 
https://lore.kernel.org/qemu-devel/20240919015533.766754-1-zhao1@intel.com/
[3]: [RFC 00/41] qom-topo: Abstract Everything about CPU Topology
 
https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1@linux.intel.com/


Thanks and Best Regards,
Zhao
---
Zhao Liu (12):
  qdev: Allow qdev_device_add() to add specific category device
  qdev: Introduce new device category to cover basic topology device
  system/vl: Create CPU topology devices from CLI early
  hw/core/machine: Split machine initialization around
qemu_add_cli_devices_early()
  hw/core/machine: Introduce custom CPU topology with max limitations
  hw/cpu: Constrain CPU topology tree with max_limit
  hw/core: Re-implement topology helpers to honor max limitations
  hw/i386: Use get_max_topo_by_level() to get topology information
  i386: Introduce x86 CPU core abstractions
  i386/cpu: Support Intel hybrid CPUID
  i386/machine: Split machine initialization after CPU creation into
post_init()
  i386: Support custom topology for microvm, pc-i440fx and pc-q35

 MAINTAINERS |   1 +
 hw/core/machine-smp.c   |  10 ++-
 hw/core/machine.c   |  47 ++
 hw/core/meson.build |   2 +-
 hw/cpu/cpu-slot.c   | 168 
 hw/cpu/cpu-topology.c   |   2 +-
 hw/i386/microvm.c   |   8 ++
 hw/i386/pc_piix.c   |  41 +
 hw/i386/pc_q35.c|  37 +---
 hw/i386/x86-common.c|  25 --
 hw/i386/x86.c   |  20 +++--
 hw/net/virtio-net.c |   2 +-
 hw/usb/xen-usb.c|   3 +-
 include/hw/boards.h |  13 ++-
 include/hw/cpu/cpu-slot.h   |  12 +++
 include/hw/i386/pc.h|   3 +
 include/hw/qdev-core.h  |   6 ++
 include/monitor/qdev.h  |   4 +-
 qapi/machine.json   |  22 -
 stubs/machine-stubs.c   |  21 +
 stubs/meson.build   |   1 +
 system/cpus.c   |   2 +-
 system/qdev-monitor.c   |  13 ++-
 system/vl.c |  59 -
 target/i386/core.c  |  56 
 target/i386/core.h  |  53 
 target/i386/cpu.c   |  58 +
 target/i386/cpu.h   |   5 ++
 target/i386/meson.build |   1 +
 tests/unit/test-smp-parse.c |   4 +-
 30 files changed, 618 insertions(+), 81 deletions(-)
 create mode 100644 stubs/machine-stubs.c
 create mode 100644 target/i386/core.c
 create mode 100644 target/i386/core.h

-- 
2.34.1




[RFC v2 01/12] qdev: Allow qdev_device_add() to add specific category device

2024-09-18 Thread Zhao Liu
Topology devices need to be created and realized before board
initialization.

Allow qdev_device_add() to specify a category, to help create topology
devices early.
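The filtering rule this patch adds to qdev_device_add_from_qdict() can be modelled as follows. This is a hypothetical sketch (DevClass, device_add() and create_early() are stand-ins): a NULL category means no filtering, while a non-NULL category makes a non-matching device return NULL without setting an error, so an early pass can create only the topology devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device class: a name plus a category bitmap. */
typedef struct {
    const char *name;
    unsigned long categories;
} DevClass;

/* Hypothetical category bit numbers. */
enum { CAT_CPU = 1, CAT_CPU_DEF = 2 };

/*
 * Mirrors the check added in this patch: with a non-NULL category, a
 * device whose class lacks that category is skipped by returning NULL
 * (no error is reported). NULL category means "create anything".
 */
static const DevClass *device_add(const DevClass *dc, const long *category)
{
    if (category && !((dc->categories >> *category) & 1UL)) {
        return NULL;
    }
    return dc;   /* stands in for creating and realizing the device */
}

/* Hypothetical early pass: create only topology-defining devices. */
static size_t create_early(const DevClass *devs, size_t n)
{
    long cat = CAT_CPU_DEF;
    size_t created = 0;

    for (size_t i = 0; i < n; i++) {
        if (device_add(&devs[i], &cat)) {
            created++;
        }
    }
    return created;
}
```

Because a skipped device produces no error, callers such as device_init_func() (which only reports a failure when `*errp` is set) treat it as "not handled in this pass" rather than as a failure.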

Signed-off-by: Zhao Liu 
---
 hw/net/virtio-net.c|  2 +-
 hw/usb/xen-usb.c   |  3 ++-
 include/monitor/qdev.h |  4 ++--
 system/qdev-monitor.c  | 12 
 system/vl.c|  4 ++--
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index fb84d142ee29..0d92e09e9076 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -935,7 +935,7 @@ static void failover_add_primary(VirtIONet *n, Error **errp)
 return;
 }
 
-dev = qdev_device_add_from_qdict(n->primary_opts,
+dev = qdev_device_add_from_qdict(n->primary_opts, NULL,
  n->primary_opts_from_json,
  &err);
 if (err) {
diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 13901625c0c8..e4168b1fec7e 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -766,7 +766,8 @@ static void usbback_portid_add(struct usbback_info *usbif, unsigned port,
 qdict_put_str(qdict, "hostport", portname);
 opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict,
 &error_abort);
-usbif->ports[port - 1].dev = USB_DEVICE(qdev_device_add(opts, &local_err));
+usbif->ports[port - 1].dev = USB_DEVICE(
+ qdev_device_add(opts, NULL, &local_err));
 if (!usbif->ports[port - 1].dev) {
 qobject_unref(qdict);
 xen_pv_printf(&usbif->xendev, 0,
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf657794..f5fd6e6c1ffc 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -8,8 +8,8 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict);
 void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
 
 int qdev_device_help(QemuOpts *opts);
-DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
-DeviceState *qdev_device_add_from_qdict(const QDict *opts,
+DeviceState *qdev_device_add(QemuOpts *opts, long *category, Error **errp);
+DeviceState *qdev_device_add_from_qdict(const QDict *opts, long *category,
 bool from_json, Error **errp);
 
 /**
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 457dfd05115e..fe120353fedc 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -632,7 +632,7 @@ const char *qdev_set_id(DeviceState *dev, char *id, Error **errp)
 return prop->name;
 }
 
-DeviceState *qdev_device_add_from_qdict(const QDict *opts,
+DeviceState *qdev_device_add_from_qdict(const QDict *opts, long *category,
 bool from_json, Error **errp)
 {
 ERRP_GUARD();
@@ -655,6 +655,10 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 return NULL;
 }
 
+if (category && !test_bit(*category, dc->categories)) {
+return NULL;
+}
+
 /* find bus */
 path = qdict_get_try_str(opts, "bus");
 if (path != NULL) {
@@ -767,12 +771,12 @@ err_del_dev:
 }
 
 /* Takes ownership of @opts on success */
-DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
+DeviceState *qdev_device_add(QemuOpts *opts, long *category, Error **errp)
 {
 QDict *qdict = qemu_opts_to_qdict(opts, NULL);
 DeviceState *ret;
 
-ret = qdev_device_add_from_qdict(qdict, false, errp);
+ret = qdev_device_add_from_qdict(qdict, category, false, errp);
 if (ret) {
 qemu_opts_del(opts);
 }
@@ -897,7 +901,7 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
 qemu_opts_del(opts);
 return;
 }
-dev = qdev_device_add(opts, errp);
+dev = qdev_device_add(opts, NULL, errp);
 if (!dev) {
 /*
  * Drain all pending RCU callbacks. This is done because
diff --git a/system/vl.c b/system/vl.c
index 193e7049ccbe..c40364e2f091 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1212,7 +1212,7 @@ static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
 DeviceState *dev;
 
-dev = qdev_device_add(opts, errp);
+dev = qdev_device_add(opts, NULL, errp);
 if (!dev && *errp) {
 error_report_err(*errp);
 return -1;
@@ -2665,7 +2665,7 @@ static void qemu_create_cli_devices(void)
  * from the start, so call qdev_device_add_from_qdict() directly for
  * now.
  */
-dev = qdev_device_add_from_qdict(opt->opts, true, &error_fatal);
+dev = qdev_device_add_from_qdict(opt->opts, NULL, true, &error_fatal);
 object_unref(OBJECT(dev));
 loc_pop(&opt->loc);
 }
-- 
2.34.1




[RFC v2 13/15] system/qdev-monitor: Introduce bus-finder interface for compatibility with bus-less plug behavior

2024-09-18 Thread Zhao Liu
Currently, CPUs and cores are located by their topology IDs when
plugged.

In a topology tree, each topology device has a CPU bus. Once cpu and
core specify a bus_type, the correct bus has to be found for them based
on their topology IDs (if bus=* is not set in -device).

Therefore, we need a way to use the traditional topology IDs to locate
a specific bus in the topology tree: the bus-finder interface.

With bus-finder, qdev-monitor can locate the bus based on device
properties when "bus=*" is not specified.
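To make the idea concrete, here is a hypothetical sketch of what a find_bus() callback might do: walk the topology tree from the root, matching one topology ID per level, and return the node whose bus the device should be plugged into, or NULL if no node matches. The TopoNode/PlugRequest types and the lookup function are illustrative assumptions; the actual find_bus() implementations are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology node; each node owns one child bus. */
typedef struct TopoNode {
    int id;                           /* index within the parent */
    const struct TopoNode *children;  /* nodes on this node's bus */
    size_t nr_children;
} TopoNode;

/* Hypothetical plug request identified only by topology IDs, ordered
 * from the outermost level inward (e.g. socket-id, then die-id). */
typedef struct {
    const int *ids;
    size_t nr_ids;
} PlugRequest;

/*
 * Walk down from the root, matching one ID per level. Returns the node
 * whose bus should receive the device, or NULL if some ID has no match.
 */
static const TopoNode *find_parent_node(const TopoNode *root,
                                        const PlugRequest *req)
{
    const TopoNode *node = root;

    for (size_t depth = 0; depth < req->nr_ids; depth++) {
        const TopoNode *next = NULL;

        for (size_t i = 0; i < node->nr_children; i++) {
            if (node->children[i].id == req->ids[depth]) {
                next = &node->children[i];
                break;
            }
        }
        if (!next) {
            return NULL;
        }
        node = next;
    }
    return node;
}
```

Returning NULL on a miss matches the fallback in bus_finder_select_bus(), which also returns NULL when no find_bus() method is provided.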

Signed-off-by: Zhao Liu 
---
 MAINTAINERS  |  2 ++
 include/monitor/bus-finder.h | 41 
 system/bus-finder.c  | 46 
 system/meson.build   |  1 +
 system/qdev-monitor.c| 41 
 5 files changed, 126 insertions(+), 5 deletions(-)
 create mode 100644 include/monitor/bus-finder.h
 create mode 100644 system/bus-finder.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 03c1a13de074..4608c3c6db8c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3281,12 +3281,14 @@ F: hw/core/qdev*
 F: hw/core/bus.c
 F: hw/core/sysbus.c
 F: include/hw/qdev*
+F: include/monitor/bus-finder.h
 F: include/monitor/qdev.h
 F: include/qom/
 F: qapi/qom.json
 F: qapi/qdev.json
 F: scripts/coccinelle/qom-parent-type.cocci
 F: scripts/qom-cast-macro-clean-cocci-gen.py
+F: system/bus-finder.c
 F: system/qdev-monitor.c
 F: stubs/qdev.c
 F: qom/
diff --git a/include/monitor/bus-finder.h b/include/monitor/bus-finder.h
new file mode 100644
index ..56f1e4791b66
--- /dev/null
+++ b/include/monitor/bus-finder.h
@@ -0,0 +1,41 @@
+/*
+ * Bus finder interface header
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef BUS_FINDER_H
+#define BUS_FINDER_H
+
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+
+#define TYPE_BUS_FINDER "bus-finder"
+
+typedef struct BusFinderClass BusFinderClass;
+DECLARE_CLASS_CHECKERS(BusFinderClass, BUS_FINDER, TYPE_BUS_FINDER)
+#define BUS_FINDER(obj) INTERFACE_CHECK(BusFinder, (obj), TYPE_BUS_FINDER)
+
+typedef struct BusFinder BusFinder;
+
+/**
+ * BusFinderClass:
+ * @find_bus: Method to find bus.
+ */
+struct BusFinderClass {
+/*  */
+InterfaceClass parent_class;
+
+/*  */
+BusState *(*find_bus)(DeviceState *dev);
+};
+
+bool is_bus_finder_type(DeviceClass *dc);
+BusState *bus_finder_select_bus(DeviceState *dev);
+
+#endif /* BUS_FINDER_H */
diff --git a/system/bus-finder.c b/system/bus-finder.c
new file mode 100644
index ..097291a96bf3
--- /dev/null
+++ b/system/bus-finder.c
@@ -0,0 +1,46 @@
+/*
+ * Bus finder interface
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/qdev-core.h"
+#include "monitor/bus-finder.h"
+#include "qom/object.h"
+
+bool is_bus_finder_type(DeviceClass *dc)
+{
+return !!object_class_dynamic_cast(OBJECT_CLASS(dc), TYPE_BUS_FINDER);
+}
+
+BusState *bus_finder_select_bus(DeviceState *dev)
+{
+BusFinder *bf = BUS_FINDER(dev);
+BusFinderClass *bfc = BUS_FINDER_GET_CLASS(bf);
+
+if (bfc->find_bus) {
+return bfc->find_bus(dev);
+}
+
+return NULL;
+}
+
+static const TypeInfo bus_finder_interface_info = {
+.name  = TYPE_BUS_FINDER,
+.parent= TYPE_INTERFACE,
+.class_size = sizeof(BusFinderClass),
+};
+
+static void bus_finder_register_types(void)
+{
+type_register_static(&bus_finder_interface_info);
+}
+
+type_init(bus_finder_register_types)
diff --git a/system/meson.build b/system/meson.build
index a296270cb005..090716b81abd 100644
--- a/system/meson.build
+++ b/system/meson.build
@@ -9,6 +9,7 @@ specific_ss.add(when: 'CONFIG_SYSTEM_ONLY', if_true: [files(
 system_ss.add(files(
   'balloon.c',
   'bootdevice.c',
+  'bus-finder.c',
   'cpus.c',
   'cpu-throttle.c',
   'cpu-timers.c',
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 44994ea0e160..457dfd05115e 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/sysbus.h"
+#include "monitor/bus-finder.h"
 #include "monitor/hmp.h"
 #include "monitor/monitor.h"
 #include "monitor/qdev.h"
@@ -589,6 +590,16 @@ static BusState *qbus_find(const char *path, Error **errp)
 return bus;
 }
 
+static inline bool qdev_post_find_bus(DeviceClass *dc)
+{
+return is_bus_finder_type(dc);
+}
+
+static inline BusState *qdev_find_bus_post_devi

[RFC v2 04/15] hw/cpu: Introduce CPU slot to manage CPU topology

2024-09-18 Thread Zhao Liu
When there's a CPU topology tree, the original MachineState.smp (a
CpuTopology structure) is not enough to dynamically track changes to the
tree or to keep topology information up to date.

To address this, introduce the CPU slot as the root of the CPU topology
tree. It updates and maintains global topology statistics by listening
for changes to topology devices (realize() and unrealize()).
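The bookkeeping done by the realize()/unrealize() listeners can be sketched as below. This is a simplified, hypothetical model of CPUTopoStatEntry and the slot statistics; instances_num stands for whatever cpu_topo_get_instances_num() reports for the device. As in the patch, unrealize only decrements the total and leaves max_instances and the level bitmap unchanged.

```c
#include <assert.h>

/* Hypothetical subset of topology levels. */
enum { LVL_THREAD, LVL_CORE, LVL_SOCKET, LVL_MAX };

/* Per-level statistics, modelled on CPUTopoStatEntry. */
typedef struct {
    int total_instances;   /* devices of this level currently realized */
    int max_instances;     /* largest instances_num seen so far */
} StatEntry;

typedef struct {
    StatEntry entries[LVL_MAX];
    unsigned long curr_levels;   /* bitmap of levels present in the tree */
} SlotStat;

/* Counterpart of cpu_slot_add_topo_info(), run on realize(). */
static void slot_add(SlotStat *s, int level, int instances_num)
{
    StatEntry *e = &s->entries[level];

    e->total_instances++;
    if (instances_num > e->max_instances) {
        e->max_instances = instances_num;
    }
    s->curr_levels |= 1UL << level;
}

/* Counterpart of cpu_slot_del_topo_info(), run on unrealize(). */
static void slot_del(SlotStat *s, int level)
{
    s->entries[level].total_instances--;
}
```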

Signed-off-by: Zhao Liu 
---
 MAINTAINERS   |   2 +
 hw/cpu/cpu-slot.c | 140 ++
 hw/cpu/meson.build|   1 +
 include/hw/cpu/cpu-slot.h |  72 
 4 files changed, 215 insertions(+)
 create mode 100644 hw/cpu/cpu-slot.c
 create mode 100644 include/hw/cpu/cpu-slot.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 230267597b5f..8e5b2cd91dca 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1884,6 +1884,7 @@ F: hw/core/machine-smp.c
 F: hw/core/null-machine.c
 F: hw/core/numa.c
 F: hw/cpu/cluster.c
+F: hw/cpu/cpu-slot.c
 F: hw/cpu/cpu-topology.c
 F: qapi/machine.json
 F: qapi/machine-common.json
@@ -1891,6 +1892,7 @@ F: qapi/machine-target.json
 F: include/hw/boards.h
 F: include/hw/core/cpu.h
 F: include/hw/cpu/cluster.h
+F: include/hw/cpu/cpu-slot.h
 F: include/hw/cpu/cpu-topology.h
 F: include/sysemu/numa.h
 F: tests/functional/test_cpu_queries.py
diff --git a/hw/cpu/cpu-slot.c b/hw/cpu/cpu-slot.c
new file mode 100644
index ..66ef8d9faa97
--- /dev/null
+++ b/hw/cpu/cpu-slot.c
@@ -0,0 +1,140 @@
+/*
+ * CPU slot abstraction - manage CPU topology
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/boards.h"
+#include "hw/cpu/cpu-slot.h"
+#include "hw/cpu/cpu-topology.h"
+#include "hw/qdev-core.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+
+static void cpu_slot_add_topo_info(CPUSlot *slot, CPUTopoState *topo)
+{
+    CpuTopologyLevel level = GET_CPU_TOPO_LEVEL(topo);
+    CPUTopoStatEntry *entry;
+    int instances_num;
+
+    entry = &slot->stat.entries[level];
+    entry->total_instances++;
+
+    instances_num = cpu_topo_get_instances_num(topo);
+    if (instances_num > entry->max_instances) {
+        entry->max_instances = instances_num;
+    }
+
+    set_bit(level, slot->stat.curr_levels);
+
+    return;
+}
+
+static void cpu_slot_device_realize(DeviceListener *listener,
+                                    DeviceState *dev)
+{
+    CPUSlot *slot = container_of(listener, CPUSlot, listener);
+    CPUTopoState *topo;
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_CPU_TOPO)) {
+        return;
+    }
+
+    topo = CPU_TOPO(dev);
+    cpu_slot_add_topo_info(slot, topo);
+}
+
+static void cpu_slot_del_topo_info(CPUSlot *slot, CPUTopoState *topo)
+{
+    CpuTopologyLevel level = GET_CPU_TOPO_LEVEL(topo);
+    CPUTopoStatEntry *entry;
+
+    entry = &slot->stat.entries[level];
+    entry->total_instances--;
+
+    return;
+}
+
+static void cpu_slot_device_unrealize(DeviceListener *listener,
+                                      DeviceState *dev)
+{
+    CPUSlot *slot = container_of(listener, CPUSlot, listener);
+    CPUTopoState *topo;
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_CPU_TOPO)) {
+        return;
+    }
+
+    topo = CPU_TOPO(dev);
+    cpu_slot_del_topo_info(slot, topo);
+}
+
+DeviceListener cpu_slot_device_listener = {
+    .realize = cpu_slot_device_realize,
+    .unrealize = cpu_slot_device_unrealize,
+};
+
+static bool slot_bus_check_topology(CPUBusState *cbus,
+                                    CPUTopoState *topo,
+                                    Error **errp)
+{
+    CPUSlot *slot = CPU_SLOT(BUS(cbus)->parent);
+    CpuTopologyLevel level = GET_CPU_TOPO_LEVEL(topo);
+
+    if (!test_bit(level, slot->supported_levels)) {
+        error_setg(errp, "cpu topo: level %s is not supported",
+                   CpuTopologyLevel_str(level));
+        return false;
+    }
+    return true;
+}
+
+static void cpu_slot_realize(DeviceState *dev, Error **errp)
+{
+    CPUSlot *slot = CPU_SLOT(dev);
+
+    slot->listener = cpu_slot_device_listener;
+    device_listener_register(&slot->listener);
+
+    qbus_init(&slot->bus, sizeof(CPUBusState),
+              TYPE_CPU_BUS, dev, "cpu-slot");
+    slot->bus.check_topology = slot_bus_check_topology;
+}
+
+static void cpu_slot_unrealize(DeviceState *dev)
+{
+    CPUSlot *slot = CPU_SLOT(dev);
+
+    device_listener_unregister(&slot->listener);
+}
+
+static void cpu_slot_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
+    dc->realize = cpu_slot_reali

[RFC v2 08/15] hw/cpu/core: Convert cpu-core from general device to topology device

2024-09-18 Thread Zhao Liu
Convert cpu-core to a topology device so that it can be added to the
topology tree.

At present, only PPC uses the cpu-core device. To build the topology
tree, cpu-core must be added to the tree as one of the topology
levels.

The generic cpu-core is sufficient to express the core level in a
topology tree without considering any arch-specific features. To reduce
the complexity of supporting the topology tree, and to let arches use
the abstract cpu-core directly without deriving an arch-specific core,
remove the "abstract" restriction from its TypeInfo.

Because cpu-core then inherits the properties and settings of the
topology device, also make the following changes to handle the special
cases for cpu-core:

 * Omit setting the category, since the topology device already sets it.

 * Use the topology device's realize() as the parent realize() for PPC
   cores.

 * Set cpu-core's topology level to core.

 * Set bus_type to NULL for PPC cores so that PPC core creation does
   not fail, since PPC currently doesn't support the topology tree.

Signed-off-by: Zhao Liu 
---
 hw/cpu/core.c   |  9 +
 hw/ppc/pnv_core.c   | 11 ++-
 hw/ppc/spapr_cpu_core.c | 12 +++-
 include/hw/cpu/core.h   |  3 ++-
 include/hw/ppc/pnv_core.h   |  3 ++-
 include/hw/ppc/spapr_cpu_core.h |  4 +++-
 6 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/hw/cpu/core.c b/hw/cpu/core.c
index 495a5c30ffe1..bf1cbceea21b 100644
--- a/hw/cpu/core.c
+++ b/hw/cpu/core.c
@@ -79,19 +79,20 @@ static void cpu_core_instance_init(Object *obj)
 
 static void cpu_core_class_init(ObjectClass *oc, void *data)
 {
-    DeviceClass *dc = DEVICE_CLASS(oc);
+    CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
 
-    set_bit(DEVICE_CATEGORY_CPU, dc->categories);
+    /* TODO: Offload "core-id" and "nr-threads" to ppc-specific core. */
     object_class_property_add(oc, "core-id", "int", core_prop_get_core_id,
                               core_prop_set_core_id, NULL, NULL);
     object_class_property_add(oc, "nr-threads", "int", core_prop_get_nr_threads,
                               core_prop_set_nr_threads, NULL, NULL);
+
+    tc->level = CPU_TOPOLOGY_LEVEL_CORE;
 }
 
 static const TypeInfo cpu_core_type_info = {
     .name = TYPE_CPU_CORE,
-    .parent = TYPE_DEVICE,
-    .abstract = true,
+    .parent = TYPE_CPU_TOPO,
     .class_init = cpu_core_class_init,
     .instance_size = sizeof(CPUCore),
     .instance_init = cpu_core_instance_init,
diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
index a30693990b25..9be7a4b6c1a9 100644
--- a/hw/ppc/pnv_core.c
+++ b/hw/ppc/pnv_core.c
@@ -356,6 +356,8 @@ static void pnv_core_realize(DeviceState *dev, Error **errp)
 
     assert(pc->chip);
 
+    pcc->parent_realize(dev, errp);
+
     pc->threads = g_new(PowerPCCPU *, cc->nr_threads);
     for (i = 0; i < cc->nr_threads; i++) {
         PowerPCCPU *cpu;
@@ -466,11 +468,18 @@ static void pnv_core_power10_class_init(ObjectClass *oc, void *data)
 static void pnv_core_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
+    PnvCoreClass *pcc = PNV_CORE_CLASS(oc);
 
-    dc->realize = pnv_core_realize;
     dc->unrealize = pnv_core_unrealize;
     device_class_set_props(dc, pnv_core_properties);
     dc->user_creatable = false;
+    device_class_set_parent_realize(dc, pnv_core_realize,
+                                    &pcc->parent_realize);
+    /*
+     * Clear bus_type so that creating cores does not fail on PPC,
+     * which does not support the topology device tree.
+     */
+    dc->bus_type = NULL;
 }
 
 #define DEFINE_PNV_CORE_TYPE(family, cpu_model) \
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 464224516881..49c440fc0e09 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -338,6 +338,7 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
         (SpaprMachineState *) object_dynamic_cast(qdev_get_machine(),
                                                   TYPE_SPAPR_MACHINE);
     SpaprCpuCore *sc = SPAPR_CPU_CORE(OBJECT(dev));
+    SpaprCpuCoreClass *scc = SPAPR_CPU_CORE_GET_CLASS(sc);
     CPUCore *cc = CPU_CORE(OBJECT(dev));
     int i;
 
@@ -346,6 +347,8 @@ static void spapr_cpu_core_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    scc->parent_realize(dev, errp);
+
     qemu_register_reset(spapr_cpu_core_reset_handler, sc);
     sc->threads = g_new0(PowerPCCPU *, cc->nr_threads);
     for (i = 0; i < cc->nr_threads; i++) {
@@ -376,11 +379,18 @@ static void spapr_cpu_core_class_init(ObjectClass *oc, void *data)
     DeviceClass *dc = DEVICE_CLASS(oc);
     SpaprCpuCoreClass *scc = SPAPR_CPU_CORE_CLASS(oc);
 
-    dc->realize = spapr_cpu_core_realize;
     dc->unrealize = spapr_cpu_core_unrealize;
     device_clas

[RFC v2 03/15] hw/cpu: Introduce CPU topology device and CPU bus

2024-09-18 Thread Zhao Liu
Hybrid (or heterogeneous) CPU topology needs to be expressed as a
topology tree, which requires abstracting every CPU topology level as
an object.

At present, QEMU already has the CPU device, the core device and the
cluster device (for TCG), so it's natural to introduce more
topology-related devices instead of abstracting native QEMU objects.

To make it easier to deal with topological relationships, introduce
the general and abstract CPU topology device, and also introduce the
CPU bus to connect such CPU topology devices.

With the underlying CPU topology device abstraction, every CPU
topology level can be derived from it as a subclass. The specific
devices, such as CPU, core, or future module/die/socket devices, then
don't have to care about topology relationships: the underlying CPU
topology abstraction and the CPU bus take care of everything and build
the topology tree.

Note that user-created topology devices are given a default object
parent (one of the peripheral containers: "/peripheral" or
"/peripheral-anon"). Their parent object must be fixed up to the
correct topology parent so that their canonical path in the qtree
matches the actual topology hierarchy. This is done by
cpu_topo_set_parent() when the topology device is realized.
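As a rough illustration of the parent check, the sketch below models topology nodes that have a level and a parent pointer. It applies the two rules that cpu_topo_set_parent() appears to enforce in the cpu-topology.c hunk of this patch (the hunk is truncated in this archive):

- A device must not already have a topology parent.
- Its level must be strictly lower than its parent's level.

`TopoNode`, `TopoLevel` and `topo_set_parent()` are hypothetical stand-ins, not QEMU APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical levels, ordered from innermost (thread) to outermost (socket). */
typedef enum {
    LEVEL_THREAD,
    LEVEL_CORE,
    LEVEL_MODULE,
    LEVEL_DIE,
    LEVEL_SOCKET,
} TopoLevel;

typedef struct TopoNode {
    const char *name;
    TopoLevel level;
    struct TopoNode *parent;
} TopoNode;

/* Attach @child below @parent, rejecting reparenting and level inversions. */
static bool topo_set_parent(TopoNode *child, TopoNode *parent)
{
    assert(child != NULL && parent != NULL);

    if (child->parent) {
        fprintf(stderr, "%s already has a parent\n", child->name);
        return false;
    }
    if (child->level >= parent->level) {
        fprintf(stderr, "%s cannot sit below %s\n", child->name, parent->name);
        return false;
    }
    child->parent = parent;
    return true;
}
```

For example, attaching a core below a socket succeeds. Attaching a socket below a core is rejected, and so is attaching the same core a second time.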

Signed-off-by: Zhao Liu 
---
 MAINTAINERS   |   2 +
 hw/cpu/cpu-topology.c | 179 ++
 hw/cpu/meson.build|   2 +
 include/hw/cpu/cpu-topology.h |  68 +
 include/qemu/typedefs.h   |   2 +
 stubs/hotplug-stubs.c |   5 +
 6 files changed, 258 insertions(+)
 create mode 100644 hw/cpu/cpu-topology.c
 create mode 100644 include/hw/cpu/cpu-topology.h

diff --git a/MAINTAINERS b/MAINTAINERS
index ffacd60f4075..230267597b5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1884,12 +1884,14 @@ F: hw/core/machine-smp.c
 F: hw/core/null-machine.c
 F: hw/core/numa.c
 F: hw/cpu/cluster.c
+F: hw/cpu/cpu-topology.c
 F: qapi/machine.json
 F: qapi/machine-common.json
 F: qapi/machine-target.json
 F: include/hw/boards.h
 F: include/hw/core/cpu.h
 F: include/hw/cpu/cluster.h
+F: include/hw/cpu/cpu-topology.h
 F: include/sysemu/numa.h
 F: tests/functional/test_cpu_queries.py
 F: tests/functional/test_empty_cpu_model.py
diff --git a/hw/cpu/cpu-topology.c b/hw/cpu/cpu-topology.c
new file mode 100644
index ..e68c06132e7d
--- /dev/null
+++ b/hw/cpu/cpu-topology.c
@@ -0,0 +1,179 @@
+/*
+ * General CPU topology device abstraction
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/cpu/cpu-topology.h"
+#include "hw/qdev-core.h"
+#include "hw/qdev-properties.h"
+#include "hw/sysbus.h"
+#include "qapi/error.h"
+
+/* Roll up until topology root to check. */
+static bool cpu_parent_check_topology(DeviceState *parent,
+                                      DeviceState *dev,
+                                      Error **errp)
+{
+    BusClass *bc;
+
+    if (!parent || !parent->parent_bus ||
+        object_dynamic_cast(OBJECT(parent->parent_bus), TYPE_CPU_BUS)) {
+        return true;
+    }
+
+    bc = BUS_GET_CLASS(parent->parent_bus);
+    if (bc->check_address) {
+        return bc->check_address(parent->parent_bus, dev, errp);
+    }
+
+    return true;
+}
+
+static bool cpu_bus_check_address(BusState *bus, DeviceState *dev,
+                                  Error **errp)
+{
+    CPUBusState *cbus = CPU_BUS(bus);
+
+    if (cbus->check_topology) {
+        return cbus->check_topology(CPU_BUS(bus), CPU_TOPO(dev), errp);
+    }
+
+    return cpu_parent_check_topology(bus->parent, dev, errp);
+}
+
+static void cpu_bus_class_init(ObjectClass *oc, void *data)
+{
+    BusClass *bc = BUS_CLASS(oc);
+
+    bc->check_address = cpu_bus_check_address;
+}
+
+static const TypeInfo cpu_bus_type_info = {
+    .name = TYPE_CPU_BUS,
+    .parent = TYPE_BUS,
+    .class_init = cpu_bus_class_init,
+    .instance_size = sizeof(CPUBusState),
+};
+
+static bool cpu_topo_set_parent(CPUTopoState *topo, Error **errp)
+{
+    DeviceState *dev = DEVICE(topo);
+    BusState *bus = dev->parent_bus;
+    CPUTopoState *parent_topo = NULL;
+    Object *parent;
+
+    if (!bus || !bus->parent) {
+        return true;
+    }
+
+    if (topo->parent) {
+        error_setg(errp, "cpu topo: %s already has a parent",
+                   object_get_typename(OBJECT(topo)));
+        return false;
+    }
+
+    parent = OBJECT(bus->parent);
+    if (object_dynamic_cast(parent, TYPE_CPU_TOPO)) {
+        parent_topo = CPU_TOPO(parent);
+
+        if (GET_CPU_TOPO_LEVEL(topo) >= GET_CPU_TOPO_LEVEL(parent_topo)) {
+            error_se

[RFC v2 09/15] hw/cpu: Abstract module/die/socket levels as topology devices

2024-09-18 Thread Zhao Liu
Abstract the module/die/socket levels as the cpu-module/cpu-die/cpu-socket
topology devices so that they can be inserted into the topology tree.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS |  6 ++
 hw/cpu/die.c| 34 ++
 hw/cpu/meson.build  |  3 +++
 hw/cpu/module.c | 34 ++
 hw/cpu/socket.c | 34 ++
 include/hw/cpu/die.h| 29 +
 include/hw/cpu/module.h | 29 +
 include/hw/cpu/socket.h | 29 +
 8 files changed, 198 insertions(+)
 create mode 100644 hw/cpu/die.c
 create mode 100644 hw/cpu/module.c
 create mode 100644 hw/cpu/socket.c
 create mode 100644 include/hw/cpu/die.h
 create mode 100644 include/hw/cpu/module.h
 create mode 100644 include/hw/cpu/socket.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e5b2cd91dca..03c1a13de074 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1886,6 +1886,9 @@ F: hw/core/numa.c
 F: hw/cpu/cluster.c
 F: hw/cpu/cpu-slot.c
 F: hw/cpu/cpu-topology.c
+F: hw/cpu/die.c
+F: hw/cpu/module.c
+F: hw/cpu/socket.c
 F: qapi/machine.json
 F: qapi/machine-common.json
 F: qapi/machine-target.json
@@ -1894,6 +1897,9 @@ F: include/hw/core/cpu.h
 F: include/hw/cpu/cluster.h
 F: include/hw/cpu/cpu-slot.h
 F: include/hw/cpu/cpu-topology.h
+F: include/hw/cpu/die.h
+F: include/hw/cpu/module.h
+F: include/hw/cpu/socket.h
 F: include/sysemu/numa.h
 F: tests/functional/test_cpu_queries.py
 F: tests/functional/test_empty_cpu_model.py
diff --git a/hw/cpu/die.c b/hw/cpu/die.c
new file mode 100644
index ..f00907ffd78b
--- /dev/null
+++ b/hw/cpu/die.c
@@ -0,0 +1,34 @@
+/*
+ * CPU die abstract device
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/die.h"
+
+static void cpu_die_class_init(ObjectClass *oc, void *data)
+{
+    CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+    tc->level = CPU_TOPOLOGY_LEVEL_DIE;
+}
+
+static const TypeInfo cpu_die_type_info = {
+    .name = TYPE_CPU_DIE,
+    .parent = TYPE_CPU_TOPO,
+    .class_init = cpu_die_class_init,
+    .instance_size = sizeof(CPUDie),
+};
+
+static void cpu_die_register_types(void)
+{
+    type_register_static(&cpu_die_type_info);
+}
+
+type_init(cpu_die_register_types)
diff --git a/hw/cpu/meson.build b/hw/cpu/meson.build
index 358e2b3960fa..c64eec4460d8 100644
--- a/hw/cpu/meson.build
+++ b/hw/cpu/meson.build
@@ -3,6 +3,9 @@ common_ss.add(files('cpu-topology.c'))
 system_ss.add(files('core.c'))
 system_ss.add(files('cpu-slot.c'))
 system_ss.add(when: 'CONFIG_CPU_CLUSTER', if_true: files('cluster.c'))
+system_ss.add(files('die.c'))
+system_ss.add(files('module.c'))
+system_ss.add(files('socket.c'))
 
 system_ss.add(when: 'CONFIG_ARM11MPCORE', if_true: files('arm11mpcore.c'))
 system_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview_mpcore.c'))
diff --git a/hw/cpu/module.c b/hw/cpu/module.c
new file mode 100644
index ..b6f50a2ba588
--- /dev/null
+++ b/hw/cpu/module.c
@@ -0,0 +1,34 @@
+/*
+ * CPU module abstract device
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/module.h"
+
+static void cpu_module_class_init(ObjectClass *oc, void *data)
+{
+    CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+    tc->level = CPU_TOPOLOGY_LEVEL_MODULE;
+}
+
+static const TypeInfo cpu_module_type_info = {
+    .name = TYPE_CPU_MODULE,
+    .parent = TYPE_CPU_TOPO,
+    .class_init = cpu_module_class_init,
+    .instance_size = sizeof(CPUModule),
+};
+
+static void cpu_module_register_types(void)
+{
+    type_register_static(&cpu_module_type_info);
+}
+
+type_init(cpu_module_register_types)
diff --git a/hw/cpu/socket.c b/hw/cpu/socket.c
new file mode 100644
index 00000000..516e93389e11
--- /dev/null
+++ b/hw/cpu/socket.c
@@ -0,0 +1,34 @@
+/*
+ * CPU socket abstract device
+ *
+ * Copyright (C) 2024 Intel Corporation.
+ *
+ * Author: Zhao Liu 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/socket.h"
+
+static void cpu_socket_class_init(ObjectClass *oc, void *data)
+{
+    CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+    tc->level = CPU_TOPOLOGY_LEVEL_SOCKET;
+}
+
+static const TypeInfo cpu_socket_type_info = {
+    .name = TYPE_CPU_SOCKET,
+    .parent = TYPE_CPU_TOPO,
+

[RFC v2 14/15] i386/cpu: Support CPU plugged in topology tree via bus-finder

2024-09-18 Thread Zhao Liu
Use the topology sub-IDs or the APIC ID to locate the parent topology
device and bus.

This process naturally verifies the correctness of the topology-related
IDs, making it possible to drop the existing topology ID sanity checks
once the x86 machine supports the topology tree.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86-common.c  | 99 ++-
 include/hw/i386/x86.h |  2 +
 target/i386/cpu.c | 11 +
 3 files changed, 91 insertions(+), 21 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index a7f082b0a90b..d837aadc9dea 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -208,6 +208,65 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
     }
 }
 
+static void x86_fixup_topo_ids(MachineState *ms, X86CPU *cpu)
+{
+    /*
+     * module-id was optional in QEMU 9.0 and older, so keep it optional
+     * if there's only one module per die.
+     */
+    if (cpu->module_id < 0 && ms->smp.modules == 1) {
+        cpu->module_id = 0;
+    }
+
+    /*
+     * die-id was optional in QEMU 4.0 and older, so keep it optional
+     * if there's only one die per socket.
+     */
+    if (cpu->die_id < 0 && ms->smp.dies == 1) {
+        cpu->die_id = 0;
+    }
+}
+
+BusState *x86_cpu_get_parent_bus(DeviceState *dev)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    X86MachineState *x86ms = X86_MACHINE(ms);
+    X86CPU *cpu = X86_CPU(dev);
+    X86CPUTopoIDs topo_ids;
+    X86CPUTopoInfo topo_info;
+    BusState *bus;
+
+    x86_fixup_topo_ids(ms, cpu);
+    init_topo_info(&topo_info, x86ms);
+
+    if (cpu->apic_id == UNASSIGNED_APIC_ID) {
+        /* TODO: Make the thread_id and bus index of CPU the same. */
+        topo_ids.smt_id = cpu->thread_id;
+        topo_ids.core_id = cpu->core_id;
+        topo_ids.module_id = cpu->module_id;
+        topo_ids.die_id = cpu->die_id;
+        topo_ids.pkg_id = cpu->socket_id;
+    } else {
+        x86_topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
+    }
+
+    bus = x86_find_topo_bus(ms, &topo_ids);
+
+    /*
+     * If APIC ID is not set,
+     * set it based on socket/die/module/core/thread properties.
+     *
+     * A successful walk of the children proves the topo IDs are valid.
+     * Although module and die are optional, the topology tree creates
+     * at least one instance of each by default if the machine supports them.
+     */
+    if (bus && cpu->apic_id == UNASSIGNED_APIC_ID) {
+        cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
+    }
+
+    return bus;
+}
+
 void x86_rtc_set_cpus_count(ISADevice *s, uint16_t cpus_count)
 {
     MC146818RtcState *rtc = MC146818_RTC(s);
@@ -340,6 +399,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
     X86CPU *cpu = X86_CPU(dev);
     CPUX86State *env = &cpu->env;
     MachineState *ms = MACHINE(hotplug_dev);
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
     X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
     unsigned int smp_cores = ms->smp.cores;
     unsigned int smp_threads = ms->smp.threads;
@@ -374,26 +434,9 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
         set_bit(CPU_TOPOLOGY_LEVEL_DIE, env->avail_cpu_topo);
     }
 
-    /*
-     * If APIC ID is not set,
-     * set it based on socket/die/module/core/thread properties.
-     */
-    if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-        /*
-         * die-id was optional in QEMU 4.0 and older, so keep it optional
-         * if there's only one die per socket.
-         */
-        if (cpu->die_id < 0 && ms->smp.dies == 1) {
-            cpu->die_id = 0;
-        }
-
-        /*
-         * module-id was optional in QEMU 9.0 and older, so keep it optional
-         * if there's only one module per die.
-         */
-        if (cpu->module_id < 0 && ms->smp.modules == 1) {
-            cpu->module_id = 0;
-        }
+    if (cpu->apic_id == UNASSIGNED_APIC_ID &&
+        !mc->smp_props.topo_tree_supported) {
+        x86_fixup_topo_ids(ms, cpu);
 
         if (cpu->socket_id < 0) {
             error_setg(errp, "CPU socket-id is not set");
@@ -409,7 +452,6 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
         } else if (cpu->die_id > ms->smp.dies - 1) {
             error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
                        cpu->die_id, ms->smp.dies - 1);
-            return;
         }
         if (cpu->module_id < 0) {
             error_setg(errp, "CPU module-id is not set");
@@ -442,6 +484,21 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
         topo_ids.core_id = cpu->core_id;
         topo_ids.smt_id = cpu->thread_id;
         cpu->apic_id = x86_apicid_from_topo_ids(&topo_info, &topo_ids);
+    } else if (cpu->apic_id == UNASSIGNED_APIC_ID

[RFC v2 15/15] i386: Support topology device tree

2024-09-18 Thread Zhao Liu
Support a complete QOM CPU topology tree for the x86 machine, and
specify bus_type for x86 CPUs so that all x86 CPUs are added to the
topology tree.

Since the CPU slot makes the machine the hotplug handler for all
topology devices, the hotplug-related hooks may be called for topology
devices other than CPUs. Thus, make microvm stop assuming that the
device is always a CPU when implementing the relevant hooks.

Additionally, drop code paths that the topology tree implementation no
longer needs.

Signed-off-by: Zhao Liu 
---
 hw/i386/microvm.c| 13 +---
 hw/i386/x86-common.c | 78 +---
 hw/i386/x86.c|  2 ++
 target/i386/cpu.c|  2 ++
 4 files changed, 21 insertions(+), 74 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 40edcee7af29..49a897db50fc 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -417,16 +417,21 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
 static void microvm_device_pre_plug_cb(HotplugHandler *hotplug_dev,
                                        DeviceState *dev, Error **errp)
 {
-    X86CPU *cpu = X86_CPU(dev);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        X86CPU *cpu;
+        cpu = X86_CPU(dev);
 
-    cpu->host_phys_bits = true; /* need reliable phys-bits */
-    x86_cpu_pre_plug(hotplug_dev, dev, errp);
+        cpu->host_phys_bits = true; /* need reliable phys-bits */
+        x86_cpu_pre_plug(hotplug_dev, dev, errp);
+    }
 }
 
 static void microvm_device_plug_cb(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp)
 {
-    x86_cpu_plug(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        x86_cpu_plug(hotplug_dev, dev, errp);
+    }
 }
 
 static void microvm_device_unplug_request_cb(HotplugHandler *hotplug_dev,
diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index d837aadc9dea..75d4b2f3d43a 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -129,26 +129,18 @@ static void x86_cpu_new(X86MachineState *x86ms, int index,
                         int64_t apic_id, Error **errp)
 {
     MachineState *ms = MACHINE(x86ms);
-    MachineClass *mc = MACHINE_GET_CLASS(ms);
     Object *cpu = object_new(ms->cpu_type);
     DeviceState *dev = DEVICE(cpu);
     BusState *bus = NULL;
+    X86CPUTopoIDs topo_ids;
+    X86CPUTopoInfo topo_info;
 
-    /*
-     * Once x86 machine supports topo_tree_supported, x86 CPU would
-     * also have bus_type.
-     */
-    if (mc->smp_props.topo_tree_supported) {
-        X86CPUTopoIDs topo_ids;
-        X86CPUTopoInfo topo_info;
-
-        init_topo_info(&topo_info, x86ms);
-        x86_topo_ids_from_apicid(apic_id, &topo_info, &topo_ids);
-        bus = x86_find_topo_bus(ms, &topo_ids);
+    init_topo_info(&topo_info, x86ms);
+    x86_topo_ids_from_apicid(apic_id, &topo_info, &topo_ids);
+    bus = x86_find_topo_bus(ms, &topo_ids);
 
-        /* Only with dev->id, CPU can be inserted into topology tree. */
-        dev->id = g_strdup_printf("%s[%d]", ms->cpu_type, index);
-    }
+    /* Only with dev->id, CPU can be inserted into topology tree. */
+    dev->id = g_strdup_printf("%s[%d]", ms->cpu_type, index);
 
     if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
         goto out;
@@ -399,10 +391,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
     X86CPU *cpu = X86_CPU(dev);
     CPUX86State *env = &cpu->env;
     MachineState *ms = MACHINE(hotplug_dev);
-    MachineClass *mc = MACHINE_GET_CLASS(ms);
     X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
-    unsigned int smp_cores = ms->smp.cores;
-    unsigned int smp_threads = ms->smp.threads;
     X86CPUTopoInfo topo_info;
 
     if (!object_dynamic_cast(OBJECT(cpu), ms->cpu_type)) {
@@ -434,58 +423,7 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
         set_bit(CPU_TOPOLOGY_LEVEL_DIE, env->avail_cpu_topo);
     }
 
-    if (cpu->apic_id == UNASSIGNED_APIC_ID &&
-        !mc->smp_props.topo_tree_supported) {
-        x86_fixup_topo_ids(ms, cpu);
-
-        if (cpu->socket_id < 0) {
-            error_setg(errp, "CPU socket-id is not set");
-            return;
-        } else if (cpu->socket_id > ms->smp.sockets - 1) {
-            error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u",
-                       cpu->socket_id, ms->smp.sockets - 1);
-            return;
-        }
-        if (cpu->die_id < 0) {
-            error_setg(errp, "CPU die-id is not set");
-            return;
-        } else if (cpu->die_id > ms->smp.dies - 1) {
-            error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
-                       cpu->die_id, ms->smp.dies - 1);
-        }
-        if (cpu->module_id < 0) {
-

[RFC v2 11/15] hw/core: Support topology tree in none machine for compatibility

2024-09-18 Thread Zhao Liu
The none machine accepts any CPU type, even though some CPUs may have
bus_type set.

To address this, set topo_tree_supported to true for the none machine,
so that it has a CPU slot with a CPU bus to collect any topology device
that specifies bus_type.

Since arch_id_topo_level is not set, such topology devices are inserted
directly under the CPU slot without being organized into a tree
structure.

CPUs without bus_type are not affected by topo_tree_supported.

Signed-off-by: Zhao Liu 
---
 hw/core/null-machine.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/core/null-machine.c b/hw/core/null-machine.c
index f586a4bef543..101649f3e8c1 100644
--- a/hw/core/null-machine.c
+++ b/hw/core/null-machine.c
@@ -54,6 +54,11 @@ static void machine_none_machine_init(MachineClass *mc)
     mc->no_floppy = 1;
     mc->no_cdrom = 1;
     mc->no_sdcard = 1;
+    /*
+     * For compatibility with arches and CPUs that already
+     * support the topology tree.
+     */
+    mc->smp_props.topo_tree_supported = true;
 }
 
 DEFINE_MACHINE("none", machine_none_machine_init)
-- 
2.34.1




[RFC v2 02/15] qdev: Add the interface to reparent the device

2024-09-18 Thread Zhao Liu
User-created devices may need to adjust their default object parent or
parent bus.

By default, user-created devices are QOM-parented to one of the
peripheral containers ("/peripheral" or "/peripheral-anon") in
qdev_set_id(). Sometimes a device must be reparented to another object
to express a more accurate child<> relationship, as in the case of the
PnvPHBRootPort device or the topology devices added later in this
series.

The current pnv_phb_user_get_parent() implements this reparenting
logic. To let topology devices use it as well, turn it into a generic
qdev interface that takes a custom device id (the "default_id"
parameter).

Also handle the failure of object_property_add_child().
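The helper takes a temporary reference before unparenting. The sketch below shows why, under one assumption: a simplified object model in which a parent's child<> link owns one reference on the child, as QOM's object_property_add_child() does.

`Obj`, `obj_ref()`, `obj_unparent()` and the other names are hypothetical. Only the ordering mirrors qdev_set_parent(): ref, unparent, add child, unref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature object: a parent's child link owns one reference. */
typedef struct Obj {
    int refcount;
    struct Obj *parent;
    bool finalized;
} Obj;

static void obj_ref(Obj *o)
{
    o->refcount++;
}

static void obj_unref(Obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        o->finalized = true;   /* stands in for freeing the object */
    }
}

/* Detach from the current parent; the parent's reference is dropped. */
static void obj_unparent(Obj *o)
{
    if (o->parent) {
        o->parent = NULL;
        obj_unref(o);
    }
}

/* Attach to a new parent; the new child link takes its own reference. */
static void obj_add_child(Obj *parent, Obj *child)
{
    child->parent = parent;
    obj_ref(child);
}

/* Same ordering as qdev_set_parent(): ref, unparent, add child, unref. */
static bool obj_reparent(Obj *child, Obj *new_parent)
{
    if (child->parent == new_parent) {
        return true;
    }
    obj_ref(child);                   /* keep child alive while parentless */
    obj_unparent(child);
    obj_add_child(new_parent, child);
    obj_unref(child);                 /* drop the temporary reference */
    return !child->finalized;
}
```

Without the temporary obj_ref(), a child held only by its old parent would have its count drop to zero inside obj_unparent(). It would be finalized before it could be attached to the new parent.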

Signed-off-by: Zhao Liu 
---
 hw/core/qdev.c | 52 +
 hw/pci-host/pnv_phb.c  | 59 +-
 include/hw/qdev-core.h |  3 +++
 3 files changed, 67 insertions(+), 47 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 4429856eaddd..ff073cbff56d 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -143,6 +143,58 @@ bool qdev_set_parent_bus(DeviceState *dev, BusState *bus, Error **errp)
     return true;
 }
 
+/*
+ * Set the QOM parent and parent bus of an object child. If the device
+ * state associated with the child has an id, use it as QOM id.
+ * Otherwise use default_id as QOM id.
+ *
+ * This helper does both operations at the same time because setting
+ * a new QOM child will erase the bus parent of the device. This happens
+ * because object_unparent() will call object_property_del_child(),
+ * which in turn calls the property release callback prop->release if
+ * it's defined. In our case this callback is set to
+ * object_finalize_child_property(), which was assigned during the
+ * first object_property_add_child() call. This callback will end up
+ * calling device_unparent(), and this function removes the device
+ * from its parent bus.
+ *
+ * The QOM and parent bus to be set aren't necessarily related, so
+ * let's receive both as arguments.
+ */
+bool qdev_set_parent(DeviceState *dev, BusState *bus, Object *parent,
+                     char *default_id, Error **errp)
+{
+    Object *child = OBJECT(dev);
+    ObjectProperty *prop;
+
+    if (!dev->id && !default_id) {
+        error_setg(errp, "unknown device id");
+        return false;
+    }
+
+    if (child->parent == parent) {
+        return true;
+    }
+
+    object_ref(child);
+    object_unparent(child);
+    prop = object_property_add_child(parent,
+                                     dev->id ? dev->id : default_id,
+                                     child);
+    object_unref(child);
+
+    if (!prop) {
+        error_setg(errp, "couldn't change parent");
+        return false;
+    }
+
+    if (!qdev_set_parent_bus(dev, bus, errp)) {
+        return false;
+    }
+
+    return true;
+}
+
 DeviceState *qdev_new(const char *name)
 {
     ObjectClass *oc = object_class_by_name(name);
diff --git a/hw/pci-host/pnv_phb.c b/hw/pci-host/pnv_phb.c
index d4c118d44362..a26e7b7aec5c 100644
--- a/hw/pci-host/pnv_phb.c
+++ b/hw/pci-host/pnv_phb.c
@@ -19,49 +19,6 @@
 #include "qom/object.h"
 #include "sysemu/sysemu.h"
 
-
-/*
- * Set the QOM parent and parent bus of an object child. If the device
- * state associated with the child has an id, use it as QOM id.
- * Otherwise use object_typename[index] as QOM id.
- *
- * This helper does both operations at the same time because setting
- * a new QOM child will erase the bus parent of the device. This happens
- * because object_unparent() will call object_property_del_child(),
- * which in turn calls the property release callback prop->release if
- * it's defined. In our case this callback is set to
- * object_finalize_child_property(), which was assigned during the
- * first object_property_add_child() call. This callback will end up
- * calling device_unparent(), and this function removes the device
- * from its parent bus.
- *
- * The QOM and parent bus to be set aren´t necessarily related, so
- * let's receive both as arguments.
- */
-static bool pnv_parent_fixup(Object *parent, BusState *parent_bus,
-                             Object *child, int index,
-                             Error **errp)
-{
-    g_autofree char *default_id =
-        g_strdup_printf("%s[%d]", object_get_typename(child), index);
-    const char *dev_id = DEVICE(child)->id;
-
-    if (child->parent == parent) {
-        return true;
-    }
-
-    object_ref(child);
-    object_unparent(child);
-    object_property_add_child(parent, dev_id ? dev_id : default_id, child);
-    object_unref(child);
-
-    if (!qdev_set_parent_bus(DEVICE(child), parent_bus, errp)) {
-        return false;
-    }
-
-    return true;
-}
-
 static Object *pnv_phb_user_get_parent(PnvChip *chip, PnvPHB *phb, Err

[RFC v2 12/15] hw/i386: Allow i386 to create new CPUs in topology tree

2024-09-18 Thread Zhao Liu
For x86, a CPU's APIC ID represents its topology path: it is the
combination of the topology sub-IDs at each level.

When the x86 machine creates CPUs, it uses the APIC ID to get the
topology sub-IDs, so that each CPU can be inserted into the topology
tree.
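The sketch below makes the mapping between an APIC ID and its sub-IDs concrete. It assumes the usual x86 layout, in which each level occupies ceil(log2(count)) bits, packed from SMT (lowest bits) up to package (highest bits). The struct and function names are hypothetical and only loosely follow QEMU's X86CPUTopoInfo and X86CPUTopoIDs helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-level counts, similar to -smp dies/modules/cores/threads. */
typedef struct {
    unsigned dies_per_pkg;
    unsigned modules_per_die;
    unsigned cores_per_module;
    unsigned threads_per_core;
} TopoCounts;

/* Hypothetical sub-IDs of one logical processor. */
typedef struct {
    unsigned pkg_id;
    unsigned die_id;
    unsigned module_id;
    unsigned core_id;
    unsigned smt_id;
} TopoIDs;

/* Bits needed for IDs 0..count-1: ceil(log2(count)), or 0 when count == 1. */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned width = 0;

    assert(count > 0);
    while ((1u << width) < count) {
        width++;
    }
    return width;
}

/* Pack the sub-IDs, SMT in the lowest bits and the package in the highest. */
static uint32_t apicid_from_topo_ids(const TopoCounts *c, const TopoIDs *ids)
{
    unsigned core_off = bitwidth_for_count(c->threads_per_core);
    unsigned module_off = core_off + bitwidth_for_count(c->cores_per_module);
    unsigned die_off = module_off + bitwidth_for_count(c->modules_per_die);
    unsigned pkg_off = die_off + bitwidth_for_count(c->dies_per_pkg);

    return ((uint32_t)ids->pkg_id << pkg_off) |
           ((uint32_t)ids->die_id << die_off) |
           ((uint32_t)ids->module_id << module_off) |
           ((uint32_t)ids->core_id << core_off) |
           ids->smt_id;
}

/* Inverse of apicid_from_topo_ids(): peel fields off from the lowest bits up. */
static void topo_ids_from_apicid(const TopoCounts *c, uint32_t apicid,
                                 TopoIDs *ids)
{
    unsigned smt_w = bitwidth_for_count(c->threads_per_core);
    unsigned core_w = bitwidth_for_count(c->cores_per_module);
    unsigned module_w = bitwidth_for_count(c->modules_per_die);
    unsigned die_w = bitwidth_for_count(c->dies_per_pkg);

    ids->smt_id = apicid & ((1u << smt_w) - 1);
    apicid >>= smt_w;
    ids->core_id = apicid & ((1u << core_w) - 1);
    apicid >>= core_w;
    ids->module_id = apicid & ((1u << module_w) - 1);
    apicid >>= module_w;
    ids->die_id = apicid & ((1u << die_w) - 1);
    apicid >>= die_w;
    ids->pkg_id = apicid;
}
```

For example, take 45 cores per module and 2 threads per core. The core field needs 6 bits and the SMT field 1 bit, so socket 1, core 44, thread 1 maps to (1 << 7) | (44 << 1) | 1 = 217.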

It then searches the topology tree for the corresponding parent
topology device and inserts the CPU into that device's CPU bus.
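The search can be pictured as a top-down descent. At each level it follows the child whose index equals the requested sub-ID, and it stops at the matching core. If no child matches, the IDs are invalid.

The sketch below models that idea with a hypothetical `Node` tree rather than QEMU's qbus_walk_children() callbacks. It illustrates the lookup; it is not the patch's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels, innermost first; the values index SubIDs.id[]. */
typedef enum { LVL_THREAD, LVL_CORE, LVL_MODULE, LVL_DIE, LVL_SOCKET } Level;

/* Hypothetical tree node; index is this node's sub-ID within its parent. */
typedef struct Node {
    Level level;
    int index;
    struct Node **children;
    size_t nr_children;
} Node;

/* Requested sub-IDs, one per level. */
typedef struct {
    int id[LVL_SOCKET + 1];
} SubIDs;

/*
 * Descend from @root (e.g. the CPU slot), following at each step the child
 * whose index matches the sub-ID for that child's level. Levels absent from
 * the tree are simply never consulted.
 */
static Node *find_parent_core(Node *root, const SubIDs *ids)
{
    Node *cur = root;

    assert(root != NULL);
    for (;;) {
        Node *next = NULL;

        for (size_t i = 0; i < cur->nr_children; i++) {
            Node *child = cur->children[i];

            if (child->index == ids->id[child->level]) {
                next = child;
                break;
            }
        }
        if (next == NULL) {
            return NULL;   /* no match: the IDs do not name an existing path */
        }
        if (next->level == LVL_CORE) {
            return next;   /* the new CPU would be plugged under this core */
        }
        cur = next;
    }
}
```

Because a lookup with an out-of-range sub-ID simply finds no child, the walk also acts as the ID sanity check mentioned for the bus-finder patch.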

Signed-off-by: Zhao Liu 
---
 hw/i386/x86-common.c | 101 +--
 1 file changed, 97 insertions(+), 4 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index b21d2ab97349..a7f082b0a90b 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -53,14 +53,107 @@
 /* Physical Address of PVH entry point read from kernel ELF NOTE */
 static size_t pvh_start_addr;
 
-static void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+static int x86_cpu_get_topo_id(const X86CPUTopoIDs *topo_ids,
+   CpuTopologyLevel level)
 {
-Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
+switch (level) {
+case CPU_TOPOLOGY_LEVEL_THREAD:
+return topo_ids->smt_id;
+case CPU_TOPOLOGY_LEVEL_CORE:
+return topo_ids->core_id;
+case CPU_TOPOLOGY_LEVEL_MODULE:
+return topo_ids->module_id;
+case CPU_TOPOLOGY_LEVEL_DIE:
+return topo_ids->die_id;
+case CPU_TOPOLOGY_LEVEL_SOCKET:
+return topo_ids->pkg_id;
+default:
+g_assert_not_reached();
+}
+
+return -1;
+}
+
+typedef struct SearchCoreCb {
+const X86CPUTopoIDs *topo_ids;
+const CPUTopoState *parent;
+} SearchCoreCb;
+
+static int x86_search_topo_parent(DeviceState *dev, void *opaque)
+{
+CPUTopoState *topo = CPU_TOPO(dev);
+CpuTopologyLevel level = GET_CPU_TOPO_LEVEL(topo);
+SearchCoreCb *cb = opaque;
+int topo_id, index;
+
+topo_id = x86_cpu_get_topo_id(cb->topo_ids, level);
+index = cpu_topo_get_index(topo);
+
+if (topo_id < 0) {
+error_report("Invalid %s-id: %d",
+ CpuTopologyLevel_str(level), topo_id);
+error_printf("Try to set the %s-id in [0-%d].\n",
+ CpuTopologyLevel_str(level),
+ cpu_topo_get_instances_num(topo) - 1);
+return TOPO_FOREACH_ERR;
+}
+
+if (topo_id == index) {
+if (level == CPU_TOPOLOGY_LEVEL_CORE) {
+cb->parent = topo;
+/* The error result could exit directly. */
+return TOPO_FOREACH_ERR;
+}
+return TOPO_FOREACH_CONTINUE;
+}
+return TOPO_FOREACH_END;
+}
+
+static BusState *x86_find_topo_bus(MachineState *ms, X86CPUTopoIDs *topo_ids)
+{
+SearchCoreCb cb;
+
+cb.topo_ids = topo_ids;
+cb.parent = NULL;
+qbus_walk_children(BUS(&ms->topo->bus), x86_search_topo_parent,
+   NULL, NULL, NULL, &cb);
+
+if (!cb.parent) {
+return NULL;
+}
+
+return BUS(cb.parent->bus);
+}
+
+static void x86_cpu_new(X86MachineState *x86ms, int index,
+int64_t apic_id, Error **errp)
+{
+MachineState *ms = MACHINE(x86ms);
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+Object *cpu = object_new(ms->cpu_type);
+DeviceState *dev = DEVICE(cpu);
+BusState *bus = NULL;
+
+/*
+ * Once x86 machine supports topo_tree_supported, x86 CPU would
+ * also have bus_type.
+ */
+if (mc->smp_props.topo_tree_supported) {
+X86CPUTopoIDs topo_ids;
+X86CPUTopoInfo topo_info;
+
+init_topo_info(&topo_info, x86ms);
+x86_topo_ids_from_apicid(apic_id, &topo_info, &topo_ids);
+bus = x86_find_topo_bus(ms, &topo_ids);
+
+/* Only with dev->id, CPU can be inserted into topology tree. */
+dev->id = g_strdup_printf("%s[%d]", ms->cpu_type, index);
+}
 
 if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
 goto out;
 }
-qdev_realize(DEVICE(cpu), NULL, errp);
+qdev_realize(dev, bus, errp);
 
 out:
 object_unref(cpu);
@@ -111,7 +204,7 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 for (i = 0; i < ms->smp.cpus; i++) {
-x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
+x86_cpu_new(x86ms, i, possible_cpus->cpus[i].arch_id, &error_fatal);
 }
 }
 
-- 
2.34.1




[RFC v2 07/15] hw/core/cpu: Convert CPU from general device to topology device

2024-09-18 Thread Zhao Liu
Convert CPU to a topology device so that it can be added to the topology
tree.

Because CPU then inherits the properties and settings of the topology
device, make the following changes to handle the special cases for CPU:

 * Omit setting the category, since the topology device has already set
   it.

 * Make the topology device's realize() the parent realize().

 * Clean up the cases that assume the parent object is DeviceState and
   access parent_obj directly.

 * Set CPU's topology level to thread.

 * And one more complex change: mask bus_type by setting it to NULL.

- This is because arches that don't support the topology tree have no
  CPU bus bridge, so their CPUs couldn't be created with a CPU bus_type.
  So only CPUs of arches that support the topology tree should override
  the bus_type field.

 * Further, support cpu_create() for CPUs that set bus_type.

- This is a corner case: some arch CPUs may set bus_type, and
  cpu_create() may be called in system emulation (e.g., with the none
  machine). To handle this case, try to find the machine's CPU bus in
  cpu_create().
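
The parent-realize chaining mentioned in the list above can be illustrated
with a small standalone C sketch. All types and helpers here (Dev, CpuClass,
set_parent_realize(), bool-returning realize hooks) are hypothetical
simplifications, not QEMU's QOM API; set_parent_realize() only mimics the
idea behind device_class_set_parent_realize().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for DeviceClass/CPUClass. */
typedef struct Dev Dev;
typedef bool (*RealizeFn)(Dev *dev);

typedef struct DevClass {
    RealizeFn realize;
} DevClass;

typedef struct CpuClass {
    DevClass parent_class;
    RealizeFn parent_realize;   /* saved realize() of the topology device */
} CpuClass;

struct Dev {
    CpuClass *klass;
    bool topo_fail;             /* force the parent realize to fail */
    int steps;                  /* counter used to record call order */
    int topo_step;              /* when topo realize ran (0: never) */
    int cpu_step;               /* when CPU realize ran (0: never) */
};

/* Realize of the hypothetical topology-device parent class. */
static bool topo_realize(Dev *dev)
{
    dev->topo_step = ++dev->steps;
    return !dev->topo_fail;
}

/* CPU realize: run the parent (topology) realize first, stop on failure. */
static bool cpu_realize(Dev *dev)
{
    assert(dev->klass->parent_realize != NULL);
    if (!dev->klass->parent_realize(dev)) {
        return false;
    }
    dev->cpu_step = ++dev->steps;
    return true;
}

/* Save the inherited hook, then install the subclass hook in its place. */
static void set_parent_realize(CpuClass *cc, RealizeFn fn)
{
    cc->parent_realize = cc->parent_class.realize;
    cc->parent_class.realize = fn;
}
```

After set_parent_realize(&cc, cpu_realize), calling cc.parent_class.realize()
runs topo_realize() first and skips the CPU-specific part if it fails, which
is the same early-return shape as the cpu_common_realizefn() hunk in this
patch.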

Signed-off-by: Zhao Liu 
---
 accel/kvm/kvm-all.c   |  4 ++--
 hw/core/cpu-common.c  | 42 +-
 include/hw/core/cpu.h |  7 +--
 target/ppc/kvm.c  |  2 +-
 4 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index beb1988d12cf..48c040f6861d 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -4173,7 +4173,7 @@ static void query_stats(StatsResultList **result, StatsTarget target,
 break;
 case STATS_TARGET_VCPU:
 add_stats_entry(result, STATS_PROVIDER_KVM,
-cpu->parent_obj.canonical_path,
+DEVICE(cpu)->canonical_path,
 stats_list);
 break;
 default:
@@ -4265,7 +4265,7 @@ static void query_stats_cb(StatsResultList **result, StatsTarget target,
 stats_args.names = names;
 stats_args.errp = errp;
 CPU_FOREACH(cpu) {
-if (!apply_str_list_filter(cpu->parent_obj.canonical_path, targets)) {
+if (!apply_str_list_filter(DEVICE(cpu)->canonical_path, targets)) {
 continue;
 }
 query_stats_vcpu(cpu, &stats_args);
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 7982ecd39a53..08f2d536ff6d 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -57,7 +57,19 @@ CPUState *cpu_create(const char *typename)
 {
 Error *err = NULL;
 CPUState *cpu = CPU(object_new(typename));
-if (!qdev_realize(DEVICE(cpu), NULL, &err)) {
+BusState *bus = NULL;
+
+if (DEVICE_GET_CLASS(cpu)->bus_type) {
+MachineState *ms;
+
+ms = (MachineState *)object_dynamic_cast(qdev_get_machine(),
+ TYPE_MACHINE);
+if (ms) {
+bus = BUS(&ms->topo->bus);
+}
+}
+
+if (!qdev_realize(DEVICE(cpu), bus, &err)) {
 error_report_err(err);
 object_unref(OBJECT(cpu));
 exit(EXIT_FAILURE);
@@ -196,6 +208,12 @@ static void cpu_common_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cpu = CPU(dev);
 Object *machine = qdev_get_machine();
+CPUClass *cc = CPU_GET_CLASS(cpu);
+
+cc->parent_realize(dev, errp);
+if (*errp) {
+return;
+}
 
 /* qdev_get_machine() can return something that's not TYPE_MACHINE
  * if this is one of the user-only emulators; in that case there's
@@ -302,6 +320,7 @@ static void cpu_common_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 ResettableClass *rc = RESETTABLE_CLASS(klass);
+CPUTopoClass *tc = CPU_TOPO_CLASS(klass);
 CPUClass *k = CPU_CLASS(klass);
 
 k->parse_features = cpu_common_parse_features;
@@ -309,9 +328,6 @@ static void cpu_common_class_init(ObjectClass *klass, void *data)
 k->has_work = cpu_common_has_work;
 k->gdb_read_register = cpu_common_gdb_read_register;
 k->gdb_write_register = cpu_common_gdb_write_register;
-set_bit(DEVICE_CATEGORY_CPU, dc->categories);
-dc->realize = cpu_common_realizefn;
-dc->unrealize = cpu_common_unrealizefn;
 rc->phases.hold = cpu_common_reset_hold;
 cpu_class_init_props(dc);
 /*
@@ -319,11 +335,27 @@ static void cpu_common_class_init(ObjectClass *klass, void *data)
  * IRQs, adding reset handlers, halting non-first CPUs, ...
  */
 dc->user_creatable = false;
+/*
+ * CPU is the minimum granularity for hotplug in most cases, and
+ * often its hotplug handler is ultimately decided by the machine.
+ * For generality, set this flag to avoid blocking possible hotplug
+ * support.
+ */
+dc->hotpluggable = true;
+device_class_set_parent_realize(dc, cpu_common_realizefn,
+

[RFC v2 00/15] qom-topo: Abstract CPU Topology Level to Topology Device

2024-09-18 Thread Zhao Liu
tes CPUs in
possible_cpus[], and the CPU topology abstraction under the CPU device
will insert the created CPUs into the topology tree based on the topology
ID properties, so that a topology tree is completed.


5. Future TODOs
===

The current QOM topology RFC is only the very first step to introduce
the most basic QOM support, and it tries to be as compatible as possible
with existing SMP facilities.

The ultimate goal is to completely replace the current smp-related
topology structures with cpu-slot.

There are many TODOs:

* Add unit tests.
* Support QOM topology for all architectures.
* Get rid of MachineState.smp and MachineClass.smp_props with cpu-slot.
...


6. Patch Summary
===

Patch 01-02: Necessary change for qdev to support CPU topology via bus.
Patch 03-06: Introduce CPU topology device abstraction and CPU slot in
 machine.
Patch 07-09: Abstract all topology levels to topology devices (for x86).
Patch 10-11: Build topology tree for machine.
Patch 12-15: Enable topology tree for x86 machine.


7. Reference
===

[0]:  Daniel's suggestion about QOM topology:
      https://lore.kernel.org/qemu-devel/y+o9viv64mjxt...@redhat.com/
[1]:  [RFC 00/41] qom-topo: Abstract Everything about CPU Topology
      https://lore.kernel.org/qemu-devel/20231130144203.2307629-1-zhao1@linux.intel.com/
[2]:  [PATCH v2 0/7] Introduce SMP Cache Topology
      https://lore.kernel.org/qemu-devel/20240908125920.1160236-1-zhao1@intel.com/
[3]:  Heterogeneous computing:
      https://en.wikipedia.org/wiki/Heterogeneous_computing
[4]:  12th gen’s Intel hybrid technology:
      https://www.intel.com/content/www/us/en/support/articles/91896/processors.html
[5]:  Intel Meteor Lake (14th gen) architecture overview:
      https://www.intel.com/content/www/us/en/content-details/788851/meteor-lake-architecture-overview.html
[6]:  Need of ARM heterogeneous cache topology (by Yanan):
      https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg05139.html
[7]:  Cache topology implementation for i386:
      https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg08251.html
[8]:  Cluster for ARM to define shared L2 cache and L3 tag (by Yanan):
      https://lore.kernel.org/all/20211228092221.21068-1-wangyana...@huawei.com/
[9]:  S390x topology document (by Nina):
      https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg04842.html
[10]: [PATCH v1 0/4] prebuild cpu QOM tree /machine/node/socket/core
      ->link-cpu (by Fan):
      https://lore.kernel.org/all/cover.1395217538.git.chen.fan.f...@cn.fujitsu.com/
[11]: [PATCH RFC 0/4] target-i386: PC socket/core/thread modeling,
      part 1 (by Andreas):
      https://lore.kernel.org/all/1427131923-4670-1-git-send-email-afaer...@suse.de/
[12]: "Now we want to have similar QOM tree for introspection which
      helps express topology as well" (by Igor):
      https://lore.kernel.org/all/20150407170734.51faac90@igors-macbook-pro.local/
[13]: [for-2.7 PATCH v3 06/15] cpu: Abstract CPU core type (by Bharata)
      https://lore.kernel.org/all/1463024905-28401-7-git-send-email-bhar...@linux.vnet.ibm.com/
[14]: [PATCH v8 01/16] hw/cpu: introduce CPU clusters (by Luc):
      https://lore.kernel.org/all/20181207090135.7651-2-luc.mic...@greensocs.com/

Thanks and Best Regards,
Zhao

---
Zhao Liu (15):
  qdev: Add pointer to BusChild in DeviceState
  qdev: Add the interface to reparent the device
  hw/cpu: Introduce CPU topology device and CPU bus
  hw/cpu: Introduce CPU slot to manage CPU topology
  qdev: Add method in BusClass to customize device index
  hw/core: Create CPU slot in MachineState to manage CPU topology tree
  hw/core/cpu: Convert CPU from general device to topology device
  hw/cpu/core: Convert cpu-core from general device to topology device
  hw/cpu: Abstract module/die/socket levels as topology devices
  hw/machine: Build smp topology tree from -smp
  hw/core: Support topology tree in none machine for compatibility
  hw/i386: Allow i386 to create new CPUs in topology tree
  system/qdev-monitor: Introduce bus-finder interface for compatibility
    with bus-less plug behavior
  i386/cpu: Support CPU plugged in topology tree via bus-finder
  i386: Support topology device tree

 MAINTAINERS |  12 ++
 accel/kvm/kvm-all.c |   4 +-
 hw/core/cpu-common.c|  42 +++-
 hw/core/machine.c   |   7 +
 hw/core/null-machine.c  |   5 +
 hw/core/qdev.c  |  89 +++--
 hw/cpu/core.c   |   9 +-
 hw/cpu/cpu-slot.c   | 327 
 hw/cpu/cpu-topology.c   | 216 +
 hw/cpu/die.c|  34 
 hw/cpu/meson.build  |   6 +
 hw/cpu/module.c |  34 
 hw/cpu/socket.c |  34 
 hw/i386/microvm.c   |  13 +-
 hw/i386/x86-common.c| 226 +++---
 hw/i386/x86.c   

[RFC v2 06/15] hw/core: Create CPU slot in MachineState to manage CPU topology tree

2024-09-18 Thread Zhao Liu
With CPU slot support, the machine can manage the CPU topology tree. To
enable hot-plug support for topology devices, use the machine as the
hotplug handler for the CPU bus.

Additionally, since not all machines support the topology tree from the
start, add a "topo_tree_supported" flag to indicate whether a machine
supports the topology tree, and create the CPU slot as the topology root
only for machines that support it.
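
The level mask that machine_plug_cpu_slot() builds can be sketched in plain
C as follows. The enum, struct SmpProps and slot_supported_levels() are
hypothetical simplifications (a plain unsigned mask instead of QEMU's
DECLARE_BITMAP/set_bit(), and only the x86-relevant levels); folding the
topo_tree_supported gate into the same function is also a simplification,
since the patch checks it in qemu_create_machine().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of topology levels, ordered from lowest to highest. */
enum TopoLevel { LVL_THREAD, LVL_CORE, LVL_MODULE, LVL_DIE, LVL_SOCKET, LVL__MAX };

/* Hypothetical stand-in for the relevant SMPCompatProps flags. */
struct SmpProps {
    bool modules_supported;
    bool dies_supported;
    bool topo_tree_supported;
};

/*
 * Levels a CPU slot accepts: thread, core and socket always; module and die
 * only when the machine declares support for them. Returns 0 (no slot at
 * all) when the machine does not support the topology tree.
 */
static unsigned slot_supported_levels(const struct SmpProps *p)
{
    unsigned mask;

    if (!p->topo_tree_supported) {
        return 0;
    }
    mask = (1u << LVL_THREAD) | (1u << LVL_CORE) | (1u << LVL_SOCKET);
    if (p->modules_supported) {
        mask |= 1u << LVL_MODULE;
    }
    if (p->dies_supported) {
        mask |= 1u << LVL_DIE;
    }
    assert(mask < (1u << LVL__MAX));
    return mask;
}
```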

Signed-off-by: Zhao Liu 
---
 hw/core/machine.c |  2 ++
 hw/cpu/cpu-slot.c | 34 ++
 include/hw/boards.h   |  9 +
 include/hw/cpu/cpu-slot.h |  2 ++
 system/vl.c   |  4 
 5 files changed, 51 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 518beb9f883a..b6258d95b1e8 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1239,6 +1239,8 @@ static void machine_initfn(Object *obj)
 ms->smp_cache.props[i].topology = CPU_TOPOLOGY_LEVEL_DEFAULT;
 }
 
+ms->topo = NULL;
+
 machine_copy_boot_config(ms, &(BootConfiguration){ 0 });
 }
 
diff --git a/hw/cpu/cpu-slot.c b/hw/cpu/cpu-slot.c
index 66ef8d9faa97..4dbd5b7b7e00 100644
--- a/hw/cpu/cpu-slot.c
+++ b/hw/cpu/cpu-slot.c
@@ -138,3 +138,37 @@ static void cpu_slot_register_types(void)
 }
 
 type_init(cpu_slot_register_types)
+
+void machine_plug_cpu_slot(MachineState *ms)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+CPUSlot *slot;
+
+slot = CPU_SLOT(qdev_new(TYPE_CPU_SLOT));
+set_bit(CPU_TOPOLOGY_LEVEL_THREAD, slot->supported_levels);
+set_bit(CPU_TOPOLOGY_LEVEL_CORE, slot->supported_levels);
+set_bit(CPU_TOPOLOGY_LEVEL_SOCKET, slot->supported_levels);
+
+/*
+ * Now just consider the levels that x86 supports.
+ * TODO: Supports other levels.
+ */
+if (mc->smp_props.modules_supported) {
+set_bit(CPU_TOPOLOGY_LEVEL_MODULE, slot->supported_levels);
+}
+
+if (mc->smp_props.dies_supported) {
+set_bit(CPU_TOPOLOGY_LEVEL_DIE, slot->supported_levels);
+}
+
+ms->topo = slot;
+object_property_add_child(container_get(OBJECT(ms), "/peripheral"),
+  "cpu-slot", OBJECT(ms->topo));
+DEVICE(ms->topo)->id = g_strdup_printf("%s", "cpu-slot");
+
+sysbus_realize(SYS_BUS_DEVICE(slot), &error_abort);
+
+if (mc->get_hotplug_handler) {
+qbus_set_hotplug_handler(BUS(&slot->bus), OBJECT(ms));
+}
+}
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 2dd8decf640a..eeb4e7e2ce9f 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -10,6 +10,7 @@
 #include "qemu/module.h"
 #include "qom/object.h"
 #include "hw/core/cpu.h"
+#include "hw/cpu/cpu-slot.h"
 
 #define TYPE_MACHINE_SUFFIX "-machine"
 
@@ -152,6 +153,8 @@ typedef struct {
  * @modules_supported - whether modules are supported by the machine
  * @cache_supported - whether cache topologies (l1d, l1i, l2 and l3) are
  *supported by the machine
+ * @topo_tree_supported - whether QOM topology tree is supported by the
+ *machine
  */
 typedef struct {
 bool prefer_sockets;
@@ -162,6 +165,7 @@ typedef struct {
 bool drawers_supported;
 bool modules_supported;
 bool cache_supported[CACHE_LEVEL_AND_TYPE__MAX];
+bool topo_tree_supported;
 } SMPCompatProps;
 
 /**
@@ -431,6 +435,11 @@ struct MachineState {
 CPUArchIdList *possible_cpus;
 CpuTopology smp;
 SmpCache smp_cache;
+/*
+ * TODO: get rid of "smp" and merge it into "topo" when all arches
+ * support QOM topology.
+ */
+CPUSlot *topo;
 struct NVDIMMState *nvdimms_state;
 struct NumaState *numa_state;
 };
diff --git a/include/hw/cpu/cpu-slot.h b/include/hw/cpu/cpu-slot.h
index 9d02d5de578e..24e122013bf7 100644
--- a/include/hw/cpu/cpu-slot.h
+++ b/include/hw/cpu/cpu-slot.h
@@ -69,4 +69,6 @@ struct CPUSlot {
 DeviceListener listener;
 };
 
+void machine_plug_cpu_slot(MachineState *ms);
+
 #endif /* CPU_SLOT_H */
diff --git a/system/vl.c b/system/vl.c
index fe547ca47c27..193e7049ccbe 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2151,6 +2151,10 @@ static void qemu_create_machine(QDict *qdict)
   false, &error_abort);
 qobject_unref(default_opts);
 }
+
+if (machine_class->smp_props.topo_tree_supported) {
+machine_plug_cpu_slot(current_machine);
+}
 }
 
 static int global_init_func(void *opaque, QemuOpts *opts, Error **errp)
-- 
2.34.1




[RFC v2 10/15] hw/machine: Build smp topology tree from -smp

2024-09-18 Thread Zhao Liu
For architectures that support QOM topology (indicated by the
MachineClass.smp_props.topo_tree_supported field), build the smp QOM
topology tree from MachineState.smp.

The topology tree is created before MachineClass.init(), where the arch
initializes the CPUs or cores corresponding to
MachineState.possible_cpus[].

To avoid conflicts with CPU/core creation in the arch machine,
create_smp_topo_children() only creates topology levels that are higher
than the granularity of possible_cpus[]. The remaining topology parts
are completed by the arch machine during MachineClass.init().

There's a new field, arch_id_topo_level, to indicate the granularity of
possible_cpus[]. When this field is set, the CPU slot can create the
topology tree level by level. Without this field, all topology devices
are collected on the CPU bus of the CPU slot and are not organized into
a tree structure.
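
The level-by-level walk can be sketched as a counting exercise: starting at
the CPU slot, pick the highest level still to create below the current
parent (the role find_last_bit() plays in create_smp_topo_children()), give
every parent its smp count of children, and stop at the possible_cpus[]
granularity. The enum, last_bit_below() and count_created_devices() are
hypothetical; the real enum also contains cluster/book/drawer levels.

```c
#include <assert.h>

/* Hypothetical subset of CPU topology levels, lowest to highest. */
enum TopoLevel { LVL_THREAD, LVL_CORE, LVL_MODULE, LVL_DIE, LVL_SOCKET, LVL__MAX };

/* Like find_last_bit(mask, size): highest set bit below @size, else @size. */
static int last_bit_below(unsigned mask, int size)
{
    for (int i = size - 1; i >= 0; i--) {
        if (mask & (1u << i)) {
            return i;
        }
    }
    return size;
}

/*
 * Create only the supported levels strictly higher than @arch_id_level.
 * @per_parent[l] is the number of level-l children under each parent (as
 * in smp.sockets/dies/modules/cores). Fills @created[] with the number of
 * devices per level and returns the total.
 */
static int count_created_devices(unsigned supported, enum TopoLevel arch_id_level,
                                 const int per_parent[LVL__MAX],
                                 int created[LVL__MAX])
{
    unsigned create;
    int parent_level = LVL__MAX;    /* the CPU slot acts as the root */
    int instances = 1;
    int total = 0;

    assert(arch_id_level < LVL__MAX);
    /* Drop arch_id_level and every level below it. */
    create = supported & ~((1u << (arch_id_level + 1)) - 1);
    for (int l = 0; l < LVL__MAX; l++) {
        created[l] = 0;
    }
    for (;;) {
        int child = last_bit_below(create, parent_level);

        if (child == parent_level) {
            break;                  /* no lower level left to create */
        }
        instances *= per_parent[child];
        created[child] = instances;
        total += instances;
        parent_level = child;
    }
    return total;
}
```

For example, with -smp 180,sockets=2,dies=1,cores=45,threads=2 (modules
assumed to default to 1) and thread granularity as on x86, this yields 2
sockets, 2 dies, 2 modules and 90 cores, leaving the 180 threads to the
machine's init().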

Signed-off-by: Zhao Liu 
---
 hw/core/machine.c |   5 ++
 hw/cpu/cpu-slot.c | 153 ++
 include/hw/boards.h   |   2 +
 include/hw/cpu/cpu-slot.h |   5 ++
 include/qemu/bitops.h |   5 ++
 5 files changed, 170 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index b6258d95b1e8..076bd365197b 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1638,6 +1638,11 @@ void machine_run_board_init(MachineState *machine, const char *mem_path, Error *
"on", false);
 }
 
+if (machine_class->smp_props.topo_tree_supported &&
+!machine_create_topo_tree(machine, errp)) {
+return;
+}
+
 accel_init_interfaces(ACCEL_GET_CLASS(machine->accelerator));
 machine_class->init(machine);
 phase_advance(PHASE_MACHINE_INITIALIZED);
diff --git a/hw/cpu/cpu-slot.c b/hw/cpu/cpu-slot.c
index 4dbd5b7b7e00..1cc3b32ed675 100644
--- a/hw/cpu/cpu-slot.c
+++ b/hw/cpu/cpu-slot.c
@@ -12,8 +12,12 @@
 #include "qemu/osdep.h"
 
 #include "hw/boards.h"
+#include "hw/cpu/core.h"
 #include "hw/cpu/cpu-slot.h"
 #include "hw/cpu/cpu-topology.h"
+#include "hw/cpu/die.h"
+#include "hw/cpu/module.h"
+#include "hw/cpu/socket.h"
 #include "hw/qdev-core.h"
 #include "hw/qdev-properties.h"
 #include "hw/sysbus.h"
@@ -172,3 +176,152 @@ void machine_plug_cpu_slot(MachineState *ms)
 qbus_set_hotplug_handler(BUS(&slot->bus), OBJECT(ms));
 }
 }
+
+static int get_smp_info_by_level(const CpuTopology *smp_info,
+ CpuTopologyLevel child_level)
+{
+switch (child_level) {
+case CPU_TOPOLOGY_LEVEL_THREAD:
+return smp_info->threads;
+case CPU_TOPOLOGY_LEVEL_CORE:
+return smp_info->cores;
+case CPU_TOPOLOGY_LEVEL_MODULE:
+return smp_info->modules;
+case CPU_TOPOLOGY_LEVEL_DIE:
+return smp_info->dies;
+case CPU_TOPOLOGY_LEVEL_SOCKET:
+return smp_info->sockets;
+default:
+/* TODO: Add support for other levels. */
+g_assert_not_reached();
+}
+
+return 0;
+}
+
+static const char *get_topo_typename_by_level(CpuTopologyLevel level)
+{
+switch (level) {
+case CPU_TOPOLOGY_LEVEL_CORE:
+return TYPE_CPU_CORE;
+case CPU_TOPOLOGY_LEVEL_MODULE:
+return TYPE_CPU_MODULE;
+case CPU_TOPOLOGY_LEVEL_DIE:
+return TYPE_CPU_DIE;
+case CPU_TOPOLOGY_LEVEL_SOCKET:
+return TYPE_CPU_SOCKET;
+default:
+/* TODO: Add support for other levels. */
+g_assert_not_reached();
+}
+
+return NULL;
+}
+
+typedef struct SMPBuildCbData {
+DECLARE_BITMAP(create_levels, CPU_TOPOLOGY_LEVEL__MAX);
+const CpuTopology *smp_info;
+CPUTopoStat *stat;
+Error **errp;
+} SMPBuildCbData;
+
+static int create_smp_topo_children(DeviceState *dev, void *opaque)
+{
+Object *parent = OBJECT(dev);
+CpuTopologyLevel child_level;
+SMPBuildCbData *cb = opaque;
+CPUTopoState *topo = NULL;
+BusState *qbus;
+CPUBusState *cbus;
+Error **errp = cb->errp;
+int max_children;
+
+if (object_dynamic_cast(parent, TYPE_CPU_TOPO)) {
+topo = CPU_TOPO(parent);
+CpuTopologyLevel parent_level;
+
+parent_level = GET_CPU_TOPO_LEVEL(topo);
+child_level = find_last_bit(cb->create_levels, parent_level);
+
+if (child_level == parent_level) {
+return TOPO_FOREACH_CONTINUE;
+}
+
+cbus = topo->bus;
+} else if (object_dynamic_cast(parent, TYPE_CPU_SLOT)) {
+child_level = find_last_bit(cb->create_levels, CPU_TOPOLOGY_LEVEL__MAX);
+cbus = &CPU_SLOT(parent)->bus;
+} else {
+return TOPO_FOREACH_ERR;
+}
+
+qbus = BUS(cbus);
+max_children = get_smp_info_by_level(cb->smp_info, child_level);
+for (int i = 0; i < max_children; i++) {
+DeviceState *child;
+
+child = q

[RFC v2 05/15] qdev: Add method in BusClass to customize device index

2024-09-18 Thread Zhao Liu
Currently, when the bus assigns an index to a child device, it relies on
a monotonically increasing max_index.

However, when a device is removed from the bus, its index is not
reassigned to new devices, leading to "holes" in child indices.

For topology devices, such as CPUs/cores, arches define custom
sub-topology IDs. Some of these IDs are global (e.g., core-id for core
devices), while others are local (e.g., thread-id/core-id/module-id for
x86 CPUs).

Local IDs are indexes under the same parent device, which matches the
meaning of BusChild's index. Therefore, local IDs in a topology context
should use BusChild.index.

Considering that topology devices support hot-plug and local IDs often
have range constraints, add a new method (BusClass.assign_free_index) to
allow the bus to customize index assignment.

Based on this method, the CPU bus searches for free index "holes" left
by unplugged devices and assigns these free indices to newly inserted
devices.
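
The hole search in cpu_bus_assign_free_index() can be modeled without QEMU
types. In this hypothetical sketch the bus is reduced to an array of the
current children's indices plus max_index; the names assign_free_index()
and used[] are illustrative only. It relies on a pigeonhole argument: n
children cannot occupy all of 0..n, so scanning 0..n-1 and falling through
to n always lands on a free index.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * @used holds the indices of the @num_children current children (any
 * order); *@max_index is the next never-used index. Returns the lowest
 * free index left by an unplug, or max_index (bumped) when there is none.
 */
static int assign_free_index(const int used[], int num_children, int *max_index)
{
    int index;

    if (num_children == *max_index) {
        return (*max_index)++;          /* no holes: indices are 0..max-1 */
    }
    assert(num_children < *max_index);
    for (index = 0; index < num_children; index++) {
        bool existed = false;

        for (int k = 0; k < num_children; k++) {
            if (used[k] == index) {
                existed = true;
                break;
            }
        }
        if (!existed) {
            break;                      /* found a hole */
        }
    }
    /* If every index below num_children is taken, num_children is free. */
    return index;
}
```

As in the patch, reusing a hole leaves max_index unchanged, and the scan is
O(n^2); the "TODO: Introduce the list sorted by index" comment in the diff
points at the obvious improvement.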

Signed-off-by: Zhao Liu 
---
 hw/core/qdev.c|  8 +++-
 hw/cpu/cpu-topology.c | 37 +++
 include/hw/cpu/cpu-topology.h |  1 +
 include/hw/qdev-core.h|  2 ++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index ff073cbff56d..e3e9f0f303d6 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -78,11 +78,17 @@ static void bus_remove_child(BusState *bus, DeviceState *child)
 
 static void bus_add_child(BusState *bus, DeviceState *child)
 {
+BusClass *bc = BUS_GET_CLASS(bus);
 char name[32];
 BusChild *kid = g_malloc0(sizeof(*kid));
 
+if (bc->assign_free_index) {
+kid->index = bc->assign_free_index(bus);
+} else {
+kid->index = bus->max_index++;
+}
+
 bus->num_children++;
-kid->index = bus->max_index++;
 kid->child = child;
 child->bus_node = kid;
 object_ref(OBJECT(kid->child));
diff --git a/hw/cpu/cpu-topology.c b/hw/cpu/cpu-topology.c
index e68c06132e7d..3e8982ff7e6c 100644
--- a/hw/cpu/cpu-topology.c
+++ b/hw/cpu/cpu-topology.c
@@ -49,11 +49,40 @@ static bool cpu_bus_check_address(BusState *bus, DeviceState *dev,
 return cpu_parent_check_topology(bus->parent, dev, errp);
 }
 
+static int cpu_bus_assign_free_index(BusState *bus)
+{
+BusChild *kid;
+int index;
+
+if (bus->num_children == bus->max_index) {
+return bus->max_index++;
+}
+
+assert(bus->num_children < bus->max_index);
+/* TODO: Introduce the list sorted by index */
+for (index = 0; index < bus->num_children; index++) {
+bool existed = false;
+
+QTAILQ_FOREACH(kid, &bus->children, sibling) {
+if (kid->index == index) {
+existed = true;
+break;
+}
+}
+
+if (!existed) {
+break;
+}
+}
+return index;
+}
+
 static void cpu_bus_class_init(ObjectClass *oc, void *data)
 {
 BusClass *bc = BUS_CLASS(oc);
 
 bc->check_address = cpu_bus_check_address;
+bc->assign_free_index = cpu_bus_assign_free_index;
 }
 
 static const TypeInfo cpu_bus_type_info = {
@@ -177,3 +206,11 @@ int cpu_topo_get_instances_num(CPUTopoState *topo)
 
 return bus ? bus->num_children : 1;
 }
+
+int cpu_topo_get_index(CPUTopoState *topo)
+{
+BusChild *node = DEVICE(topo)->bus_node;
+
+assert(node);
+return node->index;
+}
diff --git a/include/hw/cpu/cpu-topology.h b/include/hw/cpu/cpu-topology.h
index 7a447ad16ee7..80aeff18baa3 100644
--- a/include/hw/cpu/cpu-topology.h
+++ b/include/hw/cpu/cpu-topology.h
@@ -64,5 +64,6 @@ struct CPUTopoState {
 #define GET_CPU_TOPO_LEVEL(topo)(CPU_TOPO_GET_CLASS(topo)->level)
 
 int cpu_topo_get_instances_num(CPUTopoState *topo);
+int cpu_topo_get_index(CPUTopoState *topo);
 
 #endif /* CPU_TOPO_H */
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 7cbc5fb97298..77223b28c788 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -342,6 +342,8 @@ struct BusClass {
  */
 bool (*check_address)(BusState *bus, DeviceState *dev, Error **errp);
 
+int (*assign_free_index)(BusState *bus);
+
 BusRealize realize;
 BusUnrealize unrealize;
 
-- 
2.34.1




[RFC v2 01/15] qdev: Add pointer to BusChild in DeviceState

2024-09-18 Thread Zhao Liu
The device topology structures based on buses are unidirectional: the
parent device can access the child device through the BusChild within
the bus, but not vice versa.

For the CPU topology tree built on the device-bus model, the child device
needs to be able to access the parent device via the parent bus. To
address this, introduce a pointer to the BusChild in DeviceState, named
bus_node.

This pointer also simplifies the logic of bus_remove_child(). Instead of
the parent bus needing to traverse the children list to locate the
corresponding BusChild, it can now directly find it using the bus_node
pointer.
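
The benefit of the back-pointer can be shown with a minimal doubly linked
list. The Bus, Node and Child types and bus_add()/bus_remove() are
hypothetical stand-ins for BusState, BusChild and DeviceState (no RCU, no
QOM properties): removal follows child->bus_node directly instead of
walking the children list.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node Node;

typedef struct Child {
    Node *bus_node;             /* back-pointer, NULL when not on a bus */
} Child;

struct Node {
    Child *child;
    Node *prev, *next;
};

typedef struct Bus {
    Node *head;
    int num_children;
} Bus;

/* Insert @node at the head of @bus and record the back-pointer. */
static void bus_add(Bus *bus, Node *node, Child *child)
{
    node->child = child;
    node->prev = NULL;
    node->next = bus->head;
    if (bus->head) {
        bus->head->prev = node;
    }
    bus->head = node;
    child->bus_node = node;
    bus->num_children++;
}

/* O(1) removal: the child already points at its list node. */
static void bus_remove(Bus *bus, Child *child)
{
    Node *node = child->bus_node;

    if (!node) {
        return;                 /* not on a bus: nothing to do */
    }
    if (node->prev) {
        node->prev->next = node->next;
    } else {
        bus->head = node->next;
    }
    if (node->next) {
        node->next->prev = node->prev;
    }
    child->bus_node = NULL;     /* mirrors "child->bus_node = NULL" above */
    assert(bus->num_children > 0);
    bus->num_children--;
}
```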

Signed-off-by: Zhao Liu 
---
 hw/core/qdev.c  | 29 ++---
 include/hw/qdev-core.h  |  4 
 include/qemu/typedefs.h |  1 +
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index db36f54d914a..4429856eaddd 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -57,25 +57,23 @@ static void bus_free_bus_child(BusChild *kid)
 
 static void bus_remove_child(BusState *bus, DeviceState *child)
 {
-BusChild *kid;
-
-QTAILQ_FOREACH(kid, &bus->children, sibling) {
-if (kid->child == child) {
-char name[32];
+BusChild *kid = child->bus_node;
+char name[32];
 
-snprintf(name, sizeof(name), "child[%d]", kid->index);
-QTAILQ_REMOVE_RCU(&bus->children, kid, sibling);
+if (!kid) {
+return;
+}
 
-bus->num_children--;
+snprintf(name, sizeof(name), "child[%d]", kid->index);
+QTAILQ_REMOVE_RCU(&bus->children, kid, sibling);
+child->bus_node = NULL;
+bus->num_children--;
 
-/* This gives back ownership of kid->child back to us.  */
-object_property_del(OBJECT(bus), name);
+/* This gives back ownership of kid->child back to us.  */
+object_property_del(OBJECT(bus), name);
 
-/* free the bus kid, when it is safe to do so*/
-call_rcu(kid, bus_free_bus_child, rcu);
-break;
-}
-}
+/* free the bus kid, when it is safe to do so*/
+call_rcu(kid, bus_free_bus_child, rcu);
 }
 
 static void bus_add_child(BusState *bus, DeviceState *child)
@@ -86,6 +84,7 @@ static void bus_add_child(BusState *bus, DeviceState *child)
 bus->num_children++;
 kid->index = bus->max_index++;
 kid->child = child;
+child->bus_node = kid;
 object_ref(OBJECT(kid->child));
 
 QTAILQ_INSERT_HEAD_RCU(&bus->children, kid, sibling);
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index aa97c34a4be7..85c7d462dfba 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -253,6 +253,10 @@ struct DeviceState {
  * @parent_bus: bus this device belongs to
  */
 BusState *parent_bus;
+/**
+ * @bus_node: bus node inserted in parent bus
+ */
+BusChild *bus_node;
 /**
  * @gpios: QLIST of named GPIOs the device provides.
  */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 9d222dc37628..aef41c4e67ce 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -32,6 +32,7 @@ typedef struct BdrvDirtyBitmapIter BdrvDirtyBitmapIter;
 typedef struct BlockBackend BlockBackend;
 typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
+typedef struct BusChild BusChild;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
 typedef struct Chardev Chardev;
-- 
2.34.1




Re: [PATCH RFC V3 01/29] arm/virt,target/arm: Add new ARMCPU {socket,cluster,core,thread}-id property

2024-09-09 Thread Zhao Liu
On Wed, Sep 04, 2024 at 05:37:21PM +, Salil Mehta wrote:
> Date: Wed, 4 Sep 2024 17:37:21 +
> From: Salil Mehta 
> Subject: RE: [PATCH RFC V3 01/29] arm/virt,target/arm: Add new ARMCPU
>  {socket,cluster,core,thread}-id property
> 
> Hi Zhao,
> 
> >  From: zhao1@intel.com 
> >  Sent: Wednesday, September 4, 2024 3:43 PM
> >  To: Salil Mehta 
> >  
> >  Hi Salil,
> >  
> >  On Mon, Aug 19, 2024 at 11:53:52AM +, Salil Mehta wrote:
> >  > Date: Mon, 19 Aug 2024 11:53:52 +
> >  > From: Salil Mehta 
> >  > Subject: RE: [PATCH RFC V3 01/29] arm/virt,target/arm: Add new ARMCPU
> >  > {socket,cluster,core,thread}-id property
> >  
> >  [snip]
> >  
> >  > >  > NULL); @@ -2708,6 +2716,7 @@ static const CPUArchIdList
> >  > > *virt_possible_cpu_arch_ids(MachineState *ms)
> >  > >  >   {
> >  > >  >   int n;
> >  > >  >   unsigned int max_cpus = ms->smp.max_cpus;
> >  > >  > +unsigned int smp_threads = ms->smp.threads;
> >  > >  >   VirtMachineState *vms = VIRT_MACHINE(ms);
> >  > >  >   MachineClass *mc = MACHINE_GET_CLASS(vms);
> >  > >  >
> >  > >  > @@ -2721,6 +2730,7 @@ static const CPUArchIdList
> >  > > *virt_possible_cpu_arch_ids(MachineState *ms)
> >  > >  >   ms->possible_cpus->len = max_cpus;
> >  > >  >   for (n = 0; n < ms->possible_cpus->len; n++) {
> >  > >  >   ms->possible_cpus->cpus[n].type = ms->cpu_type;
> >  > >  > +ms->possible_cpus->cpus[n].vcpus_count = smp_threads;
> >  > >  >   ms->possible_cpus->cpus[n].arch_id =
> >  > >  >   virt_cpu_mp_affinity(vms, n);
> >  > >  >
> >  > >
> >  > >  Why @vcpus_count is initialized to @smp_threads? it needs to be
> >  > > documented in the commit log.
> >  >
> >  >
> >  > Because every thread internally amounts to a vCPU in QOM and which is
> >  > in 1:1 relationship with KVM vCPU. AFAIK, QOM does not strictly
> >  > follows any architecture. Once you start to get into details of
> >  > threads there are many aspects of shared resources one will have to
> >  > consider and these can vary across different implementations of
> >  architecture.
> >  
> >  For SPAPR CPU, the granularity of >possible_cpus->cpus[] is "core", and for
> >  x86, it's "thread" granularity.
> 
> 
> We have threads per-core at microarchitecture level in ARM as well. But each
> thread appears like a vCPU to OS and AFAICS there are no special attributes
> attached to it. SMT can be enabled/disabled at firmware and should get
> reflected in the configuration accordingly i.e. value of *threads-per-core* 
> changes between 1 and 'N'.  This means 'vcpus_count' has to reflect the
> correct configuration. But I think threads lack proper representation
> in Qemu QOM.

In the topology-related code, SMT (of x86) usually refers to the logical
processor level, and "thread" has the same meaning. Changing these
meanings is possible, but I think it should be driven by an actual use
case; we can weigh the implementation complexity when there is a need.

> In Qemu, each vCPU reflects an execution context (which gets uniquely mapped
> to KVM vCPU). AFAICS, we only have *CPUState* (Struct ArchCPU) as a placeholder
> for this execution context and there is no *ThreadState* (derived out of
> Struct CPUState). Hence, we've  to map all the threads as QOM vCPUs. This means
> the array of present or possible CPUs represented by 'struct CPUArchIdList' contains
> all execution contexts which actually might be vCPU or a thread. Hence, usage of
> *vcpus_count* seems quite superficial to me frankly.
>
> Also, AFAICS, KVM does not have the concept of the threads and only has
> KVM vCPUs, but you are still allowed to specify the topology with sockets, dies,
> clusters, cores, threads in most architectures.
 
Topology still has uses: for example, it affects scheduling behavior and
feature emulation.
  
> >  And smp.threads means how many threads in one core, so for x86, the
> >  vcpus_count of a "thread" is 1, and for spapr, the vcpus_count of a "core"
> >  equals to smp.threads.
> 
> 
> Sure, but does the KVM specifies this? 

As you said, KVM (for x86) doesn't consider higher-level topologies at
the moment, but that doesn't mean it won't in the future, since certain
registers do have topology dependencies.

> and how does these threads map to the QOM vCPU objects or execution context?

Each CPU object creates a (software) thread; see the function
"kvm_start_vcpu_thread(CPUState *cpu)", which is called when the CPU
object is realized.

> AFAICS there is nothing but 'CPUState'
> which will be made part of the  possible vCPU list 'struct CPUArchIdList'.
 
As I said, one example is spapr ("spapr_possible_cpu_arch_ids()"), which
maps each possible_cpus entry to a core object. However, this is a very
specific example, and as Igor's slides said, I understand it's an
architectural requirement.
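
The granularity rule discussed in this thread (an x86 possible_cpus[]
entry is a thread, a spapr entry is a core) reduces to a tiny helper. This
is a hypothetical sketch for illustration; the enum and vcpus_per_entry()
are not QEMU code.

```c
#include <assert.h>

/* Hypothetical granularity of one possible_cpus[] entry. */
enum ArchIdGranularity { GRAN_THREAD, GRAN_CORE };

/*
 * vcpus_count for one entry: a thread-granular entry (x86) holds one vCPU,
 * a core-granular entry (spapr) holds smp.threads vCPUs.
 */
static unsigned vcpus_per_entry(enum ArchIdGranularity gran, unsigned smp_threads)
{
    assert(smp_threads > 0);
    return gran == GRAN_CORE ? smp_threads : 1;
}
```

Under this rule, a thread-granular ARM possible_cpus[] would use 1 rather
than smp.threads.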

> >  
> >  IIUC, your granularity is still "thread", so that this field should be 1.
> 

[PATCH v2 4/7] hw/core: Check smp cache topology support for machine

2024-09-08 Thread Zhao Liu
Add cache_supported flags in SMPCompatProps to allow machines to
declare which caches support topology configuration.

Also check the compatibility of the cache properties with the machine's
support in machine_parse_smp_cache().
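
The ordering rule enforced later in this patch (an inner cache must not be
shared at a higher topology level than the cache outside it, with
"default" never failing the check) can be sketched standalone. The enums
and cache_levels_valid() are hypothetical; DEFAULT is placed last so that
"lv[c1] > lv[c2]" is never true when c2 is DEFAULT, matching how
smp_cache_topo_cmp() only skips DEFAULT on cache1. The L1-vs-L2 checks
mirror the diff; the L2-vs-L3 check is assumed from the truncated final
hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of topology levels (lowest to highest), DEFAULT last. */
enum TopoLevel { LVL_THREAD, LVL_CORE, LVL_MODULE, LVL_DIE, LVL_SOCKET, LVL_DEFAULT };
enum Cache { L1D, L1I, L2, L3, CACHE__MAX };

/* True when c1 has an explicit level that is higher than c2's level. */
static bool topo_higher(const enum TopoLevel lv[], enum Cache c1, enum Cache c2)
{
    return lv[c1] != LVL_DEFAULT && lv[c1] > lv[c2];
}

/* Inner caches must not be shared more widely than outer caches. */
static bool cache_levels_valid(const enum TopoLevel lv[CACHE__MAX])
{
    assert(lv != NULL);
    if (topo_higher(lv, L1D, L2) || topo_higher(lv, L1I, L2)) {
        return false;           /* L2 level lower than an L1 level */
    }
    if (topo_higher(lv, L2, L3)) {
        return false;           /* L3 level lower than the L2 level */
    }
    return true;
}
```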

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since Patch v1:
 * Dropped machine_check_smp_cache_support() and did the check when
   -machine parses smp-cache in machine_parse_smp_cache().

Changes since RFC v2:
 * Split as a separate commit to just include compatibility checking and
   topology checking.
 * Allow setting "default" topology level even though the cache
   isn't supported by machine. (Daniel)
---
 hw/core/machine-smp.c | 78 +++
 include/hw/boards.h   |  3 ++
 2 files changed, 81 insertions(+)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index b517c3471d1a..9a281946762f 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -261,10 +261,47 @@ void machine_parse_smp_config(MachineState *ms,
 }
 }
 
+static bool machine_check_topo_support(MachineState *ms,
+   CpuTopologyLevel topo,
+   Error **errp)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+if ((topo == CPU_TOPOLOGY_LEVEL_MODULE && !mc->smp_props.modules_supported) ||
+(topo == CPU_TOPOLOGY_LEVEL_CLUSTER && !mc->smp_props.clusters_supported) ||
+(topo == CPU_TOPOLOGY_LEVEL_DIE && !mc->smp_props.dies_supported) ||
+(topo == CPU_TOPOLOGY_LEVEL_BOOK && !mc->smp_props.books_supported) ||
+(topo == CPU_TOPOLOGY_LEVEL_DRAWER && !mc->smp_props.drawers_supported)) {
+error_setg(errp,
+   "Invalid topology level: %s. "
+   "The topology level is not supported by this machine",
+   CpuTopologyLevel_str(topo));
+return false;
+}
+
+return true;
+}
+
+/*
+ * When both cache1 and cache2 are configured with specific topology levels
+ * (not default level), is cache1's topology level higher than cache2?
+ */
+static bool smp_cache_topo_cmp(const SmpCache *smp_cache,
+   CacheLevelAndType cache1,
+   CacheLevelAndType cache2)
+{
+if (smp_cache->props[cache1].topology != CPU_TOPOLOGY_LEVEL_DEFAULT &&
+smp_cache->props[cache1].topology > smp_cache->props[cache2].topology) {
+return true;
+}
+return false;
+}
+
 bool machine_parse_smp_cache(MachineState *ms,
  const SmpCachePropertiesList *caches,
  Error **errp)
 {
+MachineClass *mc = MACHINE_GET_CLASS(ms);
 const SmpCachePropertiesList *node;
 DECLARE_BITMAP(caches_bitmap, CACHE_LEVEL_AND_TYPE__MAX);
 
@@ -293,6 +330,47 @@ bool machine_parse_smp_cache(MachineState *ms,
 }
 }
 
+for (int i = 0; i < CACHE_LEVEL_AND_TYPE__MAX; i++) {
+const SmpCacheProperties *props = &ms->smp_cache.props[i];
+
+/*
+ * Allow setting "default" topology level even though the cache
+ * isn't supported by machine.
+ */
+if (props->topology != CPU_TOPOLOGY_LEVEL_DEFAULT &&
+!mc->smp_props.cache_supported[props->cache]) {
+error_setg(errp,
+   "%s cache topology not supported by this machine",
+   CacheLevelAndType_str(props->cache));
+return false;
+}
+
+if (!machine_check_topo_support(ms, props->topology, errp)) {
+return false;
+}
+}
+
+if (smp_cache_topo_cmp(&ms->smp_cache,
+   CACHE_LEVEL_AND_TYPE_L1D,
+   CACHE_LEVEL_AND_TYPE_L2) ||
+smp_cache_topo_cmp(&ms->smp_cache,
+   CACHE_LEVEL_AND_TYPE_L1I,
+   CACHE_LEVEL_AND_TYPE_L2)) {
+error_setg(errp,
+   "Invalid smp cache topology. "
+   "L2 cache topology level shouldn't be lower than L1 cache");
+return false;
+}
+
+if (smp_cache_topo_cmp(&ms->smp_cache,
+   CACHE_LEVEL_AND_TYPE_L2,
+   CACHE_LEVEL_AND_TYPE_L3)) {
+error_setg(errp,
+   "Invalid smp cache topology. "
+   "L3 cache topology level shouldn't be lower than L2 cache");
+return false;
+}
+
 return true;
 }
 
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 64439dc7da2c..6c3cdfa15f50 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -150,6 +150,8 @@ typedef struct {
  * @books_supported - whether books are supported by the machine
  * @drawers_supported - whether drawers are su

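The ordering rule added to machine_parse_smp_cache() can be modelled in isolation. The sketch below is a simplified, standalone illustration, not QEMU code: `TopoLevel` mirrors the order of the CpuTopologyLevel values in qapi/machine-common.json, and `topo_above()` / `cache_order_valid()` are hypothetical names that apply the same comparison as smp_cache_topo_cmp(). Because `default` is the last enum value, a cache left at `default` never triggers the error on either side of the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified mirror of the QAPI CpuTopologyLevel order (lowest to highest). */
typedef enum {
    TOPO_INVALID,
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_CLUSTER,
    TOPO_DIE,
    TOPO_SOCKET,
    TOPO_BOOK,
    TOPO_DRAWER,
    TOPO_DEFAULT,
} TopoLevel;

typedef enum { CACHE_L1D, CACHE_L1I, CACHE_L2, CACHE_L3, CACHE__MAX } CacheId;

/* Same test as smp_cache_topo_cmp(): is 'inner' configured above 'outer'? */
static bool topo_above(const TopoLevel topo[CACHE__MAX],
                       CacheId inner, CacheId outer)
{
    return topo[inner] != TOPO_DEFAULT && topo[inner] > topo[outer];
}

/* True when L1D/L1I <= L2 <= L3, ignoring caches left at "default". */
static bool cache_order_valid(const TopoLevel topo[CACHE__MAX])
{
    if (topo_above(topo, CACHE_L1D, CACHE_L2) ||
        topo_above(topo, CACHE_L1I, CACHE_L2)) {
        return false;   /* L2 would be shared at a lower level than L1 */
    }
    return !topo_above(topo, CACHE_L2, CACHE_L3);
}
```

As in the patch, only adjacent pairs (L1D/L1I vs L2, L2 vs L3) are compared, so leaving L2 at `default` means L1 and L3 are not checked against each other.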
[PATCH v2 7/7] i386/pc: Support cache topology in -machine for PC machine

2024-09-08 Thread Zhao Liu
Allow users to configure the l1d, l1i, l2 and l3 cache topologies for
the PC machine.

Additionally, document "-machine smp-cache" in qemu-options.hx.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since Patch v1:
 * Merged document into this patch. (Markus)

Changes since RFC v2:
 * Used cache_supported array.
---
 hw/i386/pc.c|  4 
 qemu-options.hx | 28 +++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ba0ff511836c..d562fd25aad2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1788,6 +1788,10 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 mc->nvdimm_supported = true;
 mc->smp_props.dies_supported = true;
 mc->smp_props.modules_supported = true;
+mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1D] = true;
+mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L1I] = true;
+mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L2] = true;
+mc->smp_props.cache_supported[CACHE_LEVEL_AND_TYPE_L3] = true;
 mc->default_ram_id = "pc.ram";
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_AUTO;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index d94e2cbbaeb1..3936ff3e77f9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -39,7 +39,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "memory-encryption=@var{} memory encryption object to use (default=none)\n"
 "hmat=on|off controls ACPI HMAT support (default=off)\n"
 "memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
-" cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
+" cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
+" smp-cache.0.cache=cachename,smp-cache.0.topology=topologylevel\n",
 QEMU_ARCH_ALL)
 SRST
 ``-machine [type=]name[,prop=value[,...]]``
@@ -159,6 +160,31 @@ SRST
 ::
 
 -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
+
+``smp-cache.0.cache=cachename,smp-cache.0.topology=topologylevel``
+Define cache properties (currently only the cache topology level) for
+the SMP system.
+
+``cache=cachename`` specifies the cache that the properties will be
+applied to. This field is the combination of cache level and cache
+type. Currently it supports ``l1d`` (L1 data cache), ``l1i`` (L1
+instruction cache), ``l2`` (L2 unified cache) and ``l3`` (L3 unified
+cache).
+
+``topology=topologylevel`` sets the cache topology level. It accepts
+CPU topology levels including ``thread``, ``core``, ``module``,
+``cluster``, ``die``, ``socket``, ``book``, ``drawer`` and a special
+value ``default``. If ``default`` is set, then the cache topology will
+follow the architecture's default cache topology model. If another CPU
+topology level is set, the cache will be shared at the corresponding CPU
+topology level. For example, ``topology=core`` makes the cache shared
+within a core.
+
+Example:
+
+::
+
+-machine smp-cache.0.cache=l1d,smp-cache.0.topology=core,smp-cache.1.cache=l1i,smp-cache.1.topology=core
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
-- 
2.34.1
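To make the `topology=` semantics concrete, here is a small sketch that counts how many logical CPUs would share one cache instance at a given level, assuming the usual nesting thread < core < module < cluster < die < socket < book < drawer. `SmpConfig`, `ShareLevel` and `cpus_sharing_cache()` are hypothetical names; `default` is left out because its meaning depends on the architecture's own model. For example, with `-smp 180,sockets=2,dies=1,cores=45,threads=2`, an L2 set to `core` is shared by 2 CPUs and an L3 set to `socket` by 90.

```c
#include <assert.h>

/* Hypothetical subset of the -smp configuration (all counts >= 1). */
typedef struct {
    unsigned threads, cores, modules, clusters, dies, sockets, books, drawers;
} SmpConfig;

typedef enum {
    LVL_THREAD, LVL_CORE, LVL_MODULE, LVL_CLUSTER,
    LVL_DIE, LVL_SOCKET, LVL_BOOK, LVL_DRAWER,
} ShareLevel;

/* Number of logical CPUs sharing one cache instance at 'level'. */
static unsigned cpus_sharing_cache(const SmpConfig *smp, ShareLevel level)
{
    /* Factor contributed by each level, from the innermost outwards. */
    const unsigned mult[] = {
        1,              /* thread: private to one logical CPU */
        smp->threads,   /* core */
        smp->cores,     /* module */
        smp->modules,   /* cluster */
        smp->clusters,  /* die */
        smp->dies,      /* socket */
        smp->sockets,   /* book */
        smp->books,     /* drawer */
    };
    unsigned n = 1;

    for (int i = 0; i <= (int)level; i++) {
        n *= mult[i];
    }
    return n;
}
```

This counts actual CPUs; on x86 the value reported through CPUID is derived from the APIC ID layout instead, which rounds each level up to a power of two.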




[PATCH v2 3/7] hw/core: Add smp cache topology for machine

2024-09-08 Thread Zhao Liu
Add SMP cache topology support for the machine by storing the cache
properties in MachineState and exposing them via -machine smp-cache.

Also add a helper to access the cache topology level.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since Patch v1:
 * Integrated cache properties list into MachineState and used -machine
   to configure SMP cache properties. (Markus)

Changes since RFC v2:
 * Linked machine's smp_cache to smp-cache object instead of a builtin
   structure. This is to get around the fact that the keyval format of
   -machine can't support JSON.
 * Wrapped the cache topology level access into a helper.
---
 hw/core/machine-smp.c | 41 
 hw/core/machine.c | 44 +++
 include/hw/boards.h   | 10 ++
 3 files changed, 95 insertions(+)

diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
index 5d8d7edcbd3f..b517c3471d1a 100644
--- a/hw/core/machine-smp.c
+++ b/hw/core/machine-smp.c
@@ -261,6 +261,41 @@ void machine_parse_smp_config(MachineState *ms,
 }
 }
 
+bool machine_parse_smp_cache(MachineState *ms,
+ const SmpCachePropertiesList *caches,
+ Error **errp)
+{
+const SmpCachePropertiesList *node;
+DECLARE_BITMAP(caches_bitmap, CACHE_LEVEL_AND_TYPE__MAX);
+
+for (node = caches; node; node = node->next) {
+/* Prohibit users from setting the cache topology level to invalid. */
+if (node->value->topology == CPU_TOPOLOGY_LEVEL_INVALID) {
+error_setg(errp,
+   "Invalid cache topology level: %s. "
+   "The topology should match the "
+   "valid CPU topology level",
+   CpuTopologyLevel_str(node->value->topology));
+return false;
+}
+
+/* Prohibit users from repeating settings. */
+if (test_bit(node->value->cache, caches_bitmap)) {
+error_setg(errp,
+   "Invalid cache properties: %s. "
+   "The cache properties are duplicated",
+   CacheLevelAndType_str(node->value->cache));
+return false;
+} else {
+ms->smp_cache.props[node->value->cache].topology =
+node->value->topology;
+set_bit(node->value->cache, caches_bitmap);
+}
+}
+
+return true;
+}
+
 unsigned int machine_topo_get_cores_per_socket(const MachineState *ms)
 {
 return ms->smp.cores * ms->smp.modules * ms->smp.clusters * ms->smp.dies;
@@ -270,3 +305,9 @@ unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
 {
 return ms->smp.threads * machine_topo_get_cores_per_socket(ms);
 }
+
+CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
+  CacheLevelAndType cache)
+{
+return ms->smp_cache.props[cache].topology;
+}
diff --git a/hw/core/machine.c b/hw/core/machine.c
index adaba17ebac1..518beb9f883a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -932,6 +932,40 @@ static void machine_set_smp(Object *obj, Visitor *v, const char *name,
 machine_parse_smp_config(ms, config, errp);
 }
 
+static void machine_get_smp_cache(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+SmpCache *cache = &ms->smp_cache;
+SmpCachePropertiesList *head = NULL;
+SmpCachePropertiesList **tail = &head;
+
+for (int i = 0; i < CACHE_LEVEL_AND_TYPE__MAX; i++) {
+SmpCacheProperties *node = g_new(SmpCacheProperties, 1);
+
+node->cache = cache->props[i].cache;
+node->topology = cache->props[i].topology;
+QAPI_LIST_APPEND(tail, node);
+}
+
+visit_type_SmpCachePropertiesList(v, name, &head, errp);
+qapi_free_SmpCachePropertiesList(head);
+}
+
+static void machine_set_smp_cache(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+SmpCachePropertiesList *caches;
+
+if (!visit_type_SmpCachePropertiesList(v, name, &caches, errp)) {
+return;
+}
+
+machine_parse_smp_cache(ms, caches, errp);
+qapi_free_SmpCachePropertiesList(caches);
+}
+
 static void machine_get_boot(Object *obj, Visitor *v, const char *name,
 void *opaque, Error **errp)
 {
@@ -1057,6 +1091,11 @@ static void machine_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "smp",
 "CPU topology");
 
+object_class_property_add(oc, "smp-cache", "SmpCachePropertiesWrapper",
+machine_get_smp_cache, machine_set_smp_cache, NULL, NULL);
+o

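The duplicate and invalid-level checks in machine_parse_smp_cache() follow a simple pattern: walk the list, reject the first bad entry, and remember which caches were already set. A minimal standalone sketch, assuming a plain bitmask in place of QEMU's DECLARE_BITMAP()/test_bit()/set_bit() helpers (all type and function names here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { C_L1D, C_L1I, C_L2, C_L3, C__MAX } CacheKind;
typedef enum { T_INVALID, T_THREAD, T_CORE, T_MODULE, T_DIE, T_SOCKET, T_DEFAULT } Topo;

typedef struct {
    CacheKind cache;
    Topo topology;
} CacheSetting;

/*
 * Hypothetical analogue of machine_parse_smp_cache(): copy each setting
 * into 'out', rejecting the "invalid" level and repeated caches.
 * Returns false and prints a message on the first error.
 */
static bool parse_cache_settings(const CacheSetting *list, size_t n,
                                 Topo out[C__MAX])
{
    unsigned seen = 0;  /* one bit per CacheKind, explicitly zeroed */

    for (size_t i = 0; i < n; i++) {
        unsigned bit = 1u << list[i].cache;

        if (list[i].topology == T_INVALID) {
            fprintf(stderr, "invalid cache topology level\n");
            return false;
        }
        if (seen & bit) {
            fprintf(stderr, "cache %d configured twice\n", (int)list[i].cache);
            return false;
        }
        out[list[i].cache] = list[i].topology;
        seen |= bit;
    }
    return true;
}
```

The mask starts at zero on purpose: a bitmap declared on the stack with DECLARE_BITMAP() is not initialized automatically, so in QEMU code it would need something like bitmap_zero() before the first test_bit().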
[PATCH v2 6/7] i386/cpu: Update cache topology with machine's configuration

2024-09-08 Thread Zhao Liu
Users can configure the SMP cache topology via -machine smp-cache.

In this case, update the x86 CPUs' cache topology with the user's
configuration in MachineState.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since RFC v2:
 * Used smp_cache array to override cache topology.
 * Wrapped the updating into a function.
---
 target/i386/cpu.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e9f755000356..6d9f7dc0872a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7597,6 +7597,38 @@ static void x86_cpu_hyperv_realize(X86CPU *cpu)
 cpu->hyperv_limits[2] = 0;
 }
 
+#ifndef CONFIG_USER_ONLY
+static void x86_cpu_update_smp_cache_topo(MachineState *ms, X86CPU *cpu)
+{
+CPUX86State *env = &cpu->env;
+CpuTopologyLevel level;
+
+level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1D);
+if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+env->cache_info_cpuid4.l1d_cache->share_level = level;
+env->cache_info_amd.l1d_cache->share_level = level;
+}
+
+level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L1I);
+if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+env->cache_info_cpuid4.l1i_cache->share_level = level;
+env->cache_info_amd.l1i_cache->share_level = level;
+}
+
+level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L2);
+if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+env->cache_info_cpuid4.l2_cache->share_level = level;
+env->cache_info_amd.l2_cache->share_level = level;
+}
+
+level = machine_get_cache_topo_level(ms, CACHE_LEVEL_AND_TYPE_L3);
+if (level != CPU_TOPOLOGY_LEVEL_DEFAULT) {
+env->cache_info_cpuid4.l3_cache->share_level = level;
+env->cache_info_amd.l3_cache->share_level = level;
+}
+}
+#endif
+
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
@@ -7821,6 +7853,13 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 #ifndef CONFIG_USER_ONLY
 MachineState *ms = MACHINE(qdev_get_machine());
+
+/*
+ * TODO: Add a SMPCompatProps.has_caches flag to avoid unnecessary updates
+ * if the user didn't set smp_cache.
+ */
+x86_cpu_update_smp_cache_topo(ms, cpu);
+
 qemu_register_reset(x86_cpu_machine_reset_cb, cpu);
 
 if (cpu->env.features[FEAT_1_EDX] & CPUID_APIC || ms->smp.cpus > 1) {
-- 
2.34.1
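The four if-blocks in x86_cpu_update_smp_cache_topo() share one rule: a cache left at `default` keeps whatever share level the CPU model chose, and any other level overrides both the CPUID leaf 4 and the AMD descriptor. A table-driven sketch of that rule, with hypothetical types standing in for CPUX86State's cache_info_cpuid4 and cache_info_amd:

```c
#include <assert.h>

typedef enum { LV_THREAD, LV_CORE, LV_MODULE, LV_DIE, LV_SOCKET, LV_DEFAULT } Level;
typedef enum { K_L1D, K_L1I, K_L2, K_L3, K__MAX } Kind;

/* Hypothetical stand-in for one cache descriptor. */
typedef struct {
    Level share_level;
} CacheInfo;

/*
 * Apply user-configured levels on top of the CPU model defaults: a cache
 * left at LV_DEFAULT keeps the share level the CPU model chose.
 */
static void apply_cache_topo(const Level user[K__MAX],
                             CacheInfo intel[K__MAX], CacheInfo amd[K__MAX])
{
    for (int k = 0; k < K__MAX; k++) {
        if (user[k] != LV_DEFAULT) {
            intel[k].share_level = user[k];
            amd[k].share_level = user[k];
        }
    }
}
```

Looping over an array indexed by cache kind is only an alternative layout; the patch's explicit per-cache blocks behave the same way.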




[PATCH v2 5/7] i386/cpu: Support thread and module level cache topology

2024-09-08 Thread Zhao Liu
Allow caches to be defined at the thread and module levels. This
increases flexibility for x86 users to customize their cache topology.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
 target/i386/cpu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e3a81bc64922..e9f755000356 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -241,9 +241,15 @@ static uint32_t max_thread_ids_for_cache(X86CPUTopoInfo *topo_info,
 uint32_t num_ids = 0;
 
 switch (share_level) {
+case CPU_TOPOLOGY_LEVEL_THREAD:
+num_ids = 1;
+break;
 case CPU_TOPOLOGY_LEVEL_CORE:
 num_ids = 1 << apicid_core_offset(topo_info);
 break;
+case CPU_TOPOLOGY_LEVEL_MODULE:
+num_ids = 1 << apicid_module_offset(topo_info);
+break;
 case CPU_TOPOLOGY_LEVEL_DIE:
 num_ids = 1 << apicid_die_offset(topo_info);
 break;
@@ -251,10 +257,6 @@ static uint32_t max_thread_ids_for_cache(X86CPUTopoInfo *topo_info,
 num_ids = 1 << apicid_pkg_offset(topo_info);
 break;
 default:
-/*
- * Currently there is no use case for THREAD and MODULE, so use
- * assert directly to facilitate debugging.
- */
 g_assert_not_reached();
 }
 
-- 
2.34.1
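With thread and module levels added, max_thread_ids_for_cache() covers every level from thread to package. The sketch below reproduces the arithmetic under the assumption that, as in include/hw/i386/topology.h, each APIC ID field is just wide enough to encode its count (so each level is rounded up to a power of two) and the fields are stacked thread, core, module, die, package. `TopoInfo`, `Share` and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of X86CPUTopoInfo. */
typedef struct {
    unsigned dies_per_pkg, modules_per_die, cores_per_module, threads_per_core;
} TopoInfo;

typedef enum { SHARE_THREAD, SHARE_CORE, SHARE_MODULE, SHARE_DIE, SHARE_PACKAGE } Share;

/* Bits needed to encode 'count' IDs, i.e. ceil(log2(count)). */
static unsigned bitwidth_for_count(unsigned count)
{
    unsigned width = 0;

    while ((1u << width) < count) {
        width++;
    }
    return width;
}

/* Offset of a level's APIC ID field: thread bits lowest, package highest. */
static unsigned field_offset(const TopoInfo *t, Share level)
{
    unsigned off = 0;

    if (level >= SHARE_CORE) {
        off += bitwidth_for_count(t->threads_per_core);
    }
    if (level >= SHARE_MODULE) {
        off += bitwidth_for_count(t->cores_per_module);
    }
    if (level >= SHARE_DIE) {
        off += bitwidth_for_count(t->modules_per_die);
    }
    if (level >= SHARE_PACKAGE) {
        off += bitwidth_for_count(t->dies_per_pkg);
    }
    return off;
}

/* Same shape as max_thread_ids_for_cache(): addressable IDs sharing a cache. */
static uint32_t max_ids_for_cache(const TopoInfo *t, Share level)
{
    /* 1u << 0 would also give 1; the explicit case mirrors the patch. */
    return level == SHARE_THREAD ? 1 : 1u << field_offset(t, level);
}
```

For `-smp 180,sockets=2,dies=1,cores=45,threads=2` this gives 2 IDs per core and 128 per package (1 thread bit plus 6 core bits); CPUID.04H:EAX[25:14] reports this value minus one.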




[PATCH v2 1/7] hw/core: Make CPU topology enumeration arch-agnostic

2024-09-08 Thread Zhao Liu
Cache topology needs to be defined based on CPU topology levels. Thus,
define CPU topology enumeration in qapi/machine.json to make it generic
for all architectures.

To match the general topology naming style, rename CPU_TOPO_LEVEL_* to
CPU_TOPOLOGY_LEVEL_*, and rename SMT and package levels to thread and
socket.

Also, enumerate additional topology levels for non-i386 arches, and add
a CPU_TOPOLOGY_LEVEL_DEFAULT to help the future smp-cache object work
with the compatibility requirements of arch-specific cache topology models.

Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Changes since Patch v1:
 * Dropped prefix of CpuTopologyLevel enumeration. (Markus)
 * Rename CPU_TOPO_LEVEL_* to CPU_TOPOLOGY_LEVEL_* to match the QAPI's
   generated code. (Markus)

Changes since RFC v2:
 * Dropped cpu-topology.h and cpu-topology.c since QAPI has the helper
   (CpuTopologyLevel_str) to convert enum to string. (Markus)
 * Fixed text format in machine.json (CpuTopologyLevel naming, 2 spaces
   between sentences). (Markus)
 * Added a new level "default" to keep compatibility with some
   arch-specific topo settings. (Daniel)
 * Moved CpuTopologyLevel to qapi/machine-common.json, at where the
   cache enumeration and smp-cache object would be added.
   - If the smp-cache object is defined in qapi/machine.json,
     storage-daemon will complain about the QMP commands in
     qapi/machine.json during compilation.

Changes since RFC v1:
 * Used QAPI to enumerate CPU topology levels.
 * Dropped string_to_cpu_topo() since QAPI will help to parse the topo
   levels.
---
 hw/i386/x86-common.c   |   4 +-
 include/hw/i386/topology.h |  22 +-
 qapi/machine-common.json   |  46 +++-
 target/i386/cpu.c  | 144 ++---
 target/i386/cpu.h  |   4 +-
 5 files changed, 124 insertions(+), 96 deletions(-)

diff --git a/hw/i386/x86-common.c b/hw/i386/x86-common.c
index 992ea1f25e94..b21d2ab97349 100644
--- a/hw/i386/x86-common.c
+++ b/hw/i386/x86-common.c
@@ -273,12 +273,12 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 
 if (ms->smp.modules > 1) {
 env->nr_modules = ms->smp.modules;
-set_bit(CPU_TOPO_LEVEL_MODULE, env->avail_cpu_topo);
+set_bit(CPU_TOPOLOGY_LEVEL_MODULE, env->avail_cpu_topo);
 }
 
 if (ms->smp.dies > 1) {
 env->nr_dies = ms->smp.dies;
-set_bit(CPU_TOPO_LEVEL_DIE, env->avail_cpu_topo);
+set_bit(CPU_TOPOLOGY_LEVEL_DIE, env->avail_cpu_topo);
 }
 
 /*
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index dff49fce1154..bf740383038b 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -39,7 +39,7 @@
  *  CPUID Fn8000_0008_ECX[ApicIdCoreIdSize[3:0]] is set to apicid_core_width().
  */
 
-
+#include "qapi/qapi-types-machine-common.h"
 #include "qemu/bitops.h"
 
 /*
@@ -62,22 +62,6 @@ typedef struct X86CPUTopoInfo {
 unsigned threads_per_core;
 } X86CPUTopoInfo;
 
-/*
- * CPUTopoLevel is the general i386 topology hierarchical representation,
- * ordered by increasing hierarchical relationship.
- * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
- * or AMD (CPUID[0x8026]).
- */
-enum CPUTopoLevel {
-CPU_TOPO_LEVEL_INVALID,
-CPU_TOPO_LEVEL_SMT,
-CPU_TOPO_LEVEL_CORE,
-CPU_TOPO_LEVEL_MODULE,
-CPU_TOPO_LEVEL_DIE,
-CPU_TOPO_LEVEL_PACKAGE,
-CPU_TOPO_LEVEL_MAX,
-};
-
 /* Return the bit width needed for 'count' IDs */
 static unsigned apicid_bitwidth_for_count(unsigned count)
 {
@@ -212,8 +196,8 @@ static inline apic_id_t x86_apicid_from_cpu_idx(X86CPUTopoInfo *topo_info,
  */
 static inline bool x86_has_extended_topo(unsigned long *topo_bitmap)
 {
-return test_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap) ||
-   test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
+return test_bit(CPU_TOPOLOGY_LEVEL_MODULE, topo_bitmap) ||
+   test_bit(CPU_TOPOLOGY_LEVEL_DIE, topo_bitmap);
 }
 
 #endif /* HW_I386_TOPOLOGY_H */
diff --git a/qapi/machine-common.json b/qapi/machine-common.json
index fa6bd71d1280..148a2c8dccca 100644
--- a/qapi/machine-common.json
+++ b/qapi/machine-common.json
@@ -5,7 +5,7 @@
 # See the COPYING file in the top-level directory.
 
 ##
-# = Machines S390 data types
+# = Common machine types
 ##
 
 ##
@@ -19,3 +19,47 @@
 { 'enum': 'CpuS390Entitlement',
   'prefix': 'S390_CPU_ENTITLEMENT',
   'data': [ 'auto', 'low', 'medium', 'high' ] }
+
+##
+# @CpuTopologyLevel:
+#
+# An enumeration of CPU topology levels.
+#
+# @invalid: Invalid topology level.
+#
+# @thread: thread level, which would also be called SMT level or
+# logical processor level.  The @threads option in
+# SMPConfiguration is used to configure the topology of this
+# level.
+#
+# @core: core level.  The @cores option in SMPConfiguration is used
+

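Moving the enumeration into QAPI keeps the ordering that the i386 code relies on: QAPI generates C enum members in declaration order starting from 0, so levels can be compared with `<`/`>` and used as bit indexes in avail_cpu_topo. A standalone sketch with hypothetical names (the real code uses the generated CPU_TOPOLOGY_LEVEL_* constants with QEMU's set_bit()/test_bit()):

```c
#include <assert.h>
#include <stdbool.h>

/* Same order as the QAPI 'data' list; members are numbered from 0. */
typedef enum CpuTopologyLevelSketch {
    LEVEL_INVALID, LEVEL_THREAD, LEVEL_CORE, LEVEL_MODULE, LEVEL_CLUSTER,
    LEVEL_DIE, LEVEL_SOCKET, LEVEL_BOOK, LEVEL_DRAWER, LEVEL_DEFAULT,
    LEVEL__MAX,
} CpuTopologyLevelSketch;

/* Mark a level as present in a one-word bitmap. */
static void set_level(unsigned long *bitmap, CpuTopologyLevelSketch level)
{
    *bitmap |= 1ul << level;
}

static bool test_level(unsigned long bitmap, CpuTopologyLevelSketch level)
{
    return bitmap & (1ul << level);
}

/* Analogue of x86_has_extended_topo(): module or die level present? */
static bool has_extended_topo(unsigned long avail)
{
    return test_level(avail, LEVEL_MODULE) || test_level(avail, LEVEL_DIE);
}
```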
[PATCH v2 2/7] qapi/qom: Define cache enumeration and properties

2024-09-08 Thread Zhao Liu
x86 and ARM need to allow users to configure cache properties
(currently only topology):
 * For x86, the default cache topology model (of max/host CPU) does not
   always match the host's real physical cache topology. Performance can
   increase when the configured virtual topology is closer to the
   physical topology than a default topology would be.
 * For ARM, QEMU can't get the cache topology information from the CPU
   registers, so user configuration is necessary. Additionally, the
   cache information is also needed for MPAM emulation (for TCG) to
   build the right PPTT.

Define smp-cache related enumeration and properties in QAPI, so that
users can configure cache properties for SMP systems through -machine
in a subsequent patch.

Cache enumeration (CacheLevelAndType) is implemented as the combination
of cache level (level 1/2/3) and cache type (data/instruction/unified).

Currently, a separate L1 cache (L1 data cache and L1 instruction cache)
with unified higher-level caches (e.g., unified L2 and L3 caches) is the
most common cache architecture.

Therefore, enumerate the L1 D-cache, L1 I-cache, L2 cache and L3 cache
with smp-cache object to add the basic cache topology support. Other
kinds of caches (e.g., L1 unified or L2/L3 separated caches) can be
added directly into CacheLevelAndType if necessary.

Cache properties (SmpCacheProperties) currently contain only cache
topology information; other cache properties can be added to it if
necessary.

Note that cache topology is defined in terms of CPU topology levels for
two reasons:

 1. In practice, a cache is always bound to a CPU container (either
    private to the container or shared among multiple containers), and
    a CPU container is usually expressed as a CPU topology level.
 2. x86's cache-related CPUIDs encode cache topology based on the APIC
    ID's CPU topology layout, and the ACPI PPTT table that ARM/RISC-V
    rely on also requires CPU containers to indicate the private/shared
    hierarchy of the cache. Therefore, for SMP systems, it is natural
    to use the CPU topology hierarchy directly in QEMU to define the
    cache topology.

Suggested-by: Daniel P. Berrange 
Signed-off-by: Zhao Liu 
Tested-by: Yongwei Ma 
---
Suggested by credit:
 * Referred to Daniel's suggestion to introduce cache object list.
---
Changes since Patch v1:
 * Renamed SMPCacheProperty/SMPCacheProperties (QAPI structures) to
   SmpCacheProperties/SmpCachePropertiesWrapper. (Markus)
 * Renamed SMPCacheName (QAPI structure) to SmpCacheLevelAndType and
   dropped prefix. (Markus)
 * Renamed 'name' field in SmpCacheProperties to 'cache', since the
   cache type and level are enough to identify each kind of cache in
   an SMP system explicitly.
 * Renamed 'topo' field in SmpCacheProperties to 'topology'. (Markus)
 * Returned error information when user repeats setting cache
   properties. (Markus)
 * Renamed SmpCacheLevelAndType to CacheLevelAndType, since this
   representation is general across SMP or hybrid system.
 * Dropped handwritten smp-cache object and integrated cache properties
   list into MachineState (in next patch). (Markus)
 * Added the reason why x86 and ARM need to configure cache
   information. (Markus and Jonathan)

Changes since RFC v2:
 * New commit to implement cache list with JSON format instead of
   multiple sub-options in -smp.
---
 qapi/machine-common.json | 50 
 1 file changed, 50 insertions(+)

diff --git a/qapi/machine-common.json b/qapi/machine-common.json
index 148a2c8dccca..f6fe1a208214 100644
--- a/qapi/machine-common.json
+++ b/qapi/machine-common.json
@@ -63,3 +63,53 @@
 { 'enum': 'CpuTopologyLevel',
   'data': [ 'invalid', 'thread', 'core', 'module', 'cluster',
 'die', 'socket', 'book', 'drawer', 'default' ] }
+
+##
+# @CacheLevelAndType:
+#
+# Caches a system may have.  The enumeration value here is the
+# combination of cache level and cache type.
+#
+# @l1d: L1 data cache.
+#
+# @l1i: L1 instruction cache.
+#
+# @l2: L2 (unified) cache.
+#
+# @l3: L3 (unified) cache.
+#
+# Since: 9.1
+##
+{ 'enum': 'CacheLevelAndType',
+  'data': [ 'l1d', 'l1i', 'l2', 'l3' ] }
+
+##
+# @SmpCacheProperties:
+#
+# Cache information for SMP system.
+#
+# @cache: Cache name, which is the combination of cache level
+# and cache type.
+#
+# @topology: Cache topology level.  It accepts the CPU topology
+# enumeration as the parameter, i.e., CPUs in the same
+# topology container share the same cache.
+#
+# Since: 9.1
+##
+{ 'struct': 'SmpCacheProperties',
+  'data': {
+  'cache': 'CacheLevelAndType',
+  'topology': 'CpuTopologyLevel' } }
+
+##
+# @SmpCachePropertiesWrapper:
+#
+# List wrapper of SmpCacheProperties.
+#
+# @caches: the list of SmpCacheProperties.
+#
+# Since: 9.1
+##
+{ 'struct': 'SmpCachePropertiesWrapper',
+  'data': { 'caches': ['SmpCacheProperties'] } }
-- 
2.34.1
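On the command line each SmpCacheProperties entry arrives as a pair of strings, e.g. `smp-cache.0.cache=l2,smp-cache.0.topology=cluster`, which QAPI's generated visitors turn into enum values. The hand-rolled sketch below only illustrates that name-to-enum mapping; the lookup tables, `SmpCachePropertiesSketch` and `parse_props()` are hypothetical and are not the generated QAPI code.

```c
#include <assert.h>
#include <string.h>

/* Hand-written mirror of the QAPI CacheLevelAndType enum. */
typedef enum { CLT_L1D, CLT_L1I, CLT_L2, CLT_L3, CLT__MAX } CacheLevelAndTypeSketch;

static const char *const cache_names[CLT__MAX] = { "l1d", "l1i", "l2", "l3" };

/* CpuTopologyLevel names, in the same order as the QAPI definition. */
static const char *const topo_names[] = {
    "invalid", "thread", "core", "module", "cluster",
    "die", "socket", "book", "drawer", "default",
};

/* Hypothetical analogue of SmpCacheProperties; topology is an index. */
typedef struct {
    CacheLevelAndTypeSketch cache;
    int topology;
} SmpCachePropertiesSketch;

/* Return the index of 'name' in 'table', or -1 if it is not listed. */
static int lookup(const char *const *table, int n, const char *name)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(table[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Fill 'out' from the two strings; returns 0 on success, -1 on bad input. */
static int parse_props(const char *cache, const char *topology,
                       SmpCachePropertiesSketch *out)
{
    int n_topo = (int)(sizeof(topo_names) / sizeof(topo_names[0]));
    int c = lookup(cache_names, CLT__MAX, cache);
    int t = lookup(topo_names, n_topo, topology);

    if (c < 0 || t < 0) {
        return -1;
    }
    out->cache = (CacheLevelAndTypeSketch)c;
    out->topology = t;
    return 0;
}
```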




[PATCH v2 0/7] Introduce SMP Cache Topology

2024-09-08 Thread Zhao Liu
Wrapped the cache topology level access into a helper.
 * Split as a separate commit to just include compatibility checking and
   topology checking.
 * Allow setting "default" topology level even though the cache
   isn't supported by machine. (Daniel)
 * Rewrote the documentation of the smp-cache object.

Main changes since RFC v1:
 * Split CpuTopology renaming out of this RFC.
 * Use QAPI to enumerate CPU topology levels.
 * Drop string_to_cpu_topo() since QAPI will help to parse the topo
   levels.
 * Set has_*_cache field in machine_get_smp(). (JeeHeng)
 * Use "*_cache=topo_level" as -smp example as the original "level"
   term for a cache has a totally different meaning. (Jonathan)
---
Zhao Liu (7):
  hw/core: Make CPU topology enumeration arch-agnostic
  qapi/qom: Define cache enumeration and properties
  hw/core: Add smp cache topology for machine
  hw/core: Check smp cache topology support for machine
  i386/cpu: Support thread and module level cache topology
  i386/cpu: Update cache topology with machine's configuration
  i386/pc: Support cache topology in -machine for PC machine

 hw/core/machine-smp.c  | 119 +++
 hw/core/machine.c  |  44 +
 hw/i386/pc.c   |   4 +
 hw/i386/x86-common.c   |   4 +-
 include/hw/boards.h|  13 +++
 include/hw/i386/topology.h |  22 +
 qapi/machine-common.json   |  96 ++-
 qemu-options.hx|  28 +-
 target/i386/cpu.c  | 191 ++---
 target/i386/cpu.h  |   4 +-
 10 files changed, 425 insertions(+), 100 deletions(-)

-- 
2.34.1




Re: [PATCH v4 0/9] target/i386: Misc cleanup on KVM PV defs, outdated comments and error handling

2024-09-04 Thread Zhao Liu
Hi Paolo,

Just a gentle ping.

Thanks,
Zhao

On Wed, Jul 17, 2024 at 12:10:06AM +0800, Zhao Liu wrote:
> Date: Wed, 17 Jul 2024 00:10:06 +0800
> From: Zhao Liu 
> Subject: [PATCH v4 0/9] target/i386: Misc cleanup on KVM PV defs, outdated
>  comments and error handling
> X-Mailer: git-send-email 2.34.1
> 
> Hi,
> 
> This is my v4 cleanup series. Compared with v3 [1],
>  * Returned kvm_vm_ioctl() directly in kvm_install_msr_filters().
>  * Added a patch (patch 9) to clean up ARRAY_SIZE(msr_handlers).
> 
> 
> Background and Introduction
> ===
> 
> This series picks up the cleanups from my previous kvmclock series [2]
> (as other renaming attempts were temporarily put on hold).
> 
> In addition, this series also includes cleanups of a historical
> workaround, a recent comment on the coco interface [3], and error
> handling corner cases in kvm_arch_init().
> 
> To avoid fragmenting these misc cleanups, I consolidated them all into
> one series and was able to tackle them in one go!
> 
> [1]: https://lore.kernel.org/qemu-devel/20240715044955.3954304-1-zhao1@intel.com/T/
> [2]: https://lore.kernel.org/qemu-devel/20240329101954.3954987-1-zhao1@linux.intel.com/
> [3]: https://lore.kernel.org/qemu-devel/2815f0f1-9e20-4985-849c-d74c6cdc9...@intel.com/
> 
> Thanks and Best Regards,
> Zhao
> ---
> Zhao Liu (9):
>   target/i386/kvm: Add feature bit definitions for KVM CPUID
>   target/i386/kvm: Remove local MSR_KVM_WALL_CLOCK and
> MSR_KVM_SYSTEM_TIME definitions
>   target/i386/kvm: Only save/load kvmclock MSRs when kvmclock enabled
>   target/i386/kvm: Save/load MSRs of kvmclock2
> (KVM_FEATURE_CLOCKSOURCE2)
>   target/i386/kvm: Drop workaround for KVM_X86_DISABLE_EXITS_HTL typo
>   target/i386/confidential-guest: Fix comment of
> x86_confidential_guest_kvm_type()
>   target/i386/kvm: Clean up return values of MSR filter related
> functions
>   target/i386/kvm: Clean up error handling in kvm_arch_init()
>   target/i386/kvm: Replace ARRAY_SIZE(msr_handlers) with
> KVM_MSR_FILTER_MAX_RANGES
> 
>  hw/i386/kvm/clock.c  |   5 +-
>  target/i386/confidential-guest.h |   2 +-
>  target/i386/cpu.h|  25 +++
>  target/i386/kvm/kvm.c| 113 +--
>  target/i386/kvm/kvm_i386.h   |   4 +-
>  5 files changed, 92 insertions(+), 57 deletions(-)
> 
> -- 
> 2.34.1
> 



Re: [RFC PATCH 0/2] Specifying cache topology on ARM

2024-09-02 Thread Zhao Liu
On Mon, Sep 02, 2024 at 11:25:19AM +0100, Alireza Sanaee wrote:
> 
> Hi Zhao,
> 
> Yes, please keep me CCed. 
> 
> One thing that I noticed, sometimes, since you were going down the
> Intel path, some variables couldn't be NULL. But when I was gonna go
> down to ARM path, I faced some scenarios where I ended up with
> some uninit vars which is still OK but could have been avoided.

Ah, I didn't quite get your point. Could you please point out those
places in my patches? Then I can fix them in my next version. :)

Thanks,
Zhao

> Looking forward to the next revision.
> 
> Alireza



Re: [PATCH 2/2] hw/acpi: add cache hierarchy node to pptt table

2024-08-31 Thread Zhao Liu
Hi Alireza,

On Fri, Aug 23, 2024 at 01:54:46PM +0100, Alireza Sanaee wrote:

[snip]

> +static int partial_cache_description(MachineState *ms, ACPIPPTTCache* caches,
> + int num_caches)
> +{
> +int level, c;
> +
> +for (level = 1; level < num_caches; level++) {
> +for (c = 0; c < num_caches; c++) {
> +if (caches[c].level != level) {
> +continue;
> +}
> +
> +switch (level) {
> +case 1:
> +/*
> + * L1 cache is assumed to have both L1I and L1D available.
> + * Technically both need to be checked.
> + */
> +if (machine_get_cache_topo_level(ms, SMP_CACHE_L1I) ==
> +CPU_TOPO_LEVEL_DEFAULT) {

This check only covers L1i, not L1d. Is L1d being missed?

> +assert(machine_get_cache_topo_level(ms, SMP_CACHE_L1D) !=
> +   CPU_TOPO_LEVEL_DEFAULT);

I understand you don't want the user to configure a different level for
L1d in this case... If so, it's better to return an error (error_setg,
error_report or some other error reporting method) to tell the user that
the cache configuration is invalid.

> +return level;
> +}
> +break;
> +case 2:
> +if (machine_get_cache_topo_level(ms, SMP_CACHE_L2) ==
> +CPU_TOPO_LEVEL_DEFAULT) {
> +return level;
> +}
> +break;
> +case 3:
> +if (machine_get_cache_topo_level(ms, SMP_CACHE_L3) ==
> +CPU_TOPO_LEVEL_DEFAULT) {
> +return level;
> +}
> +break;
> +}
> +}
> +}
> +
> +return 0;
> +}
> +

[snip]

> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index b0c68d66a3..b723248ecf 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -3093,6 +3093,11 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
>  hc->unplug = virt_machine_device_unplug_cb;
>  mc->nvdimm_supported = true;
>  mc->smp_props.clusters_supported = true;
> +/* Supported cached */
> +mc->smp_props.cache_supported[SMP_CACHE_L1D] = true;
> +mc->smp_props.cache_supported[SMP_CACHE_L1I] = true;
> +mc->smp_props.cache_supported[SMP_CACHE_L2] = true;
> +mc->smp_props.cache_supported[SMP_CACHE_L3] = true;
>  mc->auto_enable_numa_with_memhp = true;
>  mc->auto_enable_numa_with_memdev = true;
>  /* platform instead of architectural choice */
> diff --git a/hw/core/machine-smp.c b/hw/core/machine-smp.c
> index bf6f2f9107..de95ec9c0f 100644
> --- a/hw/core/machine-smp.c
> +++ b/hw/core/machine-smp.c
> @@ -274,7 +274,11 @@ unsigned int machine_topo_get_threads_per_socket(const MachineState *ms)
>  CpuTopologyLevel machine_get_cache_topo_level(const MachineState *ms,
>SMPCacheName cache)
>  {
> -return ms->smp_cache->props[cache].topo;
> +if (ms->smp_cache) {
> +return ms->smp_cache->props[cache].topo;
> +}
> +
> +return CPU_TOPO_LEVEL_DEFAULT;
>  }
>  
>  static bool machine_check_topo_support(MachineState *ms,

Maybe it's better to split the smp-cache support/check for Arm into a
separate patch.

Regards,
Zhao
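One way to follow the suggestion to report an error instead of asserting is to validate L1I and L1D together before building the table. A minimal sketch, assuming the rule is that both must be `default` or both set explicitly; `TopoLvl` and `check_l1_consistent()` are hypothetical, and in QEMU the message would normally go through error_setg() on an Error ** rather than fprintf():

```c
#include <assert.h>
#include <stdio.h>

typedef enum { TL_THREAD, TL_CORE, TL_CLUSTER, TL_SOCKET, TL_DEFAULT } TopoLvl;

/*
 * Hypothetical check: L1I and L1D must either both use the default model
 * or both be configured explicitly. Returns 0 when consistent, or -1 after
 * printing an error for the user, instead of aborting via assert().
 */
static int check_l1_consistent(TopoLvl l1i, TopoLvl l1d)
{
    if ((l1i == TL_DEFAULT) != (l1d == TL_DEFAULT)) {
        fprintf(stderr,
                "Invalid cache configuration: l1i and l1d must both be "
                "'default' or both set to a topology level\n");
        return -1;
    }
    return 0;
}
```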





Re: [RFC PATCH 0/2] Specifying cache topology on ARM

2024-08-31 Thread Zhao Liu
Hi Alireza,

Great to see your Arm side implementation!

On Fri, Aug 23, 2024 at 01:54:44PM +0100, Alireza Sanaee wrote:
> Date: Fri, 23 Aug 2024 13:54:44 +0100
> From: Alireza Sanaee 
> Subject: [RFC PATCH 0/2] Specifying cache topology on ARM
> X-Mailer: git-send-email 2.34.1
> 

[snip]

> 
> The following command will represent the system.
> 
> ./qemu-system-aarch64 \
>  -machine virt,**smp-cache=cache0** \
>  -cpu max \
>  -m 2048 \
>  -smp sockets=2,clusters=1,cores=2,threads=2 \
>  -kernel ./Image.gz \
>  -append "console=ttyAMA0 root=/dev/ram rdinit=/init acpi=force" \
>  -initrd rootfs.cpio.gz \
>  -bios ./edk2-aarch64-code.fd \
>  **-object '{"qom-type":"smp-cache","id":"cache0","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"cluster"},{"name":"l3","topo":"socket"}]}'** \
>  -nographic

I plan to send a new version soon, in which the smp-cache array will
be fully integrated into -machine. I'll CC you then.

Regards,
Zhao




Re: [PATCH v2] hw/virtio/vdpa-dev: Check returned value instead of dereferencing @errp

2024-08-31 Thread Zhao Liu
Hi Michael,

On Tue, Aug 20, 2024 at 06:55:29AM -0400, Michael S. Tsirkin wrote:

[snip]

> > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > index 64b96b226c39..8a1e16fce3de 100644
> > --- a/hw/virtio/vdpa-dev.c
> > +++ b/hw/virtio/vdpa-dev.c
> > @@ -63,19 +63,19 @@ static void vhost_vdpa_device_realize(DeviceState *dev, Error **errp)
> >  }
> >  
> >  v->vhostfd = qemu_open(v->vhostdev, O_RDWR, errp);
> > -if (*errp) {
> > +if (v->vhostfd < 0) {
> >  return;
> >  }
> >  
> >  v->vdev_id = vhost_vdpa_device_get_u32(v->vhostfd,
> > VHOST_VDPA_GET_DEVICE_ID, errp);
> > -if (*errp) {
> > +if (v->vdev_id < 0) {
> >  goto out;
> >  }
> 
> vdev_id is unsigned, no idea how is this supposed to work.
> 
> >  
> >  max_queue_size = vhost_vdpa_device_get_u32(v->vhostfd,
> > VHOST_VDPA_GET_VRING_NUM, errp);
> > -if (*errp) {
> > +if (max_queue_size < 0) {
> >  goto out;
> >  }
> >  
> max_queue_size is unsigned, too.
> 
> > @@ -89,7 +89,7 @@ static void vhost_vdpa_device_realize(DeviceState *dev, Error **errp)
> >  
> >  v->num_queues = vhost_vdpa_device_get_u32(v->vhostfd,
> >VHOST_VDPA_GET_VQS_COUNT, errp);
> > -if (*errp) {
> > +if (v->num_queues < 0) {
> >  goto out;
> >  }
> >  
> 
> num_queues is unsigned, too.

Oops, yes. The correct way is to check whether vhost_vdpa_device_get_u32
returns "(uint32_t)-1".

I can add a new macro like this:

#define VDPA_DEVICE_U32_VALUE_NONE ((uint32_t)-1)

Is this okay with you?
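To illustrate why the "< 0" checks can never fire, here is a minimal sketch. It assumes, as described above, that vhost_vdpa_device_get_u32() leaves its result at (uint32_t)-1 when the query fails; fake_get_u32() and vdpa_u32_failed() are hypothetical helpers, not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* The proposed sentinel: the value returned when the query fails. */
#define VDPA_DEVICE_U32_VALUE_NONE ((uint32_t)-1)

/* Hypothetical stand-in for vhost_vdpa_device_get_u32(): returns the
 * sentinel on failure (the real function also fills in errp). */
static uint32_t fake_get_u32(bool fail, uint32_t value)
{
    return fail ? VDPA_DEVICE_U32_VALUE_NONE : value;
}

/* Hypothetical check: "v < 0" is always false for an unsigned type
 * (compilers warn about it with -Wtype-limits), so compare against
 * the sentinel instead. */
static bool vdpa_u32_failed(uint32_t v)
{
    return v == VDPA_DEVICE_U32_VALUE_NONE;
}
```

One caveat of this scheme: a device that legitimately reports 0xffffffff would be indistinguishable from a failure.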

Thanks,
Zhao

> > @@ -127,7 +127,7 @@ static void vhost_vdpa_device_realize(DeviceState *dev, Error **errp)
> >  v->config_size = vhost_vdpa_device_get_u32(v->vhostfd,
> > VHOST_VDPA_GET_CONFIG_SIZE,
> > errp);
> > -if (*errp) {
> > +if (v->config_size < 0) {
> >  goto vhost_cleanup;
> >  }
> >  
> > -- 
> > 2.34.1
> 



Re: [PATCH v2] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f

2024-08-13 Thread Zhao Liu
On Mon, Aug 12, 2024 at 11:31:45PM -0400, Xiaoyao Li wrote:
> Date: Mon, 12 Aug 2024 23:31:45 -0400
> From: Xiaoyao Li 
> Subject: [PATCH v2] i386/cpu: Introduce enable_cpuid_0x1f to force exposing
>  CPUID 0x1f
> X-Mailer: git-send-email 2.34.1
> 
> Currently, QEMU exposes CPUID 0x1f to guest only when necessary, i.e.,
> when topology level that cannot be enumerated by leaf 0xB, e.g., die or
> module level, are configured for the guest, e.g., -smp xx,dies=2.
> 
> However, 1) TDX architecture forces to require CPUID 0x1f to configure CPU
> topology. and 2) There is a bug in Windows that Windows 10/11 expects valid
> 0x1f leaves when the maximum basic leaf > 0x1f[1].
> 
> Introduce a bool flag, enable_cpuid_0x1f, in CPU for the cases that
> require CPUID leaf 0x1f to be exposed to guest. For case 2), introduce
> a user-settable property, "x-cpuid-0x1f", as well, which provides an opt-in
> interface for people to run the buggy Windows as a workaround. The default
> value of the property is set to false, thus making no effect on existing
> setup.
> 
> Introduce a new function x86_has_cpuid_0x1f(), which is the wrapper of
> cpu->enable_cpuid_0x1f and x86_has_extended_topo() to check if it needs
> to enable cpuid leaf 0x1f for the guest.
> 
> [1] https://lore.kernel.org/qemu-devel/20240724075226.212882-1-manish.mis...@nutanix.com/
> 
> Signed-off-by: Xiaoyao Li 
> ---
> changes in v2:
>  - Add more details in commit message;
>  - introduce a separate function x86_has_cpuid_0x1f() instead of
>modifying x86_has_extended_topo();
> ---
>  target/i386/cpu.c | 5 +++--
>  target/i386/cpu.h | 9 +
>  target/i386/kvm/kvm.c | 2 +-
>  3 files changed, 13 insertions(+), 3 deletions(-)

This wrapper x86_has_cpuid_0x1f() looks good to me.
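A simplified model of the wrapper described in the commit message: the topology check only inspects the levels configured via -smp, and the force flag is OR-ed in on top of it, so forcing 0x1f never changes what the topology check itself reports. The enum, the TOPO_BIT() macro and the function names below are hypothetical stand-ins, not QEMU's CpuTopologyLevel or avail_cpu_topo.

```c
#include <stdbool.h>

/* Hypothetical, simplified topology levels (not QEMU's CpuTopologyLevel). */
enum topo_level {
    TOPO_THREAD,
    TOPO_CORE,
    TOPO_MODULE,
    TOPO_DIE,
    TOPO_SOCKET,
};

#define TOPO_BIT(l) (1UL << (l))

/* Models x86_has_extended_topo(): true only if -smp configured a level
 * that leaf 0xb cannot enumerate (module or die). */
static bool has_extended_topo(unsigned long avail_topo)
{
    return (avail_topo & TOPO_BIT(TOPO_MODULE)) ||
           (avail_topo & TOPO_BIT(TOPO_DIE));
}

/* Models x86_has_cpuid_0x1f(): expose leaf 0x1f if it is forced
 * (enable_cpuid_0x1f) or if extended levels exist. */
static bool has_cpuid_0x1f(bool enable_cpuid_0x1f, unsigned long avail_topo)
{
    return enable_cpuid_0x1f || has_extended_topo(avail_topo);
}
```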

Reviewed-by: Zhao Liu 




Re: [PATCH] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f

2024-08-13 Thread Zhao Liu
On Tue, Aug 13, 2024 at 10:52:27AM +0800, Xiaoyao Li wrote:

[snip]

> > Any levels that 0xb doesn't cover.
> 
> The name of extended_topo is so misleading. At least, it misleads me.
> 
> Both Intel and AMD support leaf 0xb and the name of leaf 0xb is "Extended
> topology enumeration". And here, x86_has_extended_topo() is used for topo
> levels that cannot be covered by 0xb.
> 

Yes, naming is really hard: Intel and AMD use different names for the
topology leaves that 0xb doesn't cover... This helper has a comment that
clearly states what it does.

Thanks,
Zhao




Re: [PATCH v3 4/4] target/i386: Mask CMPLegacy bit in CPUID[0x80000001].ECX for Zhaoxin CPUs

2024-08-12 Thread Zhao Liu
On Fri, Aug 09, 2024 at 05:42:59AM -0400, EwanHai wrote:
> Date: Fri, 9 Aug 2024 05:42:59 -0400
> From: EwanHai 
> Subject: [PATCH v3 4/4] target/i386: Mask CMPLegacy bit in
>  CPUID[0x80000001].ECX for Zhaoxin CPUs
> X-Mailer: git-send-email 2.34.1
> 
> Zhaoxin CPUs (including vendors "Shanghai" and "Centaurhauls") handle the
> CMPLegacy bit similarly to Intel CPUs. Therefore, this commit masks the
> CMPLegacy bit in CPUID[0x80000001].ECX for Zhaoxin CPUs, just as it is done
> for Intel CPUs.
> 
> AMD uses the CMPLegacy bit (CPUID[0x80000001].ECX.bit1) along with other CPUID
> information to enumerate platform topology (e.g., the number of logical
> processors per package). However, for Intel and other CPUs that follow Intel's
> behavior, CPUID[0x80000001].ECX.bit1 is reserved.
> 
> - Impact on Intel and similar CPUs:
> This change has no effect on Intel and similar CPUs, as the goal is to
> accurately emulate CPU CPUID information.
> 
> - Impact on Linux Guests running on Intel (and similar) vCPUs:
> During boot, Linux checks if the CPU supports Hyper-Threading.
> If it detects

Maybe "For kernels before v6.9, if it detects"? For the reason, see my
comment below...

> X86_FEATURE_CMP_LEGACY, it assumes Hyper-Threading is not supported. For Intel
> and similar vCPUs, if the CMPLegacy bit is not masked in CPUID[0x80000001].ECX,
> Linux will incorrectly assume that Hyper-Threading is not supported, even if
> the vCPU does support it.

...It seems this issue exists only in kernels before v6.9. Thomas'
topology refactoring fixed this behavior:
* commit 22d63660c35e ("x86/cpu: Use common topology code for Intel")
* commit 598e719c40d6 ("x86/cpu: Use common topology code for Centaur
  and Zhaoxin")
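For context, a simplified model of the pre-v6.9 guest logic being discussed. It assumes the kernel skips Hyper-Threading detection when CMP_LEGACY (or XTOPOLOGY) is set, and otherwise divides CPUID.01H:EBX[23:16] by the core count (1 + CPUID.04H:EAX[31:26]); it ignores the HT feature-flag check. legacy_threads_per_core() is a hypothetical helper, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the pre-v6.9 Linux detect_ht() path. */
static unsigned int legacy_threads_per_core(uint32_t leaf1_ebx,
                                            uint32_t leaf4_eax,
                                            bool cmp_legacy,
                                            bool xtopology)
{
    /* CMP_LEGACY (or leaf 0xb support) makes the kernel skip this
     * detection, so the guest keeps assuming one thread per core. */
    if (cmp_legacy || xtopology) {
        return 1;
    }

    /* Addressable logical processor IDs per package: EBX[23:16]. */
    unsigned int logical = (leaf1_ebx >> 16) & 0xff;
    /* Addressable core IDs per package: EAX[31:26] + 1. */
    unsigned int cores = ((leaf4_eax >> 26) & 0x3f) + 1;

    /* Integer division, so a non-power-of-2 EBX[23:16] can round down. */
    return logical / cores;
}
```

With 128 logical IDs and 64 core IDs this yields 2 threads per core; with CMPLegacy set it always yields 1, which is the misdetection described in the commit message.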

> Signed-off-by: EwanHai 
> ---
>  target/i386/cpu.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

Just the above nit. Otherwise, LGTM,

Reviewed-by: Zhao Liu 





Re: [PATCH 8/8] qemu-options: Add the description of smp-cache object

2024-08-12 Thread Zhao Liu
Hi Markus,

On Fri, Aug 09, 2024 at 02:24:48PM +0200, Markus Armbruster wrote:
> Date: Fri, 09 Aug 2024 14:24:48 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
>  object
> 
> I apologize for the delay.

No worries! I appreciate your time, guidance and feedback.

> Zhao Liu  writes:
> 
> > On Thu, Aug 01, 2024 at 01:28:27PM +0200, Markus Armbruster wrote:
> 
> [...]
> 
> >> Can you provide a brief summary of the design alternatives that have
> >> been proposed so far?  Because I've lost track.
> >
> > No problem!
> >
> > Currently, we have the following options:
> >
> > * 1st: The first one is just to configure cache topology with several
> >   options in -smp:
> >
> >   -smp l1i-cache-topo=core,l1d-cache-topo=core
> >
> >   This one lacks scalability to support the cache size that ARM will
> >   need in the future.
> 
> -smp sets machine property "smp" of QAPI type SMPConfiguration.
> 
> So this one adds members l1i-cache-topo, l1d-cache-topo, ... to
> SMPConfiguration.

Yes.

> > * 2nd: The cache list object in -smp.
> >
> >   The idea was to use JSON to configure the cache list. However, the
> >   underlying implementation of -smp at the moment is keyval parsing,
> >   which is not compatible with JSON.
> 
> Keyval is a variation of the QEMU's traditional KEY=VALUE,... syntax
> that can serve as an alternative to JSON, with certain restrictions.
> Ideally, we provide both JSON and keyval syntax on the command line.

I see. That's the ideal state for the CLI, and -machine and -smp haven't
reached it yet.

> Example: -blockdev supports both JSON and keyval.
> JSON:   -blockdev '{"driver": "null-co", "node-name": "node0"}'
> keyval: -blockdev null-co,node-name=node0
> 
> Unfortunately, we have many old interfaces that still lack JSON support.
> 
> >   If we don't insist on the JSON format, then cache lists can also be
> >   implemented in the following way:
> >   
> >   -smp caches.0.name=l1i,caches.0.topo=core,\
> >caches.1.name=l1d,caches.1.topo=core
> 
> This one adds a single member caches to SMPConfiguration.  It is an
> array of objects.

Yes.

> > * 3rd: The cache list object linked in -machine.
> >
> >   Considering that -object is JSON-compatible so that defining lists via
> >   JSON is more friendly, I implemented the caches list via -object and
> >   linked it to MachineState:
> >
> >   -object '{"qom-type":"smp-cache","id":"obj","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"}]}'
> >   -machine smp-caches=obj
> 
> This one wraps the same array of objects in a new user-creatable object,
> then sets machine property "smp-caches" to that object.
> 
> We can set machine properties directly with -machine.  But -machine
> doesn't support JSON, yet.
> 
> Wrapping in an object moves the configuration to -object, which does
> support JSON.
> 
> Half way between 2nd and 3rd:
> 
>   * Cache list object in machine
> 
> -machine caches.0.name=l1i,caches.0.topo=core,\
>  caches.1.name=l1d,caches.1.topo=core

I got your point, and putting the array in -machine does align with the
design of the other machine options nowadays.

> > * 4th: The per cache object without any list:
> >
> >   -object smp-cache,id=cache0,name=l1i,topo=core \
> >   -object smp-cache,id=cache1,name=l1d,topo=core
> >
> >   This proposal is clearer, but there are a few open questions:
> >   - I plan to push qom-topo forward, which would abstract CPU related
> > topology levels and cache to "device" instead of object. Is there a
> > conflict here?
> 
> Can't say, since I don't understand where you want to go.
> 
> Looks like your trying to design an interface for what you want to do
> now, and are wondering whether it could evolve to accomodate what you
> want to do later.
> 
> It's often better to design the interface for everything you already
> know you want to do, then take out the parts you want to do later.

Thanks! From this point of view, the per-cache object approach does not
meet my needs.

> >   - Multiple cache objects can't be linked to the machine on the command
> > line, so I maintain a static cache list in smp_cache.c and expose
> > the cache information to the machine through some interface. is this
>

Re: [PATCH 1/1] target/i386: Fix arguments for vmsr_read_thread_stat()

2024-08-09 Thread Zhao Liu
On Wed, Aug 07, 2024 at 02:43:20PM +0200, Anthony Harivel wrote:
> Date: Wed,  7 Aug 2024 14:43:20 +0200
> From: Anthony Harivel 
> Subject: [PATCH 1/1] target/i386: Fix arguments for vmsr_read_thread_stat()
> 
> Snapshot of the stat utime and stime for each thread, taken before and
> after the pause, must be stored in separate locations
> 
> Signed-off-by: Anthony Harivel 
> ---
>  target/i386/kvm/kvm.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Zhao Liu 




Re: [PATCH v1 2/3] target/i386: Add VMX control bits for nested FRED support

2024-08-09 Thread Zhao Liu
On Thu, Aug 08, 2024 at 11:38:11PM -0700, Xin Li wrote:
> Date: Thu, 8 Aug 2024 23:38:11 -0700
> From: Xin Li 
> Subject: Re: [PATCH v1 2/3] target/i386: Add VMX control bits for nested
>  FRED support
> 
> > > > > @@ -1450,7 +1450,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> > > > >NULL, "vmx-entry-ia32e-mode", NULL, NULL,
> > > > >NULL, "vmx-entry-load-perf-global-ctrl", "vmx-entry-load-pat", "vmx-entry-load-efer",
> > > > >"vmx-entry-load-bndcfgs", NULL, "vmx-entry-load-rtit-ctl", NULL,
> > > > > -NULL, NULL, "vmx-entry-load-pkrs", NULL,
> > > > > +NULL, NULL, "vmx-entry-load-pkrs", "vmx-entry-load-fred",
> > > > 
> > > > Should we also define VMX_VM_ENTRY_LOAD_FRED? "vmx-entry-load-rtit-ctl"
> > > > and "vmx-entry-load-pkrs" have their corresponding bit definitions, even
> > > > if they are not used.
> > > 
> > > I'm not sure, but why add something that is not being used (thus not
> > > tested)?
> > 
> > Yes, the use of macros is a factor. My other consideration is the
> > integrity of the feature definitions. When such feature definitions
> > were first introduced in commit 704798add83b ("target/i386: add VMX
> > definitions"), I understand they were mainly used to enumerate and
> > reflect hardware support and not all defs are used directly.
> > 
> > The feat word name and the feature definition should essentially be
> > bound, and it might be possible to generate the feature definition
> > from the feat word via some script without having to add it manually,
> > but right now there is no work on this, and no additional constraints,
> > so we have to manually add and manually check it to make sure that the
> > two correspond to each other. When a feature word is added, it means
> > that Host supports the corresponding feature, and from an integrity
> > perspective, so it is natural to continue adding definition (just like
> > the commit 52a44ad2b92b ("target/i386: Expose VMX entry/exit load pkrs
> > control bits")), right?
> > 
> > Though I found that there are still some mismatches between the feature
> > word and the corresponding definition, but ideally they should coexist.
> > 
> > About the test, if it's just enumerated and not added to a specific CPU
> > model or involved by other logic, it's harmless?
> 
> Unless tests are ready, such code are literally dead code, and could get
> broken w/o being noticed for a long time.
> 
> I think we should add it only when tests are also added.  Otherwise we added
> burden to maintainers, hoping test will be added soon, which often
> never happen.

It makes sense and can reduce the burden on maintainers. Now I totally
agree with you.

Thanks,
Zhao




Re: [PATCH v1 3/3] target/i386: Raise the highest index value used for any VMCS encoding

2024-08-09 Thread Zhao Liu
On Fri, Aug 09, 2024 at 12:38:02AM -0700, Xin Li wrote:
> Date: Fri, 9 Aug 2024 00:38:02 -0700
> From: Xin Li 
> Subject: Re: [PATCH v1 3/3] target/i386: Raise the highest index value used
>  for any VMCS encoding
> 
> On 8/8/2024 11:27 PM, Xin Li wrote:
> > > > +    if (f[FEAT_7_1_EAX] & CPUID_7_1_EAX_FRED) {
> > > > +    /* FRED injected-event data (0x2052).  */
> > > > +    kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x52);
> > > 
> > > Hmm, I have a question from checking the FRED spec.
> > > 
> > > Section 9.3.4 said, (for injected-event data) "This field has uses the
> > > encoding pair 2052H/2053H."
> > > 
> > > So why adjust the highest index to 0x52 rather than 0x53?
> 
> Okay, found it in the Intel SDM:
> 
> Index. Bits 9:1 distinguish components with the same field width and type.
> 
> Bit 0 is not included in the index field.

Thanks for the explanation! I see: for IA32_VMX_VMCS_ENUM, bit 0 is
reserved and only the index field (bits 9:1) matters.
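The arithmetic, as a small sketch based on the SDM layout above (bit 0 of a field encoding is the access type, bits 9:1 are the index): both halves of the 64-bit pair 2052H/2053H share index 0x29, and IA32_VMX_VMCS_ENUM reports that index in its bits 9:1, i.e. 0x29 << 1 = 0x52. The helper names are hypothetical.

```c
#include <stdint.h>

/* Index of a VMCS field encoding: bits 9:1 (bit 0 is the access type). */
static unsigned int vmcs_field_index(uint32_t encoding)
{
    return (encoding >> 1) & 0x1ff;
}

/* Value whose bits 9:1 report the given highest index, as in
 * IA32_VMX_VMCS_ENUM (bit 0 stays zero/reserved). */
static uint64_t vmcs_enum_value(unsigned int highest_index)
{
    return (uint64_t)(highest_index & 0x1ff) << 1;
}
```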

Regards,
Zhao




Re: [PATCH v2 1/2] kvm: replace fprintf with error_report() in kvm_init() for error conditions

2024-08-08 Thread Zhao Liu
On Fri, Aug 09, 2024 at 10:40:53AM +0530, Ani Sinha wrote:
> Date: Fri,  9 Aug 2024 10:40:53 +0530
> From: Ani Sinha 
> Subject: [PATCH v2 1/2] kvm: replace fprintf with error_report() in
>  kvm_init() for error conditions
> X-Mailer: git-send-email 2.45.2
> 
> error_report() is more appropriate for error situations. Replace fprintf with
> error_report. Cosmetic. No functional change.
> 
> CC: qemu-triv...@nongnu.org
> CC: zhao1@intel.com
> Signed-off-by: Ani Sinha 
> ---
>  accel/kvm/kvm-all.c | 40 ++--
>  1 file changed, 18 insertions(+), 22 deletions(-)
> 
> changelog:
> v2: fix a bug.

Generally looks good to me. Only some nits below, otherwise,

Reviewed-by: Zhao Liu 

>  #ifdef TARGET_S390X
>  if (ret == -EINVAL) {
> -fprintf(stderr,
> -"Host kernel setup problem detected. Please verify:\n");
> -fprintf(stderr, "- for kernels supporting the switch_amode or"
> -" user_mode parameters, whether\n");
> -fprintf(stderr,
> -"  user space is running in primary address space\n");
> -fprintf(stderr,
> -"- for kernels supporting the vm.allocate_pgste sysctl, "
> -"whether it is enabled\n");
> +error_report("Host kernel setup problem detected. Please verify:");

The documentation of error_report() says the message should not contain
multiple sentences or trailing punctuation:

"The resulting message should be a single phrase, with no newline or trailing
punctuation."

So I think these extra messages (with complex formatting & content) are
better printed with error_printf() as I suggested in [1].

[1]: https://lore.kernel.org/qemu-devel/zrwp0fwpnzeav...@intel.com/T/#m953afd879eb6279fcdf03cda594c43f1829bdffe
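As a rough illustration of that rule, here is a hypothetical checker (not part of QEMU) that rejects error_report() messages containing a newline or ending in punctuation; hints that fail it are candidates for error_printf(). It only inspects newlines and the last character, so it cannot catch every multi-sentence message.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lint: does a message follow the error_report() guidance
 * ("a single phrase, with no newline or trailing punctuation")? */
static bool is_error_report_phrase(const char *msg)
{
    size_t len = strlen(msg);

    if (len == 0 || strchr(msg, '\n') != NULL) {
        return false;
    }
    /* Reject common trailing punctuation. */
    return strchr(".:;!?", msg[len - 1]) == NULL;
}
```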

> +error_report("- for kernels supporting the switch_amode or"
> +" user_mode parameters, whether");
> +error_report("  user space is running in primary address space");
> +error_report("- for kernels supporting the vm.allocate_pgste "
> +"sysctl, whether it is enabled");
>  }
>  #elif defined(TARGET_PPC)
>  if (ret == -EINVAL) {
> -fprintf(stderr,
> -"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
> -(type == 2) ? "pr" : "hv");
> +error_report("PPC KVM module is not loaded. Try modprobe kvm_%s.",
> +(type == 2) ? "pr" : "hv");

Same here: there's trailing punctuation. If possible, feel free to refer
to the comment in [1].

>  }
>  #endif

[snip]

> @@ -2542,8 +2538,8 @@ static int kvm_init(MachineState *ms)
>  }
>  if (missing_cap) {
>  ret = -EINVAL;
> -fprintf(stderr, "kvm does not support %s\n%s",
> -missing_cap->name, upgrade_note);
> +error_report("kvm does not support %s", missing_cap->name);
> +error_report("%s", upgrade_note);

The "upgrade_note" string also has trailing punctuation, so it's also
better to replace the second error_report() with error_printf().

For this patch, error_report() is already a big step forward, so I think
these few nits don't block this patch.

Thank you for your patience.
Zhao




Re: [PATCH v3 2/2] kvm: refactor core virtual machine creation into its own function

2024-08-08 Thread Zhao Liu
On Fri, Aug 09, 2024 at 10:40:54AM +0530, Ani Sinha wrote:
> Date: Fri,  9 Aug 2024 10:40:54 +0530
> From: Ani Sinha 
> Subject: [PATCH v3 2/2] kvm: refactor core virtual machine creation into
>  its own function
> X-Mailer: git-send-email 2.45.2
> 
> Refactoring the core logic around KVM_CREATE_VM into its own separate function
> so that it can be called from other functions in subsequent patches. There is
> no functional change in this patch.
> 
> CC: pbonz...@redhat.com
> CC: zhao1@intel.com
> CC: cfont...@suse.de
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Ani Sinha 
> ---
>  accel/kvm/kvm-all.c | 86 -
>  1 file changed, 53 insertions(+), 33 deletions(-)
> 
> changelog:
> v2: s/fprintf/warn_report as suggested by zhao
> v3: s/warn_report/error_report. function names adjusted to conform to
> other names. fprintf -> error_report() moved to its own patch.

Reviewed-by: Zhao Liu 




Re: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for Zhaoxin CPUs

2024-08-08 Thread Zhao Liu
On Thu, Aug 08, 2024 at 11:25:45PM -0400, Ewan Hai wrote:

[snip]

> Thank you for your suggestion; the changes will indeed make it clearer.
> I have a question: since you’ve already added your reviewed-by tag to
> the first three patches, if I want to modify these descriptions, should
> I submit a v3 patchset containing all four patches, or should I only send a
> new patch titled "target/i386: Mask CMPLegacy bit in CPUID[0x80000001].ecx
> for Zhaoxin/Centaur CPUs"?

The v3 should contain all 4 patches, and you can add my R/b tag to the
first three patches.




Re: [PATCH v2] kvm: refactor core virtual machine creation into its own function

2024-08-08 Thread Zhao Liu
Hi Ani,

On Thu, Aug 08, 2024 at 05:08:38PM +0530, Ani Sinha wrote:
> Date: Thu,  8 Aug 2024 17:08:38 +0530
> From: Ani Sinha 
> Subject: [PATCH v2] kvm: refactor core virtual machine creation into its
>  own function
> X-Mailer: git-send-email 2.45.2
> 
> Refactoring the core logic around KVM_CREATE_VM into its own separate function
> so that it can be called from other functions in subsequent patches. There is
> no functional change in this patch.
> 
> CC: pbonz...@redhat.com
> CC: zhao1@intel.com
> Signed-off-by: Ani Sinha 
> ---
>  accel/kvm/kvm-all.c | 93 +++--
>  1 file changed, 56 insertions(+), 37 deletions(-)
> 
> changelog:
> v2: s/fprintf/warn_report as suggested by zhao

Thanks for your change!

> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 75d11a07b2..c2e177c39f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2385,6 +2385,60 @@ uint32_t kvm_dirty_ring_size(void)
>  return kvm_state->kvm_dirty_ring_size;
>  }
>  
> +static int do_kvm_create_vm(MachineState *ms, int type)
> +{
> +KVMState *s;
> +int ret;
> +
> +s = KVM_STATE(ms->accelerator);
> +
> +do {
> +ret = kvm_ioctl(s, KVM_CREATE_VM, type);
> +} while (ret == -EINTR);
> +
> +if (ret < 0) {
> +warn_report("ioctl(KVM_CREATE_VM) failed: %d %s", -ret,
> +strerror(-ret));
> +
> +#ifdef TARGET_S390X
> +if (ret == -EINVAL) {
> +warn_report("Host kernel setup problem detected. Please verify:");
> +warn_report("- for kernels supporting the switch_amode or"
> +" user_mode parameters, whether");
> +warn_report("  user space is running in primary address space");
> +warn_report("- for kernels supporting the vm.allocate_pgste "
> +"sysctl, whether it is enabled");
> +}
> +#elif defined(TARGET_PPC)
> +if (ret == -EINVAL) {
> +warn_report("PPC KVM module is not loaded. Try modprobe kvm_%s.",
> +(type == 2) ? "pr" : "hv");
> +}
> +#endif

I think an error-level message is more appropriate than a warning, because
after printing it QEMU handles the error and terminates guest startup.

What about the following change?

#ifdef TARGET_S390X
        if (ret == -EINVAL) {
            error_report("Host kernel setup problem detected");
            error_printf("Please verify:\n");
            error_printf("- for kernels supporting the switch_amode or"
                         " user_mode parameters, whether\n");
            error_printf("  user space is running in primary address space\n");
            error_printf("- for kernels supporting the vm.allocate_pgste "
                         "sysctl, whether it is enabled\n");
        }
#elif defined(TARGET_PPC)
        if (ret == -EINVAL) {
            error_report("PPC KVM module is not loaded");
            error_printf("Try modprobe kvm_%s.\n",
                         (type == 2) ? "pr" : "hv");
        }
#endif

The above uses error_report() only for the error reason/error code, since
for error_report(), "The resulting message should be a single phrase,
with no newline or trailing punctuation."

Other specific hints or information are printed by error_printf()
because style.rst suggests "Use error_printf() & friends to print
additional information."

Thanks,
Zhao

> +}
> +
> +return ret;
> +}
> +
> +static int find_kvm_machine_type(MachineState *ms)
> +{
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> +int type;
> +
> +if (object_property_find(OBJECT(current_machine), "kvm-type")) {
> +g_autofree char *kvm_type;
> +kvm_type = object_property_get_str(OBJECT(current_machine),
> +   "kvm-type",
> +   &error_abort);
> +type = mc->kvm_type(ms, kvm_type);
> +} else if (mc->kvm_type) {
> +type = mc->kvm_type(ms, NULL);
> +} else {
> +type = kvm_arch_get_default_type(ms);
> +}
> +return type;
> +}
> +
>  static int kvm_init(MachineState *ms)
>  {
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2467,49 +2521,14 @@ static int kvm_init(MachineState *ms)
>  }
>  s->as = g_new0(struct KVMAs, s->nr_as);
>  
> -if (object_property_find(OBJECT(current_machine), "kvm-type")) {
> -g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
> -"kvm-type",
> -&error_abort);
> -type = mc->kvm_type(ms, kvm_type);
> -} else if (mc->kvm_type) {
> -type = mc->kvm_type(ms, NULL);
> -} else {
> -type = kvm_arch_get_default_type(ms);
> -}
> -
> +type = find_kvm_machine_type(ms);
>  if (type < 0) {
>  ret = -EINVAL;
>  goto err;
>  }
>  
> -do {
> -re

Re: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for Zhaoxin CPUs

2024-08-08 Thread Zhao Liu
On Thu, Aug 08, 2024 at 09:44:18PM -0400, Ewan Hai wrote:
> Date: Thu, 8 Aug 2024 21:44:18 -0400
> From: Ewan Hai 
> Subject: Re: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for
>  Zhaoxin CPUs
> 
> 
> Hi Zhao Liu,
> 
> Thank you for your feedback.
> 
> On 8/8/24 06:30, Zhao Liu wrote:
> > Hi EwanHai,
> > 
> > On Thu, Jul 04, 2024 at 07:25:11AM -0400, EwanHai wrote:
> > > Date: Thu, 4 Jul 2024 07:25:11 -0400
> > > From: EwanHai 
> > > Subject: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for Zhaoxin
> > >   CPUs
> > > X-Mailer: git-send-email 2.34.1
> > > 
> > > Zhaoxin CPUs handle the CMPLegacy bit in the same way
> > > as Intel CPUs.

It would be clearer to say "Don't set up CMPLegacy bit in
CPUID[0x80000001].ecx for VIA/Zhaoxin CPUs".

> This patch simplifies the existing logic by
> > > using the IS_XXX_CPU macro and includes checks for Zhaoxin
> > > vendor to align their behavior with Intel.
> > > 
> > > Signed-off-by: EwanHai 
> > > ---
> > >   target/i386/cpu.c | 4 +---
> > >   1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index a3747fc487..c52a4cf3ba 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -6945,9 +6945,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
> > >* So don't set it here for Intel to make Linux guests happy.
> > >*/
> > >   if (threads_per_pkg > 1) {
> > > -if (env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1 ||
> > > -env->cpuid_vendor2 != CPUID_VENDOR_INTEL_2 ||
> > > -env->cpuid_vendor3 != CPUID_VENDOR_INTEL_3) {
> > > +if (!IS_INTEL_CPU(env) && !IS_ZHAOXIN_CPU(env)) {
> > This change implicitly changes the behavior of existing VIA CPU.
> > 
> > Is this a bug for the original VIA? If so, I suggest a separate patch to
> > fix it and explain the effect on the VIA (Zhaoxin1) CPU.
> > 
> > Regards,
> > Zhao
>
> The reason for this change is not due to a discovered bug, but rather
> because both Centaurhauls and Shanghai CPUs follow Intel’s behavior
> regarding the CMPLegacy bit. Specifically, AMD CPUs enumerate the
> threads per package information in the CPUID leaf 0x80000001 output
> ECX register, while Intel (and **other processors following Intel’s
> behavior**) do not. Therefore, this modification is simply intended to
> logically supplement the existing code.

I see, thanks.

> Given this, do you think it would be appropriate for me to submit
> a separate patch to explain this behavior and its effect on
> VIA (Zhaoxin1) CPUs? If so, I will submit this change in a separate
> patch.

I think there's no need to split this.

However, I think it's necessary to state the effect of the change in
the changelog/commit message. It's also worth stating if it has no
effect on the OS/software. After all, the comment on this bit says it
affects the Linux kernel.

Also, the change to the old VIA behavior is worth stating in the commit
message, i.e., this patch's changes to Zhaoxin CPUs also cover the
previous VIA CPUs.

Additionally, since this change fixes CPUID values that don't match bare
metal, what about changing the subject to

"target/i386: Mask CMPLegacy bit in CPUID[0x80000001].ecx for Zhaoxin/VIA
CPUs"?
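A minimal model of the vendor check in the hunk, assuming the vendor is compared as the 12-byte CPUID vendor string ("GenuineIntel", "CentaurHauls", "  Shanghai  ") rather than as the three register words cpuid_vendor1..3 that QEMU itself uses. vendor_follows_intel() and cmp_legacy_ecx_bits() are hypothetical helpers that only mirror the condition being changed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Vendors that, like Intel, treat CPUID[0x80000001].ECX bit 1 as reserved. */
static bool vendor_follows_intel(const char vendor[12])
{
    return memcmp(vendor, "GenuineIntel", 12) == 0 ||
           memcmp(vendor, "CentaurHauls", 12) == 0 ||  /* VIA / Zhaoxin */
           memcmp(vendor, "  Shanghai  ", 12) == 0;    /* Zhaoxin */
}

/* Hypothetical helper: ECX bits to OR in for CPUID[0x80000001]. */
static uint32_t cmp_legacy_ecx_bits(const char vendor[12],
                                    unsigned int threads_per_pkg)
{
    if (threads_per_pkg > 1 && !vendor_follows_intel(vendor)) {
        return 1u << 1; /* CmpLegacy bit */
    }
    return 0;
}
```

With this condition, a VIA ("CentaurHauls") vCPU no longer gets the bit either, which is the behavior change worth calling out in the commit message.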

Thanks,
Zhao

> > >   *ecx |= 1 << 1;/* CmpLegacy bit */
> > >   }
> > >   }
> > > --
> > > 2.34.1
> > > 
> 



Re: [PATCH] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f

2024-08-08 Thread Zhao Liu
On Thu, Aug 08, 2024 at 09:59:07PM +0800, Xiaoyao Li wrote:
> Date: Thu, 8 Aug 2024 21:59:07 +0800
> From: Xiaoyao Li 
> Subject: Re: [PATCH] i386/cpu: Introduce enable_cpuid_0x1f to force
>  exposing CPUID 0x1f
> 
> On 8/8/2024 6:09 PM, Zhao Liu wrote:
> > Hi Xiaoyao,
> > 
> > Patch is generally fine for me. Just a few nits:
> > 
> > On Fri, Aug 02, 2024 at 03:24:26AM -0400, Xiaoyao Li wrote:
> > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > index dff49fce1154..b63bce2f4c82 100644
> > > --- a/include/hw/i386/topology.h
> > > +++ b/include/hw/i386/topology.h
> > > @@ -207,13 +207,4 @@ static inline apic_id_t x86_apicid_from_cpu_idx(X86CPUTopoInfo *topo_info,
> > >   return x86_apicid_from_topo_ids(topo_info, &topo_ids);
> > >   }
> > > -/*
> > > - * Check whether there's extended topology level (module or die)?
> > > - */
> > > -static inline bool x86_has_extended_topo(unsigned long *topo_bitmap)
> > > -{
> > > -return test_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap) ||
> > > -   test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> > > -}
> > > -
> > 
> > [snip]
> > 
> > > +/*
> > > + * Check whether there's v2 extended topology level (module or die)?
> > > + */
> > > +bool x86_has_v2_extended_topo(X86CPU *cpu)
> > > +{
> > > +if (cpu->enable_cpuid_0x1f) {
> > > +return true;
> > > +}
> > > +
> > > +return test_bit(CPU_TOPO_LEVEL_MODULE, cpu->env.avail_cpu_topo) ||
> > > +   test_bit(CPU_TOPO_LEVEL_DIE, cpu->env.avail_cpu_topo);
> > > +}
> > > +
> > 
> > I suggest decoupling 0x1f enablement and the extended topo check, since as
> > the comment of CPUTopoLevel said:
> > 
> > /*
> >   * CPUTopoLevel is the general i386 topology hierarchical representation,
> >   * ordered by increasing hierarchical relationship.
> >   * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
> >   * or AMD (CPUID[0x80000026]).
> >   */
> > 
> > The topology enumeration is generic and is not bound to the vendor.
> 
> I don't quite get your point. All the current usages of
> x86_has_extended_topo() are for CPUID leaf 0x1f, which is an Intel specific
> leaf.

0x1f is just a user of that helper, and AMD's leaf would be another
potential user, even if it is not implemented yet.

This helper checks the topology hierarchy configured by -smp for x86
CPUs; it has nothing to do with whether 0x1f is enabled. If 0x1f is
forcibly enabled but -smp doesn't configure module/die levels, the helper
must not falsely report their presence.

> Are you saying x86_has_extended_topo() will be used for leaf 0x80000026 for
> AMD as well?
> 
> or maybe I misunderstand the meaning "extend_topo". The extend_topo just
> means the topo level of module and die, and the topo level of smt and core
> are non-extended? 

Any levels that 0xb doesn't cover.

> If so, this is new to me, could I ask where the
> definitions come from? or just QEMU defines them itself?
>
> > [snip]
> > 
> > > diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> > > index c6cc035df3d8..211a42ffbfa6 100644
> > > --- a/target/i386/cpu.h
> > > +++ b/target/i386/cpu.h
> > > @@ -2110,6 +2110,9 @@ struct ArchCPU {
> > >   /* Compatibility bits for old machine types: */
> > >   bool enable_cpuid_0xb;
> > > +/* Force to expose cpuid 0x1f */
> > 
> > Maybe "Force to enable cpuid 0x1f"?
> 
> I can change to it.
> 
> > > +bool enable_cpuid_0x1f;
> > > +
> > >   /* Enable auto level-increase for all CPUID leaves */
> > >   bool full_cpuid_auto_level;
> > 
> > Regards,
> > Zhao
> > 
> 



Re: [PATCH] kvm: refactor core virtual machine creation into its own function

2024-08-08 Thread Zhao Liu
Hi Ani,

On Thu, Aug 08, 2024 at 04:03:36PM +0530, Ani Sinha wrote:
> Date: Thu,  8 Aug 2024 16:03:36 +0530
> From: Ani Sinha 
> Subject: [PATCH] kvm: refactor core virtual machine creation into its own
>  function
> X-Mailer: git-send-email 2.45.2
> 
> Refactoring the core logic around KVM_CREATE_VM into its own separate function
> so that it can be called from other functions in subsequent patches. There is
> no functional change in this patch.
> 
> CC: pbonz...@redhat.com
> Signed-off-by: Ani Sinha 
> ---
>  accel/kvm/kvm-all.c | 97 -
>  1 file changed, 60 insertions(+), 37 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 75d11a07b2..2bcd00126a 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2385,6 +2385,64 @@ uint32_t kvm_dirty_ring_size(void)
>  return kvm_state->kvm_dirty_ring_size;
>  }
>  
> +static int do_create_vm(MachineState *ms, int type)
> +{
> +KVMState *s;
> +int ret;
> +
> +s = KVM_STATE(ms->accelerator);
> +
> +do {
> +ret = kvm_ioctl(s, KVM_CREATE_VM, type);
> +} while (ret == -EINTR);
> +
> +if (ret < 0) {
> +fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
> +strerror(-ret));
> +
> +#ifdef TARGET_S390X
> +if (ret == -EINVAL) {
> +fprintf(stderr,
> +"Host kernel setup problem detected. Please verify:\n");
> +fprintf(stderr, "- for kernels supporting the switch_amode or"
> +" user_mode parameters, whether\n");
> +fprintf(stderr,
> +"  user space is running in primary address space\n");
> +fprintf(stderr,
> +"- for kernels supporting the vm.allocate_pgste sysctl, "
> +"whether it is enabled\n");

Would it be possible to convert these fprintf() calls to error_report(),
as commit d0e16850eed3 ("hw/xen: convert stderr prints to error/warn
reports") did?

Regards,
Zhao

> +}
> +#elif defined(TARGET_PPC)
> +if (ret == -EINVAL) {
> +fprintf(stderr,
> +"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
> +(type == 2) ? "pr" : "hv");
> +}
> +#endif
> +}
> +
> +return ret;
> +}
> +
> +static int find_kvm_machine_type(MachineState *ms)
> +{
> +MachineClass *mc = MACHINE_GET_CLASS(ms);
> +int type;
> +
> +if (object_property_find(OBJECT(current_machine), "kvm-type")) {
> +g_autofree char *kvm_type;
> +kvm_type = object_property_get_str(OBJECT(current_machine),
> +   "kvm-type",
> +   &error_abort);
> +type = mc->kvm_type(ms, kvm_type);
> +} else if (mc->kvm_type) {
> +type = mc->kvm_type(ms, NULL);
> +} else {
> +type = kvm_arch_get_default_type(ms);
> +}
> +return type;
> +}
> +
>  static int kvm_init(MachineState *ms)
>  {
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
> @@ -2467,49 +2525,14 @@ static int kvm_init(MachineState *ms)
>  }
>  s->as = g_new0(struct KVMAs, s->nr_as);
>  
> -if (object_property_find(OBJECT(current_machine), "kvm-type")) {
> -g_autofree char *kvm_type = object_property_get_str(OBJECT(current_machine),
> -"kvm-type",
> -&error_abort);
> -type = mc->kvm_type(ms, kvm_type);
> -} else if (mc->kvm_type) {
> -type = mc->kvm_type(ms, NULL);
> -} else {
> -type = kvm_arch_get_default_type(ms);
> -}
> -
> +type = find_kvm_machine_type(ms);
>  if (type < 0) {
>  ret = -EINVAL;
>  goto err;
>  }
>  
> -do {
> -ret = kvm_ioctl(s, KVM_CREATE_VM, type);
> -} while (ret == -EINTR);
> -
> +ret = do_create_vm(ms, type);
>  if (ret < 0) {
> -fprintf(stderr, "ioctl(KVM_CREATE_VM) failed: %d %s\n", -ret,
> -strerror(-ret));
> -
> -#ifdef TARGET_S390X
> -if (ret == -EINVAL) {
> -fprintf(stderr,
> -"Host kernel setup problem detected. Please verify:\n");
> -fprintf(stderr, "- for kernels supporting the switch_amode or"
> -" user_mode parameters, whether\n");
> -fprintf(stderr,
> -"  user space is running in primary address space\n");
> -fprintf(stderr,
> -"- for kernels supporting the vm.allocate_pgste sysctl, "
> -"whether it is enabled\n");
> -}
> -#elif defined(TARGET_PPC)
> -if (ret == -EINVAL) {
> -fprintf(stderr,
> -"PPC KVM module is not loaded. Try modprobe kvm_%s.\n",
> -(type == 2) ? "pr" : "hv");
> -}
> -#endif
>  goto err;
>  }
>  
> -- 
> 2.45.2
> 

Re: [PATCH v2 3/4] target/i386: Introduce Zhaoxin Yongfeng CPU model

2024-08-08 Thread Zhao Liu
On Thu, Jul 04, 2024 at 07:25:10AM -0400, EwanHai wrote:
> Date: Thu, 4 Jul 2024 07:25:10 -0400
> From: EwanHai 
> Subject: [PATCH v2 3/4] target/i386: Introduce Zhaoxin Yongfeng CPU model
> X-Mailer: git-send-email 2.34.1
> 
> Introduce support for the Zhaoxin Yongfeng CPU model.
> The Zhaoxin Yongfeng CPU is Zhaoxin's latest server CPU.
> 
> This new CPU model ensures that QEMU can correctly emulate the Zhaoxin
> Yongfeng CPU, providing accurate functionality and performance
> characteristics.
> 
> Signed-off-by: EwanHai 
> ---
>  target/i386/cpu.c | 124 ++
>  1 file changed, 124 insertions(+)
 
Reviewed-by: Zhao Liu 




Re: [PATCH v2 2/4] target/i386: Add CPUID leaf 0xC000_0001 EDX definitions

2024-08-08 Thread Zhao Liu
On Thu, Jul 04, 2024 at 07:25:09AM -0400, EwanHai wrote:
> Date: Thu, 4 Jul 2024 07:25:09 -0400
> From: EwanHai 
> Subject: [PATCH v2 2/4] target/i386: Add CPUID leaf 0xC000_0001 EDX
>  definitions
> X-Mailer: git-send-email 2.34.1
> 
> Add new CPUID feature flags for various Zhaoxin PadLock extensions.
> These definitions will be used for Zhaoxin CPU models.
> 
> Signed-off-by: EwanHai 
> ---
>  target/i386/cpu.h | 21 +
>  1 file changed, 21 insertions(+)

Reviewed-by: Zhao Liu 




Re: [PATCH v2 1/4] target/i386: Add support for Zhaoxin CPU vendor identification

2024-08-08 Thread Zhao Liu
On Thu, Jul 04, 2024 at 07:25:08AM -0400, EwanHai wrote:
> Date: Thu, 4 Jul 2024 07:25:08 -0400
> From: EwanHai 
> Subject: [PATCH v2 1/4] target/i386: Add support for Zhaoxin CPU vendor
>  identification
> X-Mailer: git-send-email 2.34.1
> 
> Zhaoxin currently uses two vendors: "Shanghai" and "Centaurhauls".
> It is important to note that the latter now belongs to Zhaoxin. Therefore,
> this patch replaces CPUID_VENDOR_VIA with CPUID_VENDOR_ZHAOXIN1.
> 
> The previous CPUID_VENDOR_VIA macro was only defined but never used in
> QEMU, making this change straightforward.
> 
> Additionally, the IS_ZHAOXIN_CPU macro has been added to simplify the
> checks for Zhaoxin CPUs.
> 
> Signed-off-by: EwanHai 
> ---
>  target/i386/cpu.h | 20 +++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)

Reviewed-by: Zhao Liu 




Re: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for Zhaoxin CPUs

2024-08-08 Thread Zhao Liu
Hi EwanHai,

On Thu, Jul 04, 2024 at 07:25:11AM -0400, EwanHai wrote:
> Date: Thu, 4 Jul 2024 07:25:11 -0400
> From: EwanHai 
> Subject: [PATCH v2 4/4] target/i386: Update CMPLegacy handling for Zhaoxin
>  CPUs
> X-Mailer: git-send-email 2.34.1
> 
> Zhaoxin CPUs handle the CMPLegacy bit in the same way
> as Intel CPUs. This patch simplifies the existing logic by
> using the IS_XXX_CPU macro and includes checks for Zhaoxin
> vendor to align their behavior with Intel.
> 
> Signed-off-by: EwanHai 
> ---
>  target/i386/cpu.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a3747fc487..c52a4cf3ba 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6945,9 +6945,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>   * So don't set it here for Intel to make Linux guests happy.
>   */
>  if (threads_per_pkg > 1) {
> -if (env->cpuid_vendor1 != CPUID_VENDOR_INTEL_1 ||
> -env->cpuid_vendor2 != CPUID_VENDOR_INTEL_2 ||
> -env->cpuid_vendor3 != CPUID_VENDOR_INTEL_3) {
> +if (!IS_INTEL_CPU(env) && !IS_ZHAOXIN_CPU(env)) {

This change implicitly changes the behavior of the existing VIA CPU.

Was the original behavior a bug for VIA? If so, I suggest a separate patch
to fix it and explain the effect on the VIA (Zhaoxin1) CPU.
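To make the vendor checks concrete, here is a minimal standalone sketch (not QEMU code) of how CPUID leaf 0 packs the 12-byte vendor string into EBX, EDX and ECX, and of the CmpLegacy condition as patched. The helper names are hypothetical stand-ins for IS_INTEL_CPU()/IS_ZHAOXIN_CPU(), and the "  Shanghai  " spelling (two spaces on each side) is an assumption; the patch only mentions "Shanghai". With this condition, a CPU reporting "CentaurHauls" (the old VIA vendor) no longer gets CmpLegacy, while before the patch it did because it is not Intel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 0 vendor string, split into the registers it is returned in. */
typedef struct {
    uint32_t ebx, edx, ecx;
} VendorWords;

/* Pack a 12-character vendor string: bytes 0-3 go to EBX, 4-7 to EDX and
 * 8-11 to ECX, each register filled little-endian. */
static VendorWords vendor_words(const char *vendor)
{
    VendorWords w = { 0, 0, 0 };
    uint32_t *regs[3] = { &w.ebx, &w.edx, &w.ecx };

    assert(strlen(vendor) == 12);
    for (int i = 0; i < 12; i++) {
        *regs[i / 4] |= (uint32_t)(unsigned char)vendor[i] << (8 * (i % 4));
    }
    return w;
}

static bool vendor_is(const VendorWords *cpu, const char *vendor)
{
    VendorWords v = vendor_words(vendor);

    return cpu->ebx == v.ebx && cpu->edx == v.edx && cpu->ecx == v.ecx;
}

/* Hypothetical stand-ins for IS_INTEL_CPU() and IS_ZHAOXIN_CPU(). */
static bool is_intel(const VendorWords *cpu)
{
    return vendor_is(cpu, "GenuineIntel");
}

static bool is_zhaoxin(const VendorWords *cpu)
{
    /* "  Shanghai  " (space padded to 12 bytes) is an assumed spelling. */
    return vendor_is(cpu, "CentaurHauls") || vendor_is(cpu, "  Shanghai  ");
}

/* ECX bit 1 (CmpLegacy) as set by the patched condition. */
static uint32_t cmp_legacy_bit(const VendorWords *cpu, unsigned threads_per_pkg)
{
    if (threads_per_pkg > 1 && !is_intel(cpu) && !is_zhaoxin(cpu)) {
        return 1u << 1;
    }
    return 0;
}
```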

Regards,
Zhao

>  *ecx |= 1 << 1;/* CmpLegacy bit */
>  }
>  }
> -- 
> 2.34.1
> 



Re: [PATCH] i386/cpu: Introduce enable_cpuid_0x1f to force exposing CPUID 0x1f

2024-08-08 Thread Zhao Liu
Hi Xiaoyao,

Patch is generally fine for me. Just a few nits:

On Fri, Aug 02, 2024 at 03:24:26AM -0400, Xiaoyao Li wrote:
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index dff49fce1154..b63bce2f4c82 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -207,13 +207,4 @@ static inline apic_id_t x86_apicid_from_cpu_idx(X86CPUTopoInfo *topo_info,
>  return x86_apicid_from_topo_ids(topo_info, &topo_ids);
>  }
>  
> -/*
> - * Check whether there's extended topology level (module or die)?
> - */
> -static inline bool x86_has_extended_topo(unsigned long *topo_bitmap)
> -{
> -return test_bit(CPU_TOPO_LEVEL_MODULE, topo_bitmap) ||
> -   test_bit(CPU_TOPO_LEVEL_DIE, topo_bitmap);
> -}
> -

[snip]

> +/*
> + * Check whether there's v2 extended topology level (module or die)?
> + */
> +bool x86_has_v2_extended_topo(X86CPU *cpu)
> +{
> +if (cpu->enable_cpuid_0x1f) {
> +return true;
> +}
> +
> +return test_bit(CPU_TOPO_LEVEL_MODULE, cpu->env.avail_cpu_topo) ||
> +   test_bit(CPU_TOPO_LEVEL_DIE, cpu->env.avail_cpu_topo);
> +}
> +

I suggest decoupling the 0x1f enablement from the extended topology
check since, as the comment on CPUTopoLevel says:

/*
 * CPUTopoLevel is the general i386 topology hierarchical representation,
 * ordered by increasing hierarchical relationship.
 * Its enumeration value is not bound to the type value of Intel (CPUID[0x1F])
 * or AMD (CPUID[0x80000026]).
 */

The topology enumeration is generic and is not bound to the vendor.
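A minimal sketch of the suggested split, using a simplified, hypothetical mirror of CPUTopoLevel as a bitmap (not the actual QEMU types): the generic check only looks at which levels -smp configured, while the 0x1f decision combines it with the force flag. Forcing 0x1f therefore never makes module/die look present.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, vendor-neutral topology levels (mirrors CPUTopoLevel). */
enum topo_level {
    TOPO_LEVEL_THREAD,
    TOPO_LEVEL_CORE,
    TOPO_LEVEL_MODULE,
    TOPO_LEVEL_DIE,
    TOPO_LEVEL_SOCKET,
    TOPO_LEVEL_MAX,
};

static bool topo_has(unsigned long bitmap, enum topo_level level)
{
    assert(level < TOPO_LEVEL_MAX);
    return (bitmap & (1UL << level)) != 0;
}

/* Generic: did -smp configure a level that leaf 0xb cannot describe? */
static bool has_extended_topo(unsigned long avail_topo)
{
    return topo_has(avail_topo, TOPO_LEVEL_MODULE) ||
           topo_has(avail_topo, TOPO_LEVEL_DIE);
}

/* Vendor-specific: should CPUID 0x1f be exposed? */
static bool should_expose_cpuid_0x1f(bool force_0x1f, unsigned long avail_topo)
{
    return force_0x1f || has_extended_topo(avail_topo);
}
```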

[snip]

> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index c6cc035df3d8..211a42ffbfa6 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -2110,6 +2110,9 @@ struct ArchCPU {
>  /* Compatibility bits for old machine types: */
>  bool enable_cpuid_0xb;
>  
> +/* Force to expose cpuid 0x1f */

Maybe "Force to enable cpuid 0x1f"?

> +bool enable_cpuid_0x1f;
> +
>  /* Enable auto level-increase for all CPUID leaves */
>  bool full_cpuid_auto_level;

Regards,
Zhao




Re: [PATCH v1 2/3] target/i386: Add VMX control bits for nested FRED support

2024-08-08 Thread Zhao Liu
Hi Xin,

On Thu, Aug 08, 2024 at 12:04:42AM -0700, Xin Li wrote:
> Date: Thu, 8 Aug 2024 00:04:42 -0700
> From: Xin Li 
> Subject: Re: [PATCH v1 2/3] target/i386: Add VMX control bits for nested
>  FRED support
> 
> On 8/7/2024 8:58 AM, Zhao Liu wrote:
> > On Wed, Aug 07, 2024 at 01:18:11AM -0700, Xin Li (Intel) wrote:
> > > @@ -1435,7 +1435,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> > >   "vmx-exit-save-efer", "vmx-exit-load-efer",
> > >   "vmx-exit-save-preemption-timer", "vmx-exit-clear-bndcfgs",
> > >   NULL, "vmx-exit-clear-rtit-ctl", NULL, NULL,
> > > -NULL, "vmx-exit-load-pkrs", NULL, NULL,
> > > +NULL, "vmx-exit-load-pkrs", NULL, "vmx-exit-secondary-ctls",
> > 
> > Oh, the order of my reviews is mixed up.
> > It's better to move VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS into this patch.
> 
> Usually a simple definition is added in a patch where it is used, not in
> qemu?
> 
> > >   },
> > >   .msr = {
> > >   .index = MSR_IA32_VMX_TRUE_EXIT_CTLS,
> > > @@ -1450,7 +1450,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
> > >   NULL, "vmx-entry-ia32e-mode", NULL, NULL,
> > >   NULL, "vmx-entry-load-perf-global-ctrl", "vmx-entry-load-pat", "vmx-entry-load-efer",
> > >   "vmx-entry-load-bndcfgs", NULL, "vmx-entry-load-rtit-ctl", NULL,
> > > -NULL, NULL, "vmx-entry-load-pkrs", NULL,
> > > +NULL, NULL, "vmx-entry-load-pkrs", "vmx-entry-load-fred",
> > 
> > Should we also define VMX_VM_ENTRY_LOAD_FRED? "vmx-entry-load-rtit-ctl"
> > and "vmx-entry-load-pkrs" have their corresponding bit definitions, even
> > if they are not used.
> 
> I'm not sure, but why add something that is not being used (thus not
> tested)?

Yes, the use of the macros is one factor. My other consideration is the
integrity of the feature definitions. When such feature definitions
were first introduced in commit 704798add83b ("target/i386: add VMX
definitions"), my understanding is that they were mainly used to
enumerate and reflect hardware support, and not all of them are used
directly.

The feature word name and the feature definition should essentially be
bound together. It might be possible to generate the definitions from
the feature words with a script instead of adding them manually, but no
one is working on that right now and nothing enforces it, so we have to
add and check both by hand to make sure they correspond. When a feature
word entry is added, it means that the host supports the corresponding
feature, so from an integrity perspective it is natural to add the
definition as well (just like commit 52a44ad2b92b ("target/i386: Expose
VMX entry/exit load pkrs control bits")), right?

I did find that there are still some mismatches between the feature
words and the corresponding definitions, but ideally the two should
coexist.

As for testing: if the bit is only enumerated, and is not added to a
specific CPU model or used by any other logic, it should be harmless,
right?
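As a rough illustration of the manual check I mean, here is a standalone sketch (hypothetical structure, not existing QEMU code) that compares a feature-word name table with the bit definitions that should pair with it and reports bits that have only one of the two. The table is an excerpt holding only bits 29 and 31 of the VM-exit controls word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Excerpt of the VM-exit controls feature word: only bits 29 and 31 are
 * filled in, every other entry is NULL. */
static const char *const exit_ctl_names[32] = {
    [29] = "vmx-exit-load-pkrs",
    [31] = "vmx-exit-secondary-ctls",
};

/* A bit definition that should pair with a feature name. */
struct bit_def {
    const char *macro;
    uint32_t mask;
};

static const struct bit_def exit_ctl_defs[] = {
    { "VMX_VM_EXIT_LOAD_IA32_PKRS", UINT32_C(0x20000000) },
    { "VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS", UINT32_C(0x80000000) },
};

/* Count bits that have a name but no definition, or a definition but no
 * name, printing each mismatch. */
static int count_mismatches(const char *const names[32],
                            const struct bit_def *defs, size_t ndefs)
{
    int mismatches = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        uint32_t mask = UINT32_C(1) << bit;
        const char *macro = NULL;

        for (size_t i = 0; i < ndefs; i++) {
            /* Every definition must describe exactly one bit. */
            assert(defs[i].mask != 0 && (defs[i].mask & (defs[i].mask - 1)) == 0);
            if (defs[i].mask == mask) {
                macro = defs[i].macro;
            }
        }
        if ((names[bit] != NULL) != (macro != NULL)) {
            printf("bit %u: name %s, definition %s\n", bit,
                   names[bit] ? names[bit] : "(none)",
                   macro ? macro : "(none)");
            mismatches++;
        }
    }
    return mismatches;
}
```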

Thanks,
Zhao




Re: [PATCH v1 2/3] target/i386: Add VMX control bits for nested FRED support

2024-08-07 Thread Zhao Liu
Hi Xin,

On Wed, Aug 07, 2024 at 01:18:11AM -0700, Xin Li (Intel) wrote:
> Date: Wed,  7 Aug 2024 01:18:11 -0700
> From: "Xin Li (Intel)" 
> Subject: [PATCH v1 2/3] target/i386: Add VMX control bits for nested FRED
>  support
> X-Mailer: git-send-email 2.45.2
> 
> Add definitions of
>   1) VM-exit activate secondary controls bit
>   2) VM-entry load FRED bit
> which are required to enable nested FRED.
> 
> Reviewed-by: Zhao Liu 
> Signed-off-by: Xin Li (Intel) 
> ---
>  target/i386/cpu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 85ef7452c0..31f287cae0 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1435,7 +1435,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>  "vmx-exit-save-efer", "vmx-exit-load-efer",
>  "vmx-exit-save-preemption-timer", "vmx-exit-clear-bndcfgs",
>  NULL, "vmx-exit-clear-rtit-ctl", NULL, NULL,
> -NULL, "vmx-exit-load-pkrs", NULL, NULL,
> +NULL, "vmx-exit-load-pkrs", NULL, "vmx-exit-secondary-ctls",

Oh, the order of my reviews is mixed up.
It's better to move VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS into this patch.

>  },
>  .msr = {
>  .index = MSR_IA32_VMX_TRUE_EXIT_CTLS,
> @@ -1450,7 +1450,7 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>  NULL, "vmx-entry-ia32e-mode", NULL, NULL,
>  NULL, "vmx-entry-load-perf-global-ctrl", "vmx-entry-load-pat", "vmx-entry-load-efer",
>  "vmx-entry-load-bndcfgs", NULL, "vmx-entry-load-rtit-ctl", NULL,
> -NULL, NULL, "vmx-entry-load-pkrs", NULL,
> +NULL, NULL, "vmx-entry-load-pkrs", "vmx-entry-load-fred",

Should we also define VMX_VM_ENTRY_LOAD_FRED? "vmx-entry-load-rtit-ctl"
and "vmx-entry-load-pkrs" have their corresponding bit definitions, even
if they are not used.

Regards,
Zhao

>  NULL, NULL, NULL, NULL,
>  NULL, NULL, NULL, NULL,
>  },
> -- 
> 2.45.2
> 



Re: [PATCH v1 3/3] target/i386: Raise the highest index value used for any VMCS encoding

2024-08-07 Thread Zhao Liu
Hi Xin,

On Wed, Aug 07, 2024 at 01:18:12AM -0700, Xin Li (Intel) wrote:
> Date: Wed,  7 Aug 2024 01:18:12 -0700
> From: "Xin Li (Intel)" 
> Subject: [PATCH v1 3/3] target/i386: Raise the highest index value used for
>  any VMCS encoding
> X-Mailer: git-send-email 2.45.2
> 
> From: Lei Wang 
> 
> Because the index value of the VMCS field encoding of FRED injected-event
> data (one of the newly added VMCS fields for FRED transitions), 0x52, is
> larger than any existing index value, raise the highest index value used
> for any VMCS encoding to 0x52.
> 
> Because the index value of the VMCS field encoding of Secondary VM-exit
> controls, 0x44, is larger than any existing index value, raise the highest
> index value used for any VMCS encoding to 0x44.
> 
> Co-developed-by: Xin Li 
> Signed-off-by: Xin Li 
> Signed-off-by: Lei Wang 
> Signed-off-by: Xin Li (Intel) 
> ---
>  target/i386/cpu.h | 1 +
>  target/i386/kvm/kvm.c | 9 -
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 118ef9cb68..62324c3dcd 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1186,6 +1186,7 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w);
>  #define VMX_VM_EXIT_PT_CONCEAL_PIP              0x01000000
>  #define VMX_VM_EXIT_CLEAR_IA32_RTIT_CTL         0x02000000
>  #define VMX_VM_EXIT_LOAD_IA32_PKRS              0x20000000
> +#define VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS 0x80000000

It's necessary to add the corresponding feat_name to the FEAT_VMX_EXIT_CTLS
feature word array, which helps filter the user's settings from -cpu.
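A minimal sketch of why the name matters, assuming a simplified model of how -cpu properties are resolved: a control bit can only be requested by name if the feature word array has a name at that position, and the request is then masked by what the host supports. The function names are hypothetical; real QEMU also warns about (or, with enforce, rejects) unsupported bits, which this sketch simply drops.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Excerpt of the VM-exit controls feature word (bits 29 and 31 only). */
static const char *const exit_ctl_names[32] = {
    [29] = "vmx-exit-load-pkrs",
    [31] = "vmx-exit-secondary-ctls",
};

/* Map a -cpu property name to its bit; 0 means there is no such name,
 * so the user cannot request that bit at all. */
static uint32_t feature_mask_by_name(const char *const names[32],
                                     const char *name)
{
    for (unsigned bit = 0; bit < 32; bit++) {
        if (names[bit] != NULL && strcmp(names[bit], name) == 0) {
            return UINT32_C(1) << bit;
        }
    }
    return 0;
}

/* Apply a "+name" request, keeping only bits the host supports
 * (unsupported bits are silently dropped in this sketch). */
static uint32_t apply_user_request(uint32_t current, uint32_t host_supported,
                                   const char *name)
{
    uint32_t mask = feature_mask_by_name(exit_ctl_names, name);

    return current | (mask & host_supported);
}
```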

>  #define VMX_VM_ENTRY_LOAD_DEBUG_CONTROLS        0x00000004
>  #define VMX_VM_ENTRY_IA32E_MODE                 0x00000200
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 31f149c990..fac5990274 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -3694,7 +3694,14 @@ static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f)
>  kvm_msr_entry_add(cpu, MSR_IA32_VMX_CR4_FIXED0,
>CR4_VMXE_MASK);
>  
> -if (f[FEAT_VMX_SECONDARY_CTLS] & VMX_SECONDARY_EXEC_TSC_SCALING) {
> +if (f[FEAT_7_1_EAX] & CPUID_7_1_EAX_FRED) {
> +/* FRED injected-event data (0x2052).  */
> +kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x52);

Hmm, I have some questions after checking the FRED spec.

Section 9.3.4 says (for injected-event data): "This field has uses the
encoding pair 2052H/2053H."

So why raise the highest index to 0x52 rather than 0x53?

And it seems FRED introduces another field "original-event data"
(0x2404/0x2405), why not consider this field here as well?
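For reference, a minimal sketch of how an index is derived from a VMCS field encoding, assuming the layout in the Intel SDM (bit 0 is the access type, bits 9:1 are the index) and that IA32_VMX_VMCS_ENUM reports the highest index in its bits 9:1. The helper names are hypothetical; the sketch can be used to compare the candidate encodings above (2052H/2053H and 2404H/2405H).

```c
#include <assert.h>
#include <stdint.h>

/* VMCS component encoding layout (Intel SDM):
 *   bit 0      access type (0 = full, 1 = high half of a 64-bit field)
 *   bits 9:1   index
 *   bits 11:10 type
 *   bits 14:13 width
 */
static unsigned vmcs_field_index(uint32_t encoding)
{
    return (encoding >> 1) & 0x1ff;
}

static unsigned vmcs_field_access_high(uint32_t encoding)
{
    return encoding & 1;
}

/* IA32_VMX_VMCS_ENUM keeps the highest index in bits 9:1 too, so the value
 * to program for a field is its index shifted left by one. */
static uint64_t vmcs_enum_value_for(uint32_t encoding)
{
    return (uint64_t)vmcs_field_index(encoding) << 1;
}
```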

> +} else if (f[FEAT_VMX_EXIT_CTLS] &
> +   VMX_VM_EXIT_ACTIVATE_SECONDARY_CONTROLS) {
> +/* Secondary VM-exit controls (0x2044).  */
> +kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x44);
> +} else if (f[FEAT_VMX_SECONDARY_CTLS] & VMX_SECONDARY_EXEC_TSC_SCALING) {
>  /* TSC multiplier (0x2032).  */
>  kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM, 0x32);
>  } else {

Maybe we could adjust the index in a cleaner way, similar to
x86_cpu_adjust_level(), but the current case-by-case approach is fine
with me as well.
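Roughly what I have in mind, as a standalone sketch with hypothetical names: each optional field contributes its index only when its feature is enabled, and the maximum is taken, so the result no longer depends on the order of an if/else chain. The encodings 0x2032, 0x2044 and 0x2052 come from this patch; the 0x482E (preemption timer) baseline is my assumption about what the existing else branch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One optional VMCS field, present only if the guest has the feature. */
struct vmcs_field_dep {
    bool feature_enabled;   /* e.g. FRED, secondary exit controls, TSC scaling */
    uint32_t encoding;      /* VMCS field encoding, e.g. 0x2052 */
};

static unsigned vmcs_index(uint32_t encoding)
{
    return (encoding >> 1) & 0x1ff;     /* bits 9:1 */
}

/* Raise *max_index to cover @encoding, in the spirit of x86_cpu_adjust_level(). */
static void vmcs_adjust_max_index(unsigned *max_index, uint32_t encoding)
{
    unsigned idx = vmcs_index(encoding);

    if (idx > *max_index) {
        *max_index = idx;
    }
}

/* Compute the IA32_VMX_VMCS_ENUM value (highest index in bits 9:1). */
static uint64_t vmcs_enum_value(const struct vmcs_field_dep *deps, size_t n,
                                uint32_t baseline_encoding)
{
    unsigned max_index = 0;

    vmcs_adjust_max_index(&max_index, baseline_encoding);
    for (size_t i = 0; i < n; i++) {
        if (deps[i].feature_enabled) {
            vmcs_adjust_max_index(&max_index, deps[i].encoding);
        }
    }
    return (uint64_t)max_index << 1;
}
```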

Regards,
Zhao





Re: [PATCH v1 1/3] target/i386: Delete duplicated macro definition CR4_FRED_MASK

2024-08-07 Thread Zhao Liu
On Wed, Aug 07, 2024 at 01:18:10AM -0700, Xin Li (Intel) wrote:
> Date: Wed,  7 Aug 2024 01:18:10 -0700
> From: "Xin Li (Intel)" 
> Subject: [PATCH v1 1/3] target/i386: Delete duplicated macro definition
>  CR4_FRED_MASK
> X-Mailer: git-send-email 2.45.2
> 
> Macro CR4_FRED_MASK is defined twice, delete one.
> 
> Signed-off-by: Xin Li (Intel) 
> ---
>  target/i386/cpu.h | 6 --
>  1 file changed, 6 deletions(-)

Reviewed-by: Zhao Liu 




Re: [PATCH 8/8] qemu-options: Add the description of smp-cache object

2024-08-07 Thread Zhao Liu
Hi Markus,

Just a gentle ping. Hopefully we can continue this discussion when
you're free.

Regards,
Zhao

On Fri, Aug 02, 2024 at 03:58:02PM +0800, Zhao Liu wrote:
> Date: Fri, 2 Aug 2024 15:58:02 +0800
> From: Zhao Liu 
> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
>  object
> 
> On Thu, Aug 01, 2024 at 01:28:27PM +0200, Markus Armbruster wrote:
> > Date: Thu, 01 Aug 2024 13:28:27 +0200
> > From: Markus Armbruster 
> > Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >  object
> > 
> > Zhao Liu  writes:
> > 
> > > On Thu, Jul 25, 2024 at 11:07:12AM +0200, Markus Armbruster wrote:
> > >> Date: Thu, 25 Jul 2024 11:07:12 +0200
> > >> From: Markus Armbruster 
> > >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> > >>  object
> > >> 
> > >> Zhao Liu  writes:
> > >> 
> > >> > Hi Markus and Daniel,
> > >> >
> > >> > I have the questions about the -object per cache implementation:
> > >> >
> > >> > On Wed, Jul 24, 2024 at 02:39:29PM +0200, Markus Armbruster wrote:
> > >> >> Date: Wed, 24 Jul 2024 14:39:29 +0200
> > >> >> From: Markus Armbruster 
> > >> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of 
> > >> >> smp-cache
> > >> >>  object
> > >> >> 
> > >> >> Zhao Liu  writes:
> > >> >> 
> > >> >> > Hi Markus,
> > >> >> >
> > >> >> > On Mon, Jul 22, 2024 at 03:37:43PM +0200, Markus Armbruster wrote:
> > >> >> >> Date: Mon, 22 Jul 2024 15:37:43 +0200
> > >> >> >> From: Markus Armbruster 
> > >> >> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of 
> > >> >> >> smp-cache
> > >> >> >>  object
> > >> >> >> 
> > >> >> >> Zhao Liu  writes:
> > >> >> >> 
> > >> >> >> > Signed-off-by: Zhao Liu 
> > >> >> >> 
> > >> >> >> This patch is just documentation.  The code got added in some 
> > >> >> >> previous
> > >> >> >> patch.  Would it make sense to squash this patch into that previous
> > >> >> >> patch?
> > >> >> >
> > >> >> > OK, I'll merge them.
> > >> >> >
> > >> >> >> > ---
> > >> >> >> > Changes since RFC v2:
> > >> >> >> >  * Rewrote the document of smp-cache object.
> > >> >> >> >
> > >> >> >> > Changes since RFC v1:
> > >> >> >> >  * Use "*_cache=topo_level" as -smp example as the original 
> > >> >> >> > "level"
> > >> >> >> >term for a cache has a totally different meaning. (Jonathan)
> > >> >> >> > ---
> > >> >> >> >  qemu-options.hx | 58 
> > >> >> >> > +
> > >> >> >> >  1 file changed, 58 insertions(+)
> > >> >> >> >
> > >> >> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> > >> >> >> > index 8ca7f34ef0c8..4b84f4508a6e 100644
> > >> >> >> > --- a/qemu-options.hx
> > >> >> >> > +++ b/qemu-options.hx
> > >> >> >> > @@ -159,6 +159,15 @@ SRST
> > >> >> >> >  ::
> > >> >> >> >  
> > >> >> >> >  -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
> > >> >> >> > +
> > >> >> >> > +``smp-cache='id'``
> > >> >> >> > +Allows to configure cache property (now only the cache 
> > >> >> >> > topology level).
> > >> >> >> > +
> > >> >> >> > +For example:
> > >> >> >> > +::
> > >> >> >> > +
> > >> >> >> > +-object 
> > >>

Re: [RFC 0/5] accel/kvm: Support KVM PMU filter

2024-08-02 Thread Zhao Liu
On Fri, Aug 02, 2024 at 05:41:57PM +0800, Shaoqin Huang wrote:
> Date: Fri, 2 Aug 2024 17:41:57 +0800
> From: Shaoqin Huang 
> Subject: Re: [RFC 0/5] accel/kvm: Support KVM PMU filter
> 
> Hi Zhao,
> 
> On 8/2/24 17:37, Zhao Liu wrote:
> > Hello Shaoqin,
> > 
> > On Fri, Aug 02, 2024 at 05:01:47PM +0800, Shaoqin Huang wrote:
> > > Date: Fri, 2 Aug 2024 17:01:47 +0800
> > > From: Shaoqin Huang 
> > > Subject: Re: [RFC 0/5] accel/kvm: Support KVM PMU filter
> > > 
> > > Hi Zhao,
> > > 
> > > On 7/10/24 12:51, Zhao Liu wrote:
> > > > Hi QEMU maintainers, arm and PMU folks,
> > > > 
> > > > I picked up Shaoqin's previous work [1] on the KVM PMU filter for arm,
> > > > and am now trying to support this feature for x86 with a JSON-compatible
> > > > API.
> > > > 
> > > > While arm and x86 use different KVM ioctls to configure the PMU filter,
> > > > considering they all have similar inputs (PMU event + action), it is
> > > > still possible to abstract a generic, cross-architecture kvm-pmu-filter
> > > > object and provide users with a sufficiently generic or near-consistent
> > > > QAPI interface.
> > > > 
> > > > That's what I did in this series, a new kvm-pmu-filter object, with the
> > > > API like:
> > > > 
> > > > -object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0xc4"}]}'
> > > > 
> > > > For i386, this object is inserted into kvm accelerator and is extended
> > > > to support fixed-counter and more formats ("x86-default" and
> > > > "x86-masked-entry"):
> > > > 
> > > > -accel kvm,pmu-filter=f0 \
> > > > -object '{"qom-type":"kvm-pmu-filter","id":"f0","x86-fixed-counter":{"action":"allow","bitmap":"0x0"},"events":[{"action":"allow","format":"x86-masked-entry","select":"0xc4","mask":"0xff","match":"0","exclude":true},{"action":"allow","format":"x86-masked-entry","select":"0xc5","mask":"0xff","match":"0","exclude":true}]}'
> > > 
> > > What if I want to create the PMU Filter on ARM to deny the event range
> > > [0x5,0x10], and allow deny event 0x13, how should I write the json?
> > > 
> > 
> > Currently this doesn't support event ranges (since raw x86 event codes
> > cannot be assumed to be continuous).
> > 
> > So with the basic support, we need to configure events one by one:
> > 
> > -object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0x5"},{"action":"allow","format":"raw","code":"0x6"},{"action":"allow","format":"raw","code":"0x7"},{"action":"allow","format":"raw","code":"0x8"},{"action":"allow","format":"raw","code":"0x9"},{"action":"allow","format":"raw","code":"0x10"},{"action":"deny","format":"raw","code":"0x13"}]}'
> > 
> > This one looks a lot more complicated, but in the future, arm could
> > further support event-range (maybe implement event-range via mask), but
> > I think this could be arch-specific format since not all architectures'
> > events are continuous.
> > 
> > Additionally, I'm a bit confused by your example, and I hope you can help
> > me understand it: when 0x5~0x10 are configured as allowed, isn't it true
> > that all other events are denied by default, so denying 0x13 again is
> > redundant? What is the default action for all other events
> > except 0x5~0x10 and 0x13?
> > 
> > If we specify action as allow for 0x5~0x10 and deny for the rest by
> > default, then there is no need to set an action for each event but only
> > a global one (as suggested by Dapeng), so the above command line can be
> > simplified as:
> > 
> > -object '{"qom-type":"kvm-pmu-filter","id":"f0","action":"allow","events":[{"format":"raw","code":"0x5"},{"format":"raw","code":"0x6"},{"format":"raw","code":"0x7"},{"format":"raw","code":"0x8"},{"format":"raw","code":"0x9"},{"format":"raw","code":"0x10"}]}'
> > 
> 
> Yes, you are right. On Arm, when you first set the PMU filter, if the first
> filter is allow, then all other events will be denied by default. The reverse
> is also true: if the first filter is deny, then all other events will be
> allowed by default.
> 
> On Arm the PMU filter is much simpler than on x86, I think. We only need to
> care about the specific events with an allow or deny action.
> 
> If we don't support event range filter, I think that's fine. This can be
> added in the future.

This is good news for me: I can implement the global action in the next
version and iterate further. Thank you for confirming!
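A minimal sketch of the global-action semantics, with a hypothetical in-memory layout (event codes modeled as 16-bit integers, not the actual QAPI types): listed events get the global action and every other event gets the opposite one, which is also how the Arm behavior described above works out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum pmu_action { PMU_ACTION_ALLOW, PMU_ACTION_DENY };

/* Hypothetical kvm-pmu-filter contents with a single global action. */
struct pmu_filter {
    enum pmu_action action;   /* applies to every listed event */
    const uint16_t *events;   /* raw event codes */
    size_t nevents;
};

/* Listed events get the global action; unlisted events get the opposite. */
static bool pmu_event_allowed(const struct pmu_filter *f, uint16_t code)
{
    bool listed = false;

    for (size_t i = 0; i < f->nevents; i++) {
        if (f->events[i] == code) {
            listed = true;
            break;
        }
    }
    return listed == (f->action == PMU_ACTION_ALLOW);
}
```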

Regards,
Zhao




Re: [RFC 0/5] accel/kvm: Support KVM PMU filter

2024-08-02 Thread Zhao Liu
Hello Shaoqin,

On Fri, Aug 02, 2024 at 05:01:47PM +0800, Shaoqin Huang wrote:
> Date: Fri, 2 Aug 2024 17:01:47 +0800
> From: Shaoqin Huang 
> Subject: Re: [RFC 0/5] accel/kvm: Support KVM PMU filter
> 
> Hi Zhao,
> 
> On 7/10/24 12:51, Zhao Liu wrote:
> > Hi QEMU maintainers, arm and PMU folks,
> > 
> > I picked up Shaoqin's previous work [1] on the KVM PMU filter for arm,
> > and am now trying to support this feature for x86 with a JSON-compatible
> > API.
> > 
> > While arm and x86 use different KVM ioctls to configure the PMU filter,
> > considering they all have similar inputs (PMU event + action), it is
> > still possible to abstract a generic, cross-architecture kvm-pmu-filter
> > object and provide users with a sufficiently generic or near-consistent
> > QAPI interface.
> > 
> > That's what I did in this series, a new kvm-pmu-filter object, with the
> > API like:
> > 
> > -object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0xc4"}]}'
> > 
> > For i386, this object is inserted into kvm accelerator and is extended
> > to support fixed-counter and more formats ("x86-default" and
> > "x86-masked-entry"):
> > 
> > -accel kvm,pmu-filter=f0 \
> > -object '{"qom-type":"kvm-pmu-filter","id":"f0","x86-fixed-counter":{"action":"allow","bitmap":"0x0"},"events":[{"action":"allow","format":"x86-masked-entry","select":"0xc4","mask":"0xff","match":"0","exclude":true},{"action":"allow","format":"x86-masked-entry","select":"0xc5","mask":"0xff","match":"0","exclude":true}]}'
> 
> What if I want to create the PMU Filter on ARM to deny the event range
> [0x5,0x10], and allow deny event 0x13, how should I write the json?
>

Currently this doesn't support event ranges (since raw x86 event codes
cannot be assumed to be continuous).

So with the basic support, we need to configure events one by one:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","events":[{"action":"allow","format":"raw","code":"0x5"},{"action":"allow","format":"raw","code":"0x6"},{"action":"allow","format":"raw","code":"0x7"},{"action":"allow","format":"raw","code":"0x8"},{"action":"allow","format":"raw","code":"0x9"},{"action":"allow","format":"raw","code":"0x10"},{"action":"deny","format":"raw","code":"0x13"}]}'

This one looks a lot more complicated, but in the future, arm could
further support event-range (maybe implement event-range via mask), but
I think this could be arch-specific format since not all architectures'
events are continuous.

Additionally, I'm a bit confused by your example, and I hope you can help
me understand it: when 0x5~0x10 are configured as allowed, isn't it true
that all other events are denied by default, so denying 0x13 again is
redundant? What is the default action for all other events
except 0x5~0x10 and 0x13?

If we specify action as allow for 0x5~0x10 and deny for the rest by
default, then there is no need to set an action for each event but only
a global one (as suggested by Dapeng), so the above command line can be
simplified as:

-object '{"qom-type":"kvm-pmu-filter","id":"f0","action":"allow","events":[{"format":"raw","code":"0x5"},{"format":"raw","code":"0x6"},{"format":"raw","code":"0x7"},{"format":"raw","code":"0x8"},{"format":"raw","code":"0x9"},{"format":"raw","code":"0x10"}]}'

Thanks,
Zhao




Re: [PATCH 8/8] qemu-options: Add the description of smp-cache object

2024-08-02 Thread Zhao Liu
On Thu, Aug 01, 2024 at 01:28:27PM +0200, Markus Armbruster wrote:
> Date: Thu, 01 Aug 2024 13:28:27 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
>  object
> 
> Zhao Liu  writes:
> 
> > On Thu, Jul 25, 2024 at 11:07:12AM +0200, Markus Armbruster wrote:
> >> Date: Thu, 25 Jul 2024 11:07:12 +0200
> >> From: Markus Armbruster 
> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >>  object
> >> 
> >> Zhao Liu  writes:
> >> 
> >> > Hi Markus and Daniel,
> >> >
> >> > I have the questions about the -object per cache implementation:
> >> >
> >> > On Wed, Jul 24, 2024 at 02:39:29PM +0200, Markus Armbruster wrote:
> >> >> Date: Wed, 24 Jul 2024 14:39:29 +0200
> >> >> From: Markus Armbruster 
> >> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >> >>  object
> >> >> 
> >> >> Zhao Liu  writes:
> >> >> 
> >> >> > Hi Markus,
> >> >> >
> >> >> > On Mon, Jul 22, 2024 at 03:37:43PM +0200, Markus Armbruster wrote:
> >> >> >> Date: Mon, 22 Jul 2024 15:37:43 +0200
> >> >> >> From: Markus Armbruster 
> >> >> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of 
> >> >> >> smp-cache
> >> >> >>  object
> >> >> >> 
> >> >> >> Zhao Liu  writes:
> >> >> >> 
> >> >> >> > Signed-off-by: Zhao Liu 
> >> >> >> 
> >> >> >> This patch is just documentation.  The code got added in some 
> >> >> >> previous
> >> >> >> patch.  Would it make sense to squash this patch into that previous
> >> >> >> patch?
> >> >> >
> >> >> > OK, I'll merge them.
> >> >> >
> >> >> >> > ---
> >> >> >> > Changes since RFC v2:
> >> >> >> >  * Rewrote the document of smp-cache object.
> >> >> >> >
> >> >> >> > Changes since RFC v1:
> >> >> >> >  * Use "*_cache=topo_level" as -smp example as the original "level"
> >> >> >> >term for a cache has a totally different meaning. (Jonathan)
> >> >> >> > ---
> >> >> >> >  qemu-options.hx | 58 
> >> >> >> > +
> >> >> >> >  1 file changed, 58 insertions(+)
> >> >> >> >
> >> >> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> >> >> >> > index 8ca7f34ef0c8..4b84f4508a6e 100644
> >> >> >> > --- a/qemu-options.hx
> >> >> >> > +++ b/qemu-options.hx
> >> >> >> > @@ -159,6 +159,15 @@ SRST
> >> >> >> >  ::
> >> >> >> >  
> > >> >> >> >  -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
> >> >> >> > +
> >> >> >> > +``smp-cache='id'``
> >> >> >> > +Allows to configure cache property (now only the cache 
> >> >> >> > topology level).
> >> >> >> > +
> >> >> >> > +For example:
> >> >> >> > +::
> >> >> >> > +
> > >> >> >> > +-object '{"qom-type":"smp-cache","id":"cache","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"module"},{"name":"l3","topo":"die"}]}'
> >> >> >> > +-machine smp-cache=cache
> >> >> >> >  ERST
> >> >> >> >  
> >> >> >> >  DEF("M", HAS_ARG, QEMU_OPTION_M,
> >> >> >> > @@ -5871,6 +5880,55 @@ SRST
> >> >> >> >  ::
> >> >> >> >  

Re: [PATCH v1] target/i386: Always set leaf 0x1f

2024-08-01 Thread Zhao Liu
On Wed, Jul 31, 2024 at 07:30:44PM +0530, Manish wrote:
> Date: Wed, 31 Jul 2024 19:30:44 +0530
> From: Manish 
> Subject: Re: [PATCH v1] target/i386: Always set leaf 0x1f
> 
> 
> On 30/07/24 6:39 pm, Igor Mammedov wrote:
> > 
> > On Mon, 29 Jul 2024 19:42:39 +0700
> > Manish  wrote:
> > 
> > > On 29/07/24 7:18 pm, Igor Mammedov wrote:
> > > > 
> > > > On Wed, 24 Jul 2024 23:00:13 +0800
> > > > Zhao Liu  wrote:
> > > > > Hi Igor,
> > > > > 
> > > > > On Wed, Jul 24, 2024 at 02:54:32PM +0200, Igor Mammedov wrote:
> > > > > > Date: Wed, 24 Jul 2024 14:54:32 +0200
> > > > > > From: Igor Mammedov 
> > > > > > Subject: Re: [PATCH v1] target/i386: Always set leaf 0x1f
> > > > > > X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-redhat-linux-gnu)
> > > > > > 
> > > > > > On Wed, 24 Jul 2024 12:13:28 +0100
> > > > > > John Levon  wrote:
> > > > > > > On Wed, Jul 24, 2024 at 03:59:29PM +0530, Manish wrote:
> > > > > > > > > > Leaf 0x1f is superset of 0xb, so it makes sense to set 0x1f
> > > > > > > > > > equivalent to 0xb by default and workaround windows issue.
> > > > > > > > > > This change adds a new property 'cpuid-0x1f-enforce' to set
> > > > > > > > > > leaf 0x1f equivalent to 0xb in case extended CPU topology is
> > > > > > > > > > not configured and behave as before otherwise.
> > > > > > > > > repeating question
> > > > > > > > > why we need to use extra property instead of just adding 0x1f
> > > > > > > > > leaf for CPU models that supposed to have it?
> > > > > > > > As i mentioned in earlier response. "Windows expects it only
> > > > > > > > when we have set max cpuid level greater than or equal to 0x1f.
> > > > > > > > I mean if it is exposed it should not be all zeros.
> > > > > > > > SapphireRapids CPU definition raised cpuid level to 0x20, so we
> > > > > > > > starting seeing it with SapphireRapids."
> > > > > > > > 
> > > > > > > > Windows does not expect 0x1f to be present for any CPU model.
> > > > > > > > But if it is exposed to the guest, it expects non-zero values.
> > > > > > > I think Igor is suggesting:
> > > > > > > 
> > > > > > >- leave x86_cpu_expand_features() alone completely
> > > > > > yep, drop that if possible
> > > > > > 
> > > > > > >- change the 0x1f handling to always report topology i.e. never
> > > > > > >  report all zeroes
> > > > > > Do this but only for CPU models that have this leaf per spec,
> > > > > > to avoid live migration issues create a new version of CPU model,
> > > > > > so it would apply only for new version. This way older versions
> > > > > > and migration won't be affected.
> > > > > So that in the future every new Intel CPU model will need to always
> > > > > enable 0x1f. Sounds like an endless game. So my question is: at what
> > > > > point is it ok to consider defaulting to always enable 0x1f and just
> > > > > disable it for the old CPU model?
> > > > I have suggested to enable 0x1f leaf excluding:
> > > >  * existing cpu models (versions)
> > > >  * cpu models that do not have this leaf in real world should
> > > >not have it in QEMU either.
> > > > 
> > > > If cpu

Re: [PATCH 8/8] qemu-options: Add the description of smp-cache object

2024-08-01 Thread Zhao Liu
On Thu, Jul 25, 2024 at 11:07:12AM +0200, Markus Armbruster wrote:
> Date: Thu, 25 Jul 2024 11:07:12 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
>  object
> 
> Zhao Liu  writes:
> 
> > Hi Markus and Daniel,
> >
> > I have the questions about the -object per cache implementation:
> >
> > On Wed, Jul 24, 2024 at 02:39:29PM +0200, Markus Armbruster wrote:
> >> Date: Wed, 24 Jul 2024 14:39:29 +0200
> >> From: Markus Armbruster 
> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >>  object
> >> 
> >> Zhao Liu  writes:
> >> 
> >> > Hi Markus,
> >> >
> >> > On Mon, Jul 22, 2024 at 03:37:43PM +0200, Markus Armbruster wrote:
> >> >> Date: Mon, 22 Jul 2024 15:37:43 +0200
> >> >> From: Markus Armbruster 
> >> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >> >>  object
> >> >> 
> >> >> Zhao Liu  writes:
> >> >> 
> >> >> > Signed-off-by: Zhao Liu 
> >> >> 
> >> >> This patch is just documentation.  The code got added in some previous
> >> >> patch.  Would it make sense to squash this patch into that previous
> >> >> patch?
> >> >
> >> > OK, I'll merge them.
> >> >
> >> >> > ---
> >> >> > Changes since RFC v2:
> >> >> >  * Rewrote the document of smp-cache object.
> >> >> >
> >> >> > Changes since RFC v1:
> >> >> >  * Use "*_cache=topo_level" as -smp example as the original "level"
> >> >> >term for a cache has a totally different meaning. (Jonathan)
> >> >> > ---
> >> >> >  qemu-options.hx | 58 
> >> >> > +
> >> >> >  1 file changed, 58 insertions(+)
> >> >> >
> >> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> >> >> > index 8ca7f34ef0c8..4b84f4508a6e 100644
> >> >> > --- a/qemu-options.hx
> >> >> > +++ b/qemu-options.hx
> >> >> > @@ -159,6 +159,15 @@ SRST
> >> >> >  ::
> >> >> >  
> >> >> >  -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
> >> >> > +
> >> >> > +``smp-cache='id'``
> >> >> > +Allows to configure cache property (now only the cache topology level).
> >> >> > +
> >> >> > +For example:
> >> >> > +::
> >> >> > +
> >> >> > +-object '{"qom-type":"smp-cache","id":"cache","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"module"},{"name":"l3","topo":"die"}]}'
> >> >> > +-machine smp-cache=cache
> >> >> >  ERST
> >> >> >  
> >> >> >  DEF("M", HAS_ARG, QEMU_OPTION_M,
> >> >> > @@ -5871,6 +5880,55 @@ SRST
> >> >> >  ::
> >> >> >  
> >> >> >  (qemu) qom-set /objects/iothread1 poll-max-ns 10
> >> >> > +
> >> >> > +``-object '{"qom-type":"smp-cache","id":id,"caches":[{"name":cache_name,"topo":cache_topo}]}'``
> >> >> > +Create an smp-cache object that configures machine's cache
> >> >> > +property. Currently, cache property only include cache topology
> >> >> > +level.
> >> >> > +
> >> >> > +This option must be written in JSON format to support JSON list.
> >> >> 
> >> >> Why?
> >> >
> >> > I'm not familiar with this, so I hope you could educate me if I'm wrong.
> >> >
> >> > All I know so 

Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES

2024-07-31 Thread Zhao Liu
On Wed, Jul 31, 2024 at 07:21:58AM +0200, Mauro Carvalho Chehab wrote:

[snip]

> > The name looks inconsistent with the style of other MachineClass virtual
> > methods. What about the name like "notify_xxx"? And pls add the comment
> > about this new method.
> > 
> > BTW, I found this method is called in generic_error_device_notify() of
> > Patch 6. And the mc->generic_error_device_notify() - as the virtual
> > method of MachineClass looks just to implement a hook, and it doesn't
> > seem to have anything to do with MachineClass/MachineState, so my
> > question is why do we need to add this method to MachineClass?
> > 
> > Could we maintain a notifier list in ghes.c and expose an interface
> > to allow arm code register a notifier? This eliminates the need to add
> > the “notify” method to MachineClass.
> 
> Makes sense. I'll change the logic to use this notifier list code inside
> ghes.c, and drop generic_error_device_notify():
> 
>   NotifierList generic_error_notifiers =
>       NOTIFIER_LIST_INITIALIZER(generic_error_notifiers);
> 
>   /* Notify BIOS about an error via Generic Error Device - GED */
>   static void generic_error_device_notify(void)
>   {
>       notifier_list_notify(&generic_error_notifiers, NULL);
>   }

Fine for me.

Regards,
Zhao





Re: [PATCH 0/4] i386: Clean up SGX for microvm, completely

2024-07-30 Thread Zhao Liu
On Tue, Jul 30, 2024 at 05:44:30PM +0200, Paolo Bonzini wrote:
> Date: Tue, 30 Jul 2024 17:44:30 +0200
> From: Paolo Bonzini 
> Subject: Re: [PATCH 0/4] i386: Clean up SGX for microvm, completely
> X-Mailer: git-send-email 2.45.2
> 
> Queued, thanks.
> 

Thanks Paolo! BTW, could you please have a look at two other cleanup
series from me? :)

https://lore.kernel.org/qemu-devel/20240716161015.263031-1-zhao1@intel.com/

and

https://lore.kernel.org/qemu-devel/20240619144215.3273989-1-zhao1@intel.com/

Best Regards,
Zhao




Re: [PATCH 07/18] qapi/machine: Drop temporary 'prefix'

2024-07-30 Thread Zhao Liu
On Tue, Jul 30, 2024 at 10:10:21AM +0200, Markus Armbruster wrote:
> Date: Tue, 30 Jul 2024 10:10:21 +0200
> From: Markus Armbruster 
> Subject: [PATCH 07/18] qapi/machine: Drop temporary 'prefix'
> 
> Recent commit "qapi: Smarter camel_to_upper() to reduce need for
> 'prefix'" added a temporary 'prefix' to delay changing the generated
> code.
> 
> Revert it.  This improves HmatLBDataType's generated enumeration
> constant prefix from HMATLB_DATA_TYPE to HMAT_LB_DATA_TYPE.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  qapi/machine.json| 1 -
>  hw/core/numa.c   | 4 ++--
>  hw/pci-bridge/cxl_upstream.c | 4 ++--
>  3 files changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Zhao Liu 




Re: [PATCH 09/18] qapi/machine: Rename CpuS390* to S390Cpu, and drop 'prefix'

2024-07-30 Thread Zhao Liu
On Tue, Jul 30, 2024 at 10:10:23AM +0200, Markus Armbruster wrote:
> Date: Tue, 30 Jul 2024 10:10:23 +0200
> From: Markus Armbruster 
> Subject: [PATCH 09/18] qapi/machine: Rename CpuS390* to S390Cpu, and drop
>  'prefix'
> 
> QAPI's 'prefix' feature can make the connection between enumeration
> type and its constants less than obvious.  It's best used with
> restraint.
> 
> CpuS390Entitlement has a 'prefix' to change the generated enumeration
> constants' prefix from CPU_S390_POLARIZATION to S390_CPU_POLARIZATION.
                         ^^^^^^^^^^^^^^^^^^^^^    ^^^^^^^^^^^^^^^^^^^^^
                         CPU_S390_ENTITLEMENT     S390_CPU_ENTITLEMENT

> Rename the type to S390CpuEntitlement, so that 'prefix' is not needed.
> 
> Likewise change CpuS390Polarization to S390CpuPolarization, and
> CpuS390State to S390CpuState.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  qapi/machine-common.json|  5 ++---
>  qapi/machine-target.json| 11 +--
>  qapi/machine.json   |  9 -
>  qapi/pragma.json|  6 +++---
>  include/hw/qdev-properties-system.h |  2 +-
>  include/hw/s390x/cpu-topology.h |  2 +-
>  target/s390x/cpu.h  |  2 +-
>  hw/core/qdev-properties-system.c|  6 +++---
>  hw/s390x/cpu-topology.c |  6 +++---
>  9 files changed, 23 insertions(+), 26 deletions(-)

[snip]

> diff --git a/qapi/pragma.json b/qapi/pragma.json
> index 59fbe74b8c..beddea5ca4 100644
> --- a/qapi/pragma.json
> +++ b/qapi/pragma.json
> @@ -47,9 +47,9 @@
>  'BlockdevSnapshotWrapper',
>  'BlockdevVmdkAdapterType',
>  'ChardevBackendKind',
> -'CpuS390Entitlement',
> -'CpuS390Polarization',
> -'CpuS390State',
> +'S390CpuEntitlement',
> +'S390CpuPolarization',
> +'S390CpuState',
>  'CxlCorErrorType',
>  'DisplayProtocol',
>  'DriveBackupWrapper',

This list seems to be in alphabetical order, but the new names don't
follow that order.

Just the above nits,

Reviewed-by: Zhao Liu 





Re: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES

2024-07-30 Thread Zhao Liu
Hi Mauro,

On Mon, Jul 29, 2024 at 03:21:06PM +0200, Mauro Carvalho Chehab wrote:
> Date: Mon, 29 Jul 2024 15:21:06 +0200
> From: Mauro Carvalho Chehab 
> Subject: [PATCH v4 2/6] arm/virt: Wire up GPIO error source for ACPI / GHES
> X-Mailer: git-send-email 2.45.2
> 
> From: Jonathan Cameron 
> 
> Creates a Generic Event Device (GED) as specified at
> ACPI 6.5 specification at 18.3.2.7.2:
> https://uefi.org/specs/ACPI/6.5/18_Platform_Error_Interfaces.html#event-notification-for-generic-error-sources
> with HID PNP0C33.
> 
> The PNP0C33 device is used to report hardware errors to
> the bios via ACPI APEI Generic Hardware Error Source (GHES).
> 
> It is aligned with Linux Kernel patch:
> https://lore.kernel.org/lkml/1272350481-27951-8-git-send-email-ying.hu...@intel.com/
> 
> [mchehab: use a define for the generic event pin number and do some cleanups]
> Signed-off-by: Jonathan Cameron 
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  hw/arm/virt-acpi-build.c | 30 ++
>  hw/arm/virt.c| 14 --
>  include/hw/arm/virt.h|  1 +
>  include/hw/boards.h  |  1 +
>  4 files changed, 40 insertions(+), 6 deletions(-)

[snip]

> +static void virt_set_error(void)
> +{
> +qemu_set_irq(qdev_get_gpio_in(gpio_error_dev, 0), 1);
> +}
> +

[snip]

> +mc->generic_error_device_notify = virt_set_error;

[snip]

> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 48ff6d8b93f7..991f99138e57 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -308,6 +308,7 @@ struct MachineClass {
>  int64_t (*get_default_cpu_node_id)(const MachineState *ms, int idx);
>  ram_addr_t (*fixup_ram_size)(ram_addr_t size);
>  uint64_t smbios_memory_device_size;
> +void (*generic_error_device_notify)(void);

The name looks inconsistent with the style of other MachineClass virtual
methods. What about a name like "notify_xxx"? And please add a comment
describing this new method.

BTW, I found that this method is called from generic_error_device_notify()
in Patch 6. mc->generic_error_device_notify(), as a virtual method of
MachineClass, seems to exist only to implement a hook, and it doesn't
seem to have anything to do with MachineClass/MachineState. So my
question is: why do we need to add this method to MachineClass?

Could we maintain a notifier list in ghes.c and expose an interface
that allows the arm code to register a notifier? This would eliminate
the need to add the “notify” method to MachineClass.
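To make the suggestion concrete, here is a minimal, self-contained sketch of the notifier-list approach. It re-implements a simplified Notifier/NotifierList pair instead of using QEMU's include/qemu/notify.h, and ghes_register_error_notifier(), virt_error_notify() and virt_error_irq_raised are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for QEMU's Notifier/NotifierList (not the real API). */
typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *notifier, void *data);
    Notifier *next;
};

typedef struct {
    Notifier *head;
} NotifierList;

static void notifier_list_add(NotifierList *list, Notifier *n)
{
    assert(n != NULL && n->notify != NULL);
    n->next = list->head;       /* insert at the head of the list */
    list->head = n;
}

static void notifier_list_notify(NotifierList *list, void *data)
{
    for (Notifier *n = list->head; n != NULL; n = n->next) {
        n->notify(n, data);
    }
}

/* ghes.c side: owns the list and exposes a (hypothetical) registration hook. */
static NotifierList generic_error_notifiers;

static void ghes_register_error_notifier(Notifier *n)
{
    notifier_list_add(&generic_error_notifiers, n);
}

/* Notify BIOS about an error via Generic Error Device - GED */
static void generic_error_device_notify(void)
{
    notifier_list_notify(&generic_error_notifiers, NULL);
}

/* arm/virt side: a registered callback instead of a MachineClass method. */
static int virt_error_irq_raised;

static void virt_error_notify(Notifier *n, void *data)
{
    (void)n;
    (void)data;
    virt_error_irq_raised = 1;  /* stands in for raising the GPIO line */
}

static Notifier virt_error_notifier = { .notify = virt_error_notify };
```

With this shape, ghes.c only knows about the list, and the board registers its callback once (e.g. at machine init), so MachineClass needs no new method.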

Regards,
Zhao




[PATCH 2/4] target/i386/cpu: Explicitly express SGX_LC and SGX feature words dependency

2024-07-29 Thread Zhao Liu
At present, cpu_x86_cpuid() silently masks off SGX_LC if SGX is absent.

This is not appropriate, because the user is never told about the
dependency between the two.

So explicitly define the dependency between the SGX_LC and SGX feature
words, so that the user gets a warning when SGX_LC is enabled but SGX
is absent.

Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 2b3642c9b13c..7a6d0b05ce27 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1730,6 +1730,10 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_7_1_EAX, CPUID_7_1_EAX_WRMSRNS },
 .to = { FEAT_7_1_EAX,   CPUID_7_1_EAX_FRED },
 },
+{
+.from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
+.to = { FEAT_7_0_ECX,   CPUID_7_0_ECX_SGX_LC },
+},
 };
 
 typedef struct X86RegisterInfo32 {
@@ -6545,11 +6549,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *ecx |= CPUID_7_0_ECX_OSPKE;
 }
 *edx = env->features[FEAT_7_0_EDX]; /* Feature flags */
-
-if ((*ecx & CPUID_7_0_ECX_SGX_LC)
-&& (!(*ebx & CPUID_7_0_EBX_SGX))) {
-*ecx &= ~CPUID_7_0_ECX_SGX_LC;
-}
 } else if (count == 1) {
 *eax = env->features[FEAT_7_1_EAX];
 *edx = env->features[FEAT_7_1_EDX];
-- 
2.34.1
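For illustration only, a simplified, self-contained model of how a feature_dependencies[] entry like the one added above might be applied: if the "from" bit is absent, the dependent "to" bits are cleared, and in this sketch a warning is printed only when the user explicitly requested them. The types and apply_feature_deps() are hypothetical stand-ins; QEMU's real handling lives in x86_cpu_expand_features() and differs in detail. The bit positions assume SGX = CPUID.(EAX=7,ECX=0):EBX[2] and SGX_LC = ECX[30].

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified model: only the two feature words involved here. */
enum { FEAT_7_0_EBX, FEAT_7_0_ECX, FEATURE_WORDS };

#define CPUID_7_0_EBX_SGX    (1ull << 2)   /* CPUID.(EAX=7,ECX=0):EBX[2]  */
#define CPUID_7_0_ECX_SGX_LC (1ull << 30)  /* CPUID.(EAX=7,ECX=0):ECX[30] */

typedef struct {
    struct { int index; uint64_t mask; } from, to;
} FeatureDep;

static const FeatureDep deps[] = {
    { .from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
      .to   = { FEAT_7_0_ECX, CPUID_7_0_ECX_SGX_LC } },
};

/*
 * For each dependency whose "from" bit is missing, clear the "to" bits.
 * Warn only if the user explicitly asked for a bit that gets cleared.
 */
static void apply_feature_deps(uint64_t features[FEATURE_WORDS],
                               const uint64_t user_features[FEATURE_WORDS])
{
    for (size_t i = 0; i < sizeof(deps) / sizeof(deps[0]); i++) {
        const FeatureDep *d = &deps[i];

        assert(d->from.index < FEATURE_WORDS && d->to.index < FEATURE_WORDS);
        if (!(features[d->from.index] & d->from.mask)) {
            uint64_t lost = features[d->to.index] & d->to.mask;

            if (lost & user_features[d->to.index]) {
                fprintf(stderr, "warning: feature word %d: bits 0x%llx depend "
                        "on features that are not enabled\n",
                        d->to.index, (unsigned long long)lost);
            }
            features[d->to.index] &= ~lost;
        }
    }
}
```

Compared with the open-coded check removed from cpu_x86_cpuid(), a table entry lets one generic loop both mask the bit and tell the user why.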




[PATCH 1/4] target/i386/cpu: Remove unnecessary SGX feature words checks

2024-07-29 Thread Zhao Liu
CPUID.0x7.0.EBX and CPUID.0x7.0.ECX are already expressed as feature
words (FEAT_7_0_EBX and FEAT_7_0_ECX), and host support for them is
already checked in x86_cpu_filter_features().

Therefore, the extra host checks on the SGX bits in cpu_x86_cpuid() are
redundant: the filtered feature words can no longer contain bits the
host lacks, so those follow-up adjustments never actually take effect.

Remove these unnecessary SGX feature word checks.

Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 4688d140c2d9..2b3642c9b13c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6537,8 +6537,6 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 case 7:
 /* Structured Extended Feature Flags Enumeration Leaf */
 if (count == 0) {
-uint32_t eax_0_unused, ebx_0, ecx_0, edx_0_unused;
-
 /* Maximum ECX value for sub-leaves */
 *eax = env->cpuid_level_func7;
 *ebx = env->features[FEAT_7_0_EBX]; /* Feature flags */
@@ -6548,20 +6546,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 }
 *edx = env->features[FEAT_7_0_EDX]; /* Feature flags */
 
-/*
- * SGX cannot be emulated in software.  If hardware does not
- * support enabling SGX and/or SGX flexible launch control,
- * then we need to update the VM's CPUID values accordingly.
- */
-x86_cpu_get_supported_cpuid(0x7, 0,
-&eax_0_unused, &ebx_0,
-&ecx_0, &edx_0_unused);
-if ((*ebx & CPUID_7_0_EBX_SGX) && !(ebx_0 & CPUID_7_0_EBX_SGX)) {
-*ebx &= ~CPUID_7_0_EBX_SGX;
-}
-
 if ((*ecx & CPUID_7_0_ECX_SGX_LC)
-&& (!(*ebx & CPUID_7_0_EBX_SGX) || !(ecx_0 & CPUID_7_0_ECX_SGX_LC))) {
+&& (!(*ebx & CPUID_7_0_EBX_SGX))) {
 *ecx &= ~CPUID_7_0_ECX_SGX_LC;
 }
 } else if (count == 1) {
-- 
2.34.1
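A small, hypothetical model of why the removed check was dead code: once a requested feature word has been ANDed with the host-supported bits (as x86_cpu_filter_features() does, in a more elaborate form), re-checking the host bit when CPUID is queried can never clear anything. filter_word() and old_cpuid_sgx_check() are illustrative names, not QEMU functions:

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_7_0_EBX_SGX (1u << 2)  /* CPUID.(EAX=7,ECX=0):EBX[2] */

/* Filtering (done once, at realize time): keep only host-supported bits. */
static uint32_t filter_word(uint32_t requested, uint32_t host_supported)
{
    return requested & host_supported;
}

/* The per-query check removed by this patch, applied to a filtered word. */
static uint32_t old_cpuid_sgx_check(uint32_t ebx, uint32_t host_ebx)
{
    if ((ebx & CPUID_7_0_EBX_SGX) && !(host_ebx & CPUID_7_0_EBX_SGX)) {
        ebx &= ~CPUID_7_0_EBX_SGX;
    }
    return ebx;
}
```

Because a filtered word never has SGX set without host support, the condition in old_cpuid_sgx_check() is never true, for any combination of requested and host bits.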




[PATCH 4/4] target/i386/cpu: Mask off SGX/SGX_LC feature words for non-PC machine

2024-07-29 Thread Zhao Liu
Only the PC machine supports SGX, so mask off the SGX-related feature
words for non-PC machines (microvm).

Signed-off-by: Zhao Liu 
---
 hw/i386/sgx-stub.c        |  5 +++++
 hw/i386/sgx.c             |  8 ++++++++
 include/hw/i386/sgx-epc.h |  1 +
 target/i386/cpu.c         | 15 +++++++++++++++
 4 files changed, 29 insertions(+)

diff --git a/hw/i386/sgx-stub.c b/hw/i386/sgx-stub.c
index 16b1dfd90bb5..38ff75e9f377 100644
--- a/hw/i386/sgx-stub.c
+++ b/hw/i386/sgx-stub.c
@@ -32,6 +32,11 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 memset(&pcms->sgx_epc, 0, sizeof(SGXEPCState));
 }
 
+bool check_sgx_support(void)
+{
+return false;
+}
+
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size)
 {
 return true;
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 849472a12865..4900dd414a1f 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -266,6 +266,14 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
size);
 }
 
+bool check_sgx_support(void)
+{
+if (!object_dynamic_cast(qdev_get_machine(), TYPE_PC_MACHINE)) {
+return false;
+}
+return true;
+}
+
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size)
 {
 PCMachineState *pcms =
diff --git a/include/hw/i386/sgx-epc.h b/include/hw/i386/sgx-epc.h
index 3e00efd870c9..41d55da47999 100644
--- a/include/hw/i386/sgx-epc.h
+++ b/include/hw/i386/sgx-epc.h
@@ -58,6 +58,7 @@ typedef struct SGXEPCState {
 int nr_sections;
 } SGXEPCState;
 
+bool check_sgx_support(void);
 bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size);
 void sgx_epc_build_srat(GArray *table_data);
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7f55e9ba3ed8..66f9737a117c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6103,6 +6103,21 @@ uint64_t x86_cpu_get_supported_feature_word(X86CPU *cpu, FeatureWord w)
 }
 break;
 
+case FEAT_7_0_EBX:
+#ifndef CONFIG_USER_ONLY
+if (!check_sgx_support()) {
+unavail = CPUID_7_0_EBX_SGX;
+}
+#endif
+break;
+case FEAT_7_0_ECX:
+#ifndef CONFIG_USER_ONLY
+if (!check_sgx_support()) {
+unavail = CPUID_7_0_ECX_SGX_LC;
+}
+#endif
+break;
+
 default:
 break;
 }
-- 
2.34.1




[PATCH 3/4] target/i386/cpu: Add dependencies of CPUID 0x12 leaves

2024-07-29 Thread Zhao Liu
As the SDM states, the CPUID 0x12 leaves depend on CPUID_7_0_EBX_SGX
(the SGX feature bit).

Since FEAT_SGX_12_0_EAX, FEAT_SGX_12_0_EBX and FEAT_SGX_12_1_EAX are
defined as feature words (each holding multiple feature bits), add a
dependency on SGX for each whole register so that the user gets a
warning if SGX is absent.

Signed-off-by: Zhao Liu 
---
 target/i386/cpu.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7a6d0b05ce27..7f55e9ba3ed8 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1734,6 +1734,18 @@ static FeatureDep feature_dependencies[] = {
 .from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
 .to = { FEAT_7_0_ECX,   CPUID_7_0_ECX_SGX_LC },
 },
+{
+.from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
+.to = { FEAT_SGX_12_0_EAX,  ~0ull },
+},
+{
+.from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
+.to = { FEAT_SGX_12_0_EBX,  ~0ull },
+},
+{
+.from = { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX },
+.to = { FEAT_SGX_12_1_EAX,  ~0ull },
+},
 };
 
 typedef struct X86RegisterInfo32 {
-- 
2.34.1
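Because these entries use ~0ull as the "to" mask, every bit of the dependent registers depends on SGX. The self-contained sketch below (hypothetical Dep type and apply_sgx_deps(), simplified from the feature_dependencies[] idea) shows that clearing SGX then zeroes each CPUID 0x12 register as a whole:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified feature-word indices: only the words touched by this patch. */
enum { FEAT_7_0_EBX, FEAT_SGX_12_0_EAX, FEAT_SGX_12_0_EBX, FEAT_SGX_12_1_EAX,
       NWORDS };

#define CPUID_7_0_EBX_SGX (1ull << 2)  /* CPUID.(EAX=7,ECX=0):EBX[2] */

typedef struct {
    int from_word;
    uint64_t from_mask;
    int to_word;
    uint64_t to_mask;
} Dep;

/* ~0ull as the "to" mask: every bit of that word depends on SGX. */
static const Dep sgx_deps[] = {
    { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX, FEAT_SGX_12_0_EAX, ~0ull },
    { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX, FEAT_SGX_12_0_EBX, ~0ull },
    { FEAT_7_0_EBX, CPUID_7_0_EBX_SGX, FEAT_SGX_12_1_EAX, ~0ull },
};

/* Returns how many words lost bits because SGX was absent. */
static int apply_sgx_deps(uint64_t f[NWORDS])
{
    int changed = 0;

    for (size_t i = 0; i < sizeof(sgx_deps) / sizeof(sgx_deps[0]); i++) {
        const Dep *d = &sgx_deps[i];

        assert(d->from_word < NWORDS && d->to_word < NWORDS);
        if (!(f[d->from_word] & d->from_mask) && (f[d->to_word] & d->to_mask)) {
            f[d->to_word] &= ~d->to_mask;
            changed++;
        }
    }
    return changed;
}
```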




[PATCH 0/4] i386: Clean up SGX for microvm, completely

2024-07-29 Thread Zhao Liu
Hi,

Currently, only the PC machine supports SGX; microvm does not.

Commit 13be929aff80 ("target/i386: do not crash if microvm guest
uses SGX CPUID leaves") cleaned up CPUID 0x12.{0x2..N} for microvm
to avoid a guest crash.

As I commented on that commit [1], microvm deserves more cleanup: the
CPUID 0x12.{0x0,0x1} subleaves should be masked off as well. But once I
actually started on this, I realized that not only CPUID 0x12 but also
CPUID 0x7.0.EBX[SGX] needs to be cleaned up for microvm.

Hence this series, which cleans up SGX for microvm completely.

[1]: https://lore.kernel.org/qemu-devel/zpcz0cfjw8ext...@intel.com/

Thanks and Best Regards,
Zhao
---
Zhao Liu (4):
  target/i386/cpu: Remove unnecessary SGX feature words checks
  target/i386/cpu: Explicitly express SGX_LC and SGX feature words
dependency
  target/i386/cpu: Add dependencies of CPUID 0x12 leaves
  target/i386/cpu: Mask off SGX/SGX_LC feature words for non-PC machine

 hw/i386/sgx-stub.c|  5 
 hw/i386/sgx.c |  8 +++
 include/hw/i386/sgx-epc.h |  1 +
 target/i386/cpu.c | 50 ---
 4 files changed, 45 insertions(+), 19 deletions(-)

-- 
2.34.1




Re: [PATCH] doc/net/l2tpv3: Update boolean fields' description to avoid short-form use

2024-07-29 Thread Zhao Liu
On Tue, Jul 30, 2024 at 10:54:07AM +0800, Jason Wang wrote:
> Date: Tue, 30 Jul 2024 10:54:07 +0800
> From: Jason Wang 
> Subject: Re: [PATCH] doc/net/l2tpv3: Update boolean fields' description to
>  avoid short-form use
> 
> On Wed, Jul 17, 2024 at 2:15 PM Zhao Liu  wrote:
> >
> > Hi Jason,
> >
> > Just a kind ping. Does this update satisfy you?
> > Since the original example generates the warning.
> >
> > Thanks,
> > Zhao
> 
> Queued.
> 
> Thanks
>

Hi Jason, thank you! I noticed Michael has already helped me merge this
(commit cb8de74ac6df "doc/net/l2tpv3: Update boolean fields' description
to avoid short-form use"). 

Best Regards,
Zhao




Re: [PATCH] accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic

2024-07-26 Thread Zhao Liu
On Thu, Jul 25, 2024 at 03:51:32PM +0100, Salil Mehta via wrote:
> Date: Thu, 25 Jul 2024 15:51:32 +0100
> From: Salil Mehta via 
> Subject: [PATCH] accel/kvm/kvm-all: Fixes the missing break in vCPU unpark
>  logic
> X-Mailer: git-send-email 2.34.1
> 
> Loop should exit prematurely on successfully finding out the parked vCPU (struct
> KVMParkedVcpu) in the 'struct KVMState' maintained 'kvm_parked_vcpus' list of
> parked vCPUs.
> 
> Fixes: Coverity CID 1558552
> Fixes: 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code")
> Reported-by: Peter Maydell 
> Suggested-by: Peter Maydell 
> Message-ID: 
> 
> Signed-off-by: Salil Mehta 
> ---
>  accel/kvm/kvm-all.c | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Zhao Liu 




Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object

2024-07-25 Thread Zhao Liu
Thanks Jonathan!

On Thu, Jul 25, 2024 at 11:59:02AM +0100, Jonathan Cameron wrote:

[snip]

> > > I think I understand why you want to configure caches.  My question was
> > > about the connection to SMP.
> > > 
> > > Say we run a guest with a single core, no SMP.  Could configuring caches
> > > still be useful then?  
> > 
> > Probably not useful to configure topology (sizes are a separate question)
> > - any sensible default should be fine.
> >

Yes, I agree.

Regards,
Zhao




Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object

2024-07-25 Thread Zhao Liu
Hi Markus,

On Thu, Jul 25, 2024 at 10:51:49AM +0200, Markus Armbruster wrote:

[snip]

> >> What's the use case?  The commit messages don't tell.
> >
> > i386 has the default cache topology model: l1 per core/l2 per core/l3
> > per die.
> >
> > Cache topology affects scheduler performance, e.g., kernel's cluster
> > scheduling.
> >
> > Of course I can hardcode some cache topology model in the specific cpu
> > model that corresponds to the actual hardware, but for -cpu host/max,
> > the default i386 cache topology model has no flexibility, and the
> > host-cpu-cache option doesn't have enough fine-grained control over the
> > cache topology.
> >
> > So I want to provide a way to allow users to create a more flexible cache
> > topology, just like CPU topology.
> 
> 
> So the use case is exposing a configurable cache topology to the guest
> in order to increase performance.  Performance can increase when the
> configured virtual topology is closer to the physical topology than a
> default topology would be.  This can be the case with CPU host or max.
> 
> Correct?

Yes! That's the x86 use case. Jonathan also helped me explain his ARM use case.

> >> Why does that use case make no sense without SMP?
> >
> > As in the example I mentioned, on Intel's hybrid architecture P-cores have
> > an l2 per core while E-cores have an l2 per module. So setting the l2
> > topology level to either core or module cannot emulate the real case.
> >
> > Even considering the more extreme case of Intel 14th MTL CPU, where
> > some E cores have L3 and some don't even have L3. As well as the last
> > time you and Daniel mentioned that in the future we could consider
> > covering more cache properties such as cache size. But the l3 size can
> > be different in the same system, like AMD's x3D technology. So
> > generally configuring properties for @name in a list can't take into
> > account the differences of heterogeneous caches with the same @name.
> >
> > Hope my poor english explains the problem well. :-)
> 
> I think I understand why you want to configure caches.  My question was
> about the connection to SMP.
> 
> Say we run a guest with a single core, no SMP.  Could configuring caches
> still be useful then?

No. In that case the (x86) CPU topology would be 1 core per module, 1
module per die and 1 die per socket.

So this single core owns the l1/l2/l3 caches by itself, and there is no
sharing to describe.
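A rough sketch of the arithmetic behind this answer, assuming -smp's cores count is per module and modules are per die: the number of logical CPUs sharing one cache instance is the product of the topology counts up to the cache's level, so with one core and one thread every level collapses to a single CPU. (x86 CPUID leaf 4 encodes addressable-ID counts derived from APIC ID widths, which can be larger than this raw count.) SmpTopo, TopoLevel and cpus_sharing_cache() are illustrative names:

```c
#include <assert.h>

/* Topology levels, innermost first. */
typedef enum {
    TOPO_THREAD, TOPO_CORE, TOPO_MODULE, TOPO_DIE, TOPO_SOCKET, TOPO_LEVELS
} TopoLevel;

typedef struct {
    unsigned threads;   /* threads per core   */
    unsigned cores;     /* cores per module   */
    unsigned modules;   /* modules per die    */
    unsigned dies;      /* dies per socket    */
} SmpTopo;

/* Logical CPUs sharing one instance of a cache attached at @level. */
static unsigned cpus_sharing_cache(const SmpTopo *t, TopoLevel level)
{
    /* per_level[l]: units of level l-1 inside one unit of level l (1 for thread) */
    const unsigned per_level[TOPO_LEVELS] = {
        1, t->threads, t->cores, t->modules, t->dies
    };
    unsigned n = 1;

    assert(level < TOPO_LEVELS);
    for (int l = 0; l <= (int)level; l++) {
        n *= per_level[l];
    }
    return n;
}
```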

> >> Can the same @name occur multiple times?  Documentation doesn't tell.
> >> If yes, what does that mean?
> >
> > Yes, this means the later one will override the previous one with the same
> > name.
> 
> Needs documenting.
> 
> If you make it an error, you don't have to document it :)

OK!

> >> Say we later add value "l1" for unified level 1 cache.  Would "l1" then
> >> conflict with "l1d" and "l1u"?
> >
> > Yes, we should check in smp/machine code and ban l1 and l1i/l1d at the
> > same time. This check I suppose is easy to add.
> >
> >> May @topo be "invalid"?  Documentation doesn't tell.  If yes, what does
> >> that mean?
> >
> > Yes, following Intel's spec, invalid means the current topology
> > information is invalid; it is used to encode x86 CPUIDs. So when I
> > moved this level to qapi, I just kept it. Otherwise, I would need to
> > re-implement the i386-specific invalid level.
> 
> I'm afraid I don't understand what is supposed to happen when I tell
> QEMU to make a cache's topology invalid.

Currently this series doesn't allow users to set "invalid"; if they do,
QEMU reports an error.

So "invalid" is only for QEMU's internal use. Do you think that's okay?
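Following Markus's suggestion to make duplicates an error, a hypothetical validation pass over the parsed "caches" list could look like this. CacheEntry and validate_caches() are illustrative names, not part of this series:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C view of one entry in the "caches" list. */
typedef struct {
    const char *name;   /* e.g. "l1d", "l1i", "l2", "l3" */
    const char *topo;   /* e.g. "core", "module", "die", or "invalid" */
} CacheEntry;

/*
 * Reject a duplicated @name (instead of letting the later entry win) and
 * reject @topo "invalid", which is reserved for QEMU's internal use.
 * Returns NULL on success, or a static error message.
 */
static const char *validate_caches(const CacheEntry *c, size_t n)
{
    assert(c != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(c[i].topo, "invalid") == 0) {
            return "topology level 'invalid' cannot be set by the user";
        }
        for (size_t j = 0; j < i; j++) {
            if (strcmp(c[i].name, c[j].name) == 0) {
                return "cache name specified more than once";
            }
        }
    }
    return NULL;
}
```

Making duplicates an error keeps the semantics trivial to document, as Markus noted.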

[snip]

> > Ah, I also considered this. I didn't use "type" because people usually
> > uses cache type to indicate INSTRUCTION/DATA/UNIFIED and cache level to
> > indicate LEVEL 1/LEVEL 2/LEVEL 3. The enumeration here is a combination of
> > type+level. So I think it's better to avoid the type term.
> 
> SmpCacheLevelAndType is quite a mouthful.

Better name! Thanks!

Regards,
Zhao




Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object

2024-07-24 Thread Zhao Liu
Hi Daniel,

On Wed, Jul 24, 2024 at 10:03:02PM +0800, Zhao Liu wrote:
> Date: Wed, 24 Jul 2024 22:03:02 +0800
> From: Zhao Liu 
> Subject: Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object
> 
> On Wed, Jul 24, 2024 at 01:47:16PM +0100, Daniel P. Berrangé wrote:
> > Date: Wed, 24 Jul 2024 13:47:16 +0100
> > From: "Daniel P. Berrangé"
> > Subject: Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object
> > 
> > On Wed, Jul 24, 2024 at 01:35:17PM +0200, Markus Armbruster wrote:
> > > Zhao Liu  writes:
> > > 
> > > > Hi Markus,
> > > >> SmpCachesProperties and SmpCacheProperties would put the singular
> > > >> vs. plural where it belongs.  Sounds a bit awkward to me, though.
> > > >> Naming is hard.
> > > >
> > > > For SmpCachesProperties, it's easy to overlook the first "s".
> > > >
> > > >> Other ideas, anybody?
> > > >
> > > > Maybe SmpCacheOptions or SmpCachesPropertyWrapper?
> > > 
> > > I wonder why we have a single QOM object to configure all caches, and
> > > not one QOM object per cache.
> > 
> > Previous versions of this series were augmenting the existing
> > -smp command line.
> 
> Ah, yes, since -smp, as a sugar option of -machine, doesn't support
> JSON. In -smp, we need to use keyval's style to configure as:
> 
> -smp caches.0.name=l1i,caches.0.topo=core
> 
> I think JSON is the more elegant way to go, so I chose -object.

Considering some further issues, I may have to retract this assertion.
I could fall back to -smp and support the keyval format there; that is
also fine with me if you prefer keyval. Sorry for going back and forth.
We can discuss this in this thread:

https://lore.kernel.org/qemu-devel/20240704031603.1744546-1-zhao1@intel.com/T/#m8adba8ba14ebac0c9935fbf45983cc71e53ccf45
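To show what the keyval form looks like from the consumer side, here is a hypothetical, minimal parser for strings such as caches.0.name=l1i,caches.0.topo=core. It is not QEMU's keyval parser (util/keyval.c) and ignores nearly all of its features; it only illustrates that list entries arrive as flat, index-addressed keys:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CACHES 8

typedef struct {
    char name[16];
    char topo[16];
} CacheCfg;

/*
 * Parse "caches.N.name=X,caches.N.topo=Y,..." into @out (indexed by N).
 * Returns the number of entries (highest index + 1), or -1 on error.
 */
static int parse_caches_keyval(const char *opts, CacheCfg out[MAX_CACHES])
{
    char buf[256];
    int count = 0;

    assert(opts != NULL && out != NULL);
    if (strlen(opts) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, opts);
    memset(out, 0, sizeof(CacheCfg) * MAX_CACHES);

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        int idx;
        char key[16], val[16];

        if (sscanf(tok, "caches.%d.%15[^=]=%15s", &idx, key, val) != 3 ||
            idx < 0 || idx >= MAX_CACHES) {
            return -1;
        }
        if (strcmp(key, "name") == 0) {
            strcpy(out[idx].name, val);
        } else if (strcmp(key, "topo") == 0) {
            strcpy(out[idx].topo, val);
        } else {
            return -1;
        }
        if (idx + 1 > count) {
            count = idx + 1;
        }
    }
    return count;
}
```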

Thanks,
Zhao





Re: [PATCH v1] target/i386: Always set leaf 0x1f

2024-07-24 Thread Zhao Liu
Hi Igor,

On Wed, Jul 24, 2024 at 02:54:32PM +0200, Igor Mammedov wrote:
> Date: Wed, 24 Jul 2024 14:54:32 +0200
> From: Igor Mammedov 
> Subject: Re: [PATCH v1] target/i386: Always set leaf 0x1f
> X-Mailer: Claws Mail 4.3.0 (GTK 3.24.42; x86_64-redhat-linux-gnu)
> 
> On Wed, 24 Jul 2024 12:13:28 +0100
> John Levon  wrote:
> 
> > On Wed, Jul 24, 2024 at 03:59:29PM +0530, Manish wrote:
> > 
> > > > > Leaf 0x1f is superset of 0xb, so it makes sense to set 0x1f equivalent
> > > > > to 0xb by default and workaround windows issue.
> > > > > This change adds a new property 'cpuid-0x1f-enforce' to set leaf
> > > > > 0x1f equivalent to 0xb in case extended CPU topology is not
> > > > > configured and behave as before otherwise.
> > > > repeating question
> > > > why we need to use extra property instead of just adding 0x1f leaf for
> > > > CPU models that supposed to have it?
> > > 
> > > As i mentioned in earlier response. "Windows expects it only when we have
> > > set max cpuid level greater than or equal to 0x1f. I mean if it is exposed
> > > it should not be all zeros. SapphireRapids CPU definition raised cpuid
> > > level to 0x20, so we starting seeing it with SapphireRapids."
> > > 
> > > Windows does not expect 0x1f to be present for any CPU model. But if it is
> > > exposed to the guest, it expects non-zero values.  
> > 
> > I think Igor is suggesting:
> > 
> >  - leave x86_cpu_expand_features() alone completely
> yep, drop that if possible
> 
>  
> >  - change the 0x1f handling to always report topology i.e. never report all
> >zeroes
> 
> Do this but only for CPU models that have this leaf per spec,
> to avoid live migration issues create a new version of CPU model,
> so it would apply only for new version. This way older versions
> and migration won't be affected. 

That means every new Intel CPU model in the future will need to enable
0x1f explicitly, which sounds like an endless game. So my question is:
at what point is it OK to enable 0x1f by default and only disable it
for the old CPU models?
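For reference, a simplified sketch of what "set 0x1f equivalent to 0xb" would mean for a topology with only SMT and Core levels: subleaf 0 describes the SMT level, subleaf 1 the Core level, and subleaf 2 is the terminating "invalid" level. It assumes one die per package and reports threads * cores at the Core level; cpuid_1f_as_0b() and id_bits() are hypothetical names, and this is not QEMU's encoding code:

```c
#include <assert.h>
#include <stdint.h>

/* Level-type encodings in CPUID.0BH/1FH:ECX[15:8]. */
enum { LEVEL_INVALID = 0, LEVEL_SMT = 1, LEVEL_CORE = 2 };

/* Bits needed to give @n items distinct IDs: ceil(log2(n)). */
static unsigned id_bits(unsigned n)
{
    unsigned bits = 0;

    assert(n > 0);
    while ((1u << bits) < n) {
        bits++;
    }
    return bits;
}

/* Fill leaf 0x1F subleaf @count the same way leaf 0xB would be filled. */
static void cpuid_1f_as_0b(unsigned threads, unsigned cores, uint32_t x2apic_id,
                           uint32_t count, uint32_t *eax, uint32_t *ebx,
                           uint32_t *ecx, uint32_t *edx)
{
    unsigned smt_bits = id_bits(threads);
    unsigned core_bits = smt_bits + id_bits(cores);

    *ecx = count & 0xff;        /* ECX[7:0]: echo the subleaf index */
    *edx = x2apic_id;           /* EDX: x2APIC ID of this CPU */
    switch (count) {
    case 0:                     /* SMT level */
        *eax = smt_bits;
        *ebx = threads;
        *ecx |= LEVEL_SMT << 8;
        break;
    case 1:                     /* Core level */
        *eax = core_bits;
        *ebx = threads * cores;
        *ecx |= LEVEL_CORE << 8;
        break;
    default:                    /* terminating "invalid" level */
        *eax = 0;
        *ebx = 0;
        break;
    }
}
```

Since every subleaf before the terminator is non-zero, a guest that reads leaf 0x1F never sees the all-zero leaf that Windows rejects.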

Thanks,
Zhao




Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object

2024-07-24 Thread Zhao Liu
Hi Markus,

I realized I should reply this mail first...

On Wed, Jul 24, 2024 at 01:35:17PM +0200, Markus Armbruster wrote:
> Date: Wed, 24 Jul 2024 13:35:17 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object
> 
> Zhao Liu  writes:
> 
> > Hi Markus,
> >
> > On Mon, Jul 22, 2024 at 03:33:13PM +0200, Markus Armbruster wrote:
> >> Date: Mon, 22 Jul 2024 15:33:13 +0200
> >> From: Markus Armbruster 
> >> Subject: Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object
> >> 
> >> Zhao Liu  writes:
> >> 
> >> > Introduce smp-cache object so that user could define cache properties.
> >> >
> >> > In smp-cache object, define cache topology based on CPU topology level
> >> > with two reasons:
> >> >
> >> > 1. In practice, a cache will always be bound to the CPU container
> >> >    (either private in the CPU container or shared among multiple
> >> >    containers), and CPU container is often expressed in terms of CPU
> >> >    topology level.
> >> > 2. The x86's cache-related CPUIDs encode cache topology based on APIC
> >> >    ID's CPU topology layout. And the ACPI PPTT table that ARM/RISCV
> >> >    relies on also requires CPU containers to help indicate the private
> >> >    shared hierarchy of the cache. Therefore, for SMP systems, it is
> >> >    natural to use the CPU topology hierarchy directly in QEMU to define
> >> >    the cache topology.
> >> >
> >> > Currently, separated L1 cache (L1 data cache and L1 instruction cache)
> >> > with unified higher-level cache (e.g., unified L2 and L3 caches), is the
> >> > most common cache architectures.
> >> >
> >> > Therefore, enumerate the L1 D-cache, L1 I-cache, L2 cache and L3 cache
> >> > with smp-cache object to add the basic cache topology support.
> >> >
> >> > Suggested-by: Daniel P. Berrange 
> >> > Signed-off-by: Zhao Liu 
> >> 
> >> [...]
> >> 
> >> > diff --git a/qapi/machine-common.json b/qapi/machine-common.json
> >> > index 82413c668bdb..8b8c0e9eeb86 100644
> >> > --- a/qapi/machine-common.json
> >> > +++ b/qapi/machine-common.json
> >> > @@ -64,3 +64,53 @@
> >> >    'prefix': 'CPU_TOPO_LEVEL',
> >> >    'data': [ 'invalid', 'thread', 'core', 'module', 'cluster',
> >> >              'die', 'socket', 'book', 'drawer', 'default' ] }
> >> > +
> >> > +##
> >> > +# @SMPCacheName:
> >> 
> >> Why the SMP in this name?  Because it's currently only used by SMP
> >> stuff?  Or is there another reason I'm missing?
> >
> > Yes, I suppose it can only be used in SMP case.
> >
> > Because Intel's heterogeneous CPUs have different cache topologies: for
> > example, on Alder Lake, L2 is per core for P-cores, but per module (4
> > E-cores per module) for E-cores. Thus I would like to keep the topology
> > semantics of this object and -smp as consistent as possible.
> >
> > Do you agree?
> 
> I don't know enough to meaningfully agree or disagree.  I know just
> enough to annoy you with questions :)

Welcome and no problem!

> This series adds a way to configure caches.
> 
> Structure of the configuration data: a list
> 
> [{"name": N, "topo": T}, ...]
> 
> where N can be "l1d", "l1i", "l2", or "l3",
>   and T can be "invalid", "thread", "core", "module", "cluster",
>                "die", "socket", "book", "drawer", or "default".
> 
> What's the use case?  The commit messages don't tell.

i386 has a default cache topology model: L1 per core, L2 per core, and L3
per die.

Cache topology affects scheduler performance, e.g., the kernel's cluster
scheduling.

Of course, I can hardcode a cache topology model in a specific CPU model
that corresponds to actual hardware, but for -cpu host/max, the default
i386 cache topology model is inflexible, and the host-cache-info option
doesn't offer fine-grained control over the cache topology.

So I want to provide a way for users to create a more flexible cache
topology, just like the CPU topology.
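To make the i386 default concrete: once a cache is bound to a topology level, the "maximum number of addressable IDs for logical processors sharing this cache" field of CPUID leaf 4 (EAX[25:14]) follows from the APIC ID bit layout. This is a simplified, hypothetical sketch (it ignores modules and clusters, and all names are invented here), not QEMU's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the levels a cache can be bound to. */
enum cache_topo { CACHE_TOPO_THREAD, CACHE_TOPO_CORE, CACHE_TOPO_DIE };

/* Hypothetical cache names, matching "l1d", "l1i", "l2" and "l3". */
enum cache_name { CACHE_L1D, CACHE_L1I, CACHE_L2, CACHE_L3, CACHE_NAME_MAX };

/* The i386 default: L1 and L2 per core, L3 per die. */
static const enum cache_topo i386_default_topo[CACHE_NAME_MAX] = {
    [CACHE_L1D] = CACHE_TOPO_CORE,
    [CACHE_L1I] = CACHE_TOPO_CORE,
    [CACHE_L2]  = CACHE_TOPO_CORE,
    [CACHE_L3]  = CACHE_TOPO_DIE,
};

struct apic_layout {
    uint32_t threads_per_core;
    uint32_t cores_per_die;
};

/* Number of APIC ID bits needed to hold 'count' distinct IDs. */
static uint32_t bits_for(uint32_t count)
{
    uint32_t bits = 0;

    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* APIC ID bit offset above which IDs no longer share a cache at 'level'. */
static uint32_t sharing_offset(const struct apic_layout *l,
                               enum cache_topo level)
{
    uint32_t core_offset = bits_for(l->threads_per_core);
    uint32_t die_offset = core_offset + bits_for(l->cores_per_die);

    switch (level) {
    case CACHE_TOPO_CORE:
        return core_offset;
    case CACHE_TOPO_DIE:
        return die_offset;
    case CACHE_TOPO_THREAD:
    default:
        return 0;
    }
}

/* CPUID.04H:EAX[25:14]: max addressable IDs sharing this cache, minus one. */
static uint32_t cpuid4_sharing_field(const struct apic_layout *l,
                                     enum cache_topo level)
{
    uint32_t max_ids = 1u << sharing_offset(l, level);

    assert(max_ids - 1 <= 0xfff);  /* the field is 12 bits wide */
    return (max_ids - 1) << 14;
}
```

With threads=2 and 45 cores per die, L1 and L2 would report 1 (2 IDs) and L3 would report 127 (128 IDs), because 45 cores need 6 APIC ID bits. A user-defined cache topology would simply swap a different level into the table.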

> Why does that use case make no sense without SMP?

As the example I

Re: [PATCH 8/8] qemu-options: Add the description of smp-cache object

2024-07-24 Thread Zhao Liu
Hi Markus and Daniel,

I have some questions about implementing one -object per cache:

On Wed, Jul 24, 2024 at 02:39:29PM +0200, Markus Armbruster wrote:
> Date: Wed, 24 Jul 2024 14:39:29 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
>  object
> 
> Zhao Liu  writes:
> 
> > Hi Markus,
> >
> > On Mon, Jul 22, 2024 at 03:37:43PM +0200, Markus Armbruster wrote:
> >> Date: Mon, 22 Jul 2024 15:37:43 +0200
> >> From: Markus Armbruster 
> >> Subject: Re: [PATCH 8/8] qemu-options: Add the description of smp-cache
> >>  object
> >> 
> >> Zhao Liu  writes:
> >> 
> >> > Signed-off-by: Zhao Liu 
> >> 
> >> This patch is just documentation.  The code got added in some previous
> >> patch.  Would it make sense to squash this patch into that previous
> >> patch?
> >
> > OK, I'll merge them.
> >
> >> > ---
> >> > Changes since RFC v2:
> >> >  * Rewrote the document of smp-cache object.
> >> >
> >> > Changes since RFC v1:
> >> >  * Use "*_cache=topo_level" as -smp example as the original "level"
> >> >term for a cache has a totally different meaning. (Jonathan)
> >> > ---
> >> >  qemu-options.hx | 58 +
> >> >  1 file changed, 58 insertions(+)
> >> >
> >> > diff --git a/qemu-options.hx b/qemu-options.hx
> >> > index 8ca7f34ef0c8..4b84f4508a6e 100644
> >> > --- a/qemu-options.hx
> >> > +++ b/qemu-options.hx
> >> > @@ -159,6 +159,15 @@ SRST
> >> >  ::
> >> >  
> >> >  -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512
> >> > +
> >> > +``smp-cache='id'``
> >> > +Allows to configure cache property (now only the cache topology level).
> >> > +
> >> > +For example:
> >> > +::
> >> > +
> >> > +-object '{"qom-type":"smp-cache","id":"cache","caches":[{"name":"l1d","topo":"core"},{"name":"l1i","topo":"core"},{"name":"l2","topo":"module"},{"name":"l3","topo":"die"}]}'
> >> > +-machine smp-cache=cache
> >> >  ERST
> >> >  
> >> >  DEF("M", HAS_ARG, QEMU_OPTION_M,
> >> > @@ -5871,6 +5880,55 @@ SRST
> >> >  ::
> >> >  
> >> >  (qemu) qom-set /objects/iothread1 poll-max-ns 10
> >> > +
> >> > +``-object '{"qom-type":"smp-cache","id":id,"caches":[{"name":cache_name,"topo":cache_topo}]}'``
> >> > +Create an smp-cache object that configures machine's cache
> >> > +property. Currently, cache property only include cache topology
> >> > +level.
> >> > +
> >> > +This option must be written in JSON format to support JSON list.
> >> 
> >> Why?
> >
> > I'm not familiar with this, so I hope you could educate me if I'm wrong.
> >
> > All I know so far is that, for -object, a list can only be defined in
> > JSON format, not with numeric indexes as in keyval-based options. For example:
> >
> > -object smp-cache,id=cache0,caches.0.name=l1i,caches.0.topo=core: Parameter 'caches' is missing
> >
> > the above doesn't work.
> >
> > Is there any other way to specify a list on the command line?
> 
> The command line is a big, sprawling mess :)
> 
> -object supports either a JSON or a QemuOpts argument.  *Not* keyval!
> 
> Both QemuOpts and keyval parse something like KEY=VALUE,...  Keyval
> supports arrays and objects via dotted keys.  QemuOpts doesn't natively
> support arrays and objects, but its users can hack around that
> limitation in various ways.  -object doesn't.  So you're right, it's
> JSON or bust here.
> 
> However, if we used one object per cache instead, we could get something
> like
> 
> -object smp-cache,name=l1d,...
> -object smp-cache,name=l1i,...
> -object smp-cache,name=l2,...
> ...

Currently, I use -object to create an smp-cache object and link it to
MachineState with -machine smp-cache=obj_id.

With one object per cache, how would I link them to the machine?

Could I create something static in smp_cache.c and expose all the cache
information to the machine through some interface?

Additionally, thinking about heterogeneous caches in the long term, as
asked before in [1]: would one object per cache conflict with the cache
device I'm considering? I'm considering a cache device for the longer term
because I want to create the CPU/cache topology via -device and build a
topology tree.

[1]: https://lore.kernel.org/qemu-devel/zl88dywle3scd...@intel.com/

I think this is becoming a nightmare I can't get around. Naming is
difficult, and sorting out the interface design is also hard.

If you feel that there is indeed a conflict, then I'm also willing to fall
back to -smp again and do it based on a keyval list, as originally
suggested by Daniel. Sorry for going back and forth on the design; I hope
that, by discussing it with you, I can make sense of the current and
subsequent paths without things getting out of hand!

Best Regards,
Zhao




Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object

2024-07-24 Thread Zhao Liu
On Wed, Jul 24, 2024 at 01:47:16PM +0100, Daniel P. Berrangé wrote:
> Date: Wed, 24 Jul 2024 13:47:16 +0100
> From: "Daniel P. Berrangé" 
> Subject: Re: [PATCH 2/8] qapi/qom: Introduce smp-cache object
> 
> On Wed, Jul 24, 2024 at 01:35:17PM +0200, Markus Armbruster wrote:
> > Zhao Liu  writes:
> > 
> > > Hi Markus,
> > >> SmpCachesProperties and SmpCacheProperties would put the singular
> > >> vs. plural where it belongs.  Sounds a bit awkward to me, though.
> > >> Naming is hard.
> > >
> > > For SmpCachesProperties, it's easy to overlook the first "s".
> > >
> > >> Other ideas, anybody?
> > >
> > > Maybe SmpCacheOptions or SmpCachesPropertyWrapper?
> > 
> > I wonder why we have a single QOM object to configure all caches, and
> > not one QOM object per cache.
> 
> Previous versions of this series were augmenting the existing
> -smp command line.

Ah, yes. That's because -smp, as sugar for -machine, doesn't support
JSON. With -smp, we need to use keyval style, e.g.:

-smp caches.0.name=l1i,caches.0.topo=core
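For comparison with the JSON form: the keyval convention flattens each list element into dotted keys such as `caches.0.name`, where the numeric component selects the element. The sketch below only illustrates that mapping; `parse_caches_keyval()`, `struct cache_entry` and `MAX_CACHES` are hypothetical, and QEMU's real keyval parser is considerably more general.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CACHES 8

/* Hypothetical flattened form of one {"name": N, "topo": T} list element. */
struct cache_entry {
    char name[16];
    char topo[16];
};

/*
 * Parse "caches.<i>.name=<N>,caches.<i>.topo=<T>,..." into entries[].
 * Returns the list length (highest index + 1), or -1 on a malformed key.
 */
static int parse_caches_keyval(const char *opts,
                               struct cache_entry entries[MAX_CACHES])
{
    char buf[256];
    int count = 0;

    if (strlen(opts) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, opts);
    memset(entries, 0, MAX_CACHES * sizeof(entries[0]));

    for (char *pair = buf; pair != NULL && *pair != '\0';) {
        char *next = strchr(pair, ',');
        char field[16], value[16];
        unsigned idx;

        if (next != NULL) {
            *next++ = '\0';    /* terminate this pair, step past ',' */
        }
        /* The numeric path component selects the list element. */
        if (sscanf(pair, "caches.%u.%15[^=]=%15s", &idx, field, value) != 3 ||
            idx >= MAX_CACHES) {
            return -1;
        }
        if (strcmp(field, "name") == 0) {
            strcpy(entries[idx].name, value);
        } else if (strcmp(field, "topo") == 0) {
            strcpy(entries[idx].topo, value);
        } else {
            return -1;
        }
        if ((int)idx + 1 > count) {
            count = (int)idx + 1;
        }
        pair = next;
    }
    return count;
}
```

For example, `caches.0.name=l1i,caches.0.topo=core,caches.1.name=l3,caches.1.topo=die` would yield two entries, equivalent to a two-element JSON list.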

I think JSON is the more elegant way to go, so I chose -object.

> Now the design has switched to use -object,
> I agree that it'd be simplest to just have one -object flag
> added per cache level we want to define.
> 

OK.

Thanks,
Zhao




Re: [PATCH] target/i386: Always set leaf 0x1f

2024-07-23 Thread Zhao Liu
On Wed, Jul 24, 2024 at 08:25:12AM +0800, Xiaoyao Li wrote:
> Date: Wed, 24 Jul 2024 08:25:12 +0800
> From: Xiaoyao Li 
> Subject: Re: [PATCH] target/i386: Always set leaf 0x1f
> 
> On 7/23/2024 10:26 PM, Zhao Liu wrote:
> > (+Xiaoyao, whose TDX work may also be related to this.)
> 
> I have a similar patch for TDX, because TDX requires CPUID leaf 0x1f to
> configure the topology.
> 
> (I haven't posted it to the QEMU community yet. I'm not sure how people
> want to proceed: refine this patch, or should I post my version?)
> 
> https://github.com/intel-staging/qemu-tdx/commit/de08fd30926bc9d7997af6bd12cfff1b998da8b7

Hi Xiaoyao, the logic is similar. If Manish is willing to keep iterating,
we can help him improve it to cover all the cases we need, and then both
TDX and his case could benefit.

> https://github.com/intel-staging/qemu-tdx/commit/f81d2bcb67e4b01577723cc621099b0c6d558334
> 
> 
> 
> > Hi Manish,
> > 
> > Thanks for your patch! Some comments below.
> > 
> > On Mon, Jul 22, 2024 at 10:18:59AM +0000, manish.mishra wrote:
> > > Date: Mon, 22 Jul 2024 10:18:59 +0000
> > > From: "manish.mishra" 
> > > Subject: [PATCH] target/i386: Always set leaf 0x1f
> > > X-Mailer: git-send-email 2.22.3
> > > 
> > > QEMU does not set leaf 0x1f when the VM does not have an extended CPU
> > > topology, and expects guests to fall back to leaf 0xb. Some versions of
> > > Windows, i.e. Windows 10 and 11, do not like this behavior and expect
> > > this leaf to be populated. This is observed with Windows VMs with Secure
> > > Boot, UEFI and the Hyper-V role enabled.
> > > 
> > > Leaf 0x1f is a superset of 0xb, so it makes sense to set 0x1f equivalent
> > > to 0xb by default and work around the Windows issue. This change adds a
> > > new property 'cpuid-0x1f-enforce' to set leaf 0x1f equivalent to 0xb in
> > > case extended CPU topology is not present, and to behave as before otherwise.
> > > ---
> > >   hw/i386/pc.c  |  1 +
> > >   target/i386/cpu.c | 71 +++
> > >   target/i386/cpu.h |  5 +++
> > >   target/i386/kvm/kvm.c |  4 ++-
> > >   4 files changed, 53 insertions(+), 28 deletions(-)
> > > 
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index c74931d577..4cab04e443 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -85,6 +85,7 @@ GlobalProperty pc_compat_9_0[] = {
> > >   { TYPE_X86_CPU, "guest-phys-bits", "0" },
> > >   { "sev-guest", "legacy-vm-type", "on" },
> > >   { TYPE_X86_CPU, "legacy-multi-node", "on" },
> > > +{ TYPE_X86_CPU, "cpuid-0x1f-enforce", "false" },
> > >   };
> > >   const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0);
> > 
> > Yes, this is needed, but the 9.1 soft freeze is coming soon, so
> > you may have to add pc_compat_9_1[] if it doesn't get accepted before
> > the soft freeze.
> > 
> > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > > index 4688d140c2..f89b2ef335 100644
> > > --- a/target/i386/cpu.c
> > > +++ b/target/i386/cpu.c
> > > @@ -416,6 +416,43 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
> > >   assert(!(*eax & ~0x1f));
> > >   }
> > > +static void encode_topo_cpuid_b(CPUX86State *env, uint32_t count,
> > > +X86CPUTopoInfo *topo_info,
> > > +uint32_t threads_per_pkg,
> > > +uint32_t *eax, uint32_t *ebx,
> > > +uint32_t *ecx, uint32_t *edx)
> > > +{
> > > +X86CPU *cpu = env_archcpu(env);
> > > +
> > > +if (!cpu->enable_cpuid_0xb) {
> > > +*eax = *ebx = *ecx = *edx = 0;
> > > +return;
> > > +}
> > > +
> > > +*ecx = count & 0xff;
> > > +*edx = cpu->apic_id;
> > > +
> > > +switch (count) {
> > > +case 0:
> > > +*eax = apicid_core_offset(topo_info);
> > > +*ebx = topo_info->threads_per_core;
> > > +*ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
> > > +break;
> > > +case 1:
> > > +*eax = apicid_pkg_offset(topo_info);
> > > +*ebx = threads_per_pkg;
> > > +  

[PATCH v2] hw/nubus/nubus-virtio-mmio: Fix missing ERRP_GUARD() in nubus_virtio_mmio_realize()

2024-07-23 Thread Zhao Liu
According to the comment in qapi/error.h, dereferencing @errp requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
* - It must not be dereferenced, because it may be null.
...
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.
*
* Using it when it's not needed is safe, but please avoid cluttering
* the source with useless code.

In nubus_virtio_mmio_realize(), @errp is dereferenced without
ERRP_GUARD().

Although nubus_virtio_mmio_realize() - as a DeviceClass.realize()
method - is never passed a null @errp argument, it should follow the
rules on @errp usage.  Add the ERRP_GUARD() there.

Reviewed-by: Markus Armbruster 
Signed-off-by: Zhao Liu 
---
v2: Used Markus' words in commit message and added his r/b tag.
---
 hw/nubus/nubus-virtio-mmio.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/nubus/nubus-virtio-mmio.c b/hw/nubus/nubus-virtio-mmio.c
index 58a63c84d0be..a5558d3ec28b 100644
--- a/hw/nubus/nubus-virtio-mmio.c
+++ b/hw/nubus/nubus-virtio-mmio.c
@@ -23,6 +23,7 @@ static void nubus_virtio_mmio_set_input_irq(void *opaque, int n, int level)
 
 static void nubus_virtio_mmio_realize(DeviceState *dev, Error **errp)
 {
+ERRP_GUARD();
 NubusVirtioMMIODeviceClass *nvmdc = NUBUS_VIRTIO_MMIO_GET_CLASS(dev);
 NubusVirtioMMIO *s = NUBUS_VIRTIO_MMIO(dev);
 NubusDevice *nd = NUBUS_DEVICE(dev);
-- 
2.34.1
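The rule this patch enforces can be illustrated with a self-contained model. It is a simplified sketch under stated assumptions: `Error`, `set_error()`, `parent_realize()` and `child_realize()` are hypothetical stand-ins, not the real qapi/error.h API. Per the comment quoted above, the real ERRP_GUARD() also covers @errp being &error_fatal; this sketch handles only the NULL case, by always checking a local Error pointer and handing it to the caller afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error type (not the real qapi/error.h). */
typedef struct Error {
    const char *msg;
} Error;

/* Callers may pass errp == NULL to ignore errors. */
static void set_error(Error **errp, const char *msg)
{
    if (errp != NULL) {
        *errp = malloc(sizeof(**errp));
        assert(*errp != NULL);
        (*errp)->msg = msg;
    }
}

static void parent_realize(bool fail, Error **errp)
{
    if (fail) {
        set_error(errp, "parent realize failed");
    }
}

/*
 * Unsafe pattern: with errp == NULL, "*errp" is a NULL dereference.
 *
 *     parent_realize(fail, errp);
 *     if (*errp) { return false; }
 *
 * Guarded pattern: check a local Error * that always exists, then hand
 * it to the caller.
 */
static bool child_realize(bool fail, Error **errp)
{
    Error *local_err = NULL;

    parent_realize(fail, &local_err);
    if (local_err != NULL) {    /* safe: &local_err is never NULL */
        if (errp != NULL) {
            *errp = local_err;  /* propagate to the caller */
        } else {
            free(local_err);    /* caller asked to ignore errors */
        }
        return false;
    }
    return true;
}
```

Calling `child_realize(true, NULL)` returns false without crashing, which is exactly the case an unguarded `*errp` would get wrong.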




Re: [PATCH] hw/nubus/nubus-virtio-mmio: Fix missing ERRP_GUARD() in nubus_virtio_mmio_realize()

2024-07-23 Thread Zhao Liu
Hi Markus,

On Tue, Jul 23, 2024 at 12:21:17PM +0200, Markus Armbruster wrote:
> Date: Tue, 23 Jul 2024 12:21:17 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH] hw/nubus/nubus-virtio-mmio: Fix missing ERRP_GUARD()
>  in nubus_virtio_mmio_realize()
> 
> Zhao Liu  writes:
> 
> > As the comment in qapi/error, dereferencing @errp requires
> 
> Suggest "According to the comment in qapi/error.h".
 
Thanks! Good words.

> > ERRP_GUARD():
> >
> > * = Why, when and how to use ERRP_GUARD() =
> > *
> > * Without ERRP_GUARD(), use of the @errp parameter is restricted:
> > * - It must not be dereferenced, because it may be null.
> > ...
> > * ERRP_GUARD() lifts these restrictions.
> > *
> > * To use ERRP_GUARD(), add it right at the beginning of the function.
> > * @errp can then be used without worrying about the argument being
> > * NULL or &error_fatal.
> > *
> > * Using it when it's not needed is safe, but please avoid cluttering
> > * the source with useless code.
> >
> > But in nubus_virtio_mmio_realize(), @errp is dereferenced without
> > ERRP_GUARD().
> 
> Suggest to scratch "But".

No problem, will do.

> > Although nubus_virtio_mmio_realize() - as a DeviceClass.realize()
> > method - doesn't get the NULL @errp parameter, it hasn't triggered the
> > bug that dereferencing the NULL @errp. It's still necessary to follow
> > the requirement of @errp, so add missing ERRP_GUARD() in
> > nubus_virtio_mmio_realize().
> 
> Suggest
> 
>   Although nubus_virtio_mmio_realize() - as a DeviceClass.realize()
>   method - is never passed a null @errp argument, it should follow the
>   rules on @errp usage.  Add the ERRP_GUARD() there.

Thanks for the text! It sounds much more authentic!

> > Cc: Laurent Vivier 
> > Cc: Philippe Mathieu-Daudé 
> > Signed-off-by: Zhao Liu 
> > ---
> >  hw/nubus/nubus-virtio-mmio.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/nubus/nubus-virtio-mmio.c b/hw/nubus/nubus-virtio-mmio.c
> > index 58a63c84d0be..a5558d3ec28b 100644
> > --- a/hw/nubus/nubus-virtio-mmio.c
> > +++ b/hw/nubus/nubus-virtio-mmio.c
> > @@ -23,6 +23,7 @@ static void nubus_virtio_mmio_set_input_irq(void *opaque, int n, int level)
> >  
> >  static void nubus_virtio_mmio_realize(DeviceState *dev, Error **errp)
> >  {
> > +ERRP_GUARD();
> >  NubusVirtioMMIODeviceClass *nvmdc = NUBUS_VIRTIO_MMIO_GET_CLASS(dev);
> >  NubusVirtioMMIO *s = NUBUS_VIRTIO_MMIO(dev);
> >  NubusDevice *nd = NUBUS_DEVICE(dev);
>SysBusDevice *sbd;
>int i, offset;
> 
>nvmdc->parent_realize(dev, errp);
> 
> Here's the dereference:
> 
>if (*errp) {
>return;
>}
> 
> Reviewed-by: Markus Armbruster 
> 

Thanks! Will refresh a v2 soon.

Regards,
Zhao




Re: [PATCH] target/i386: Always set leaf 0x1f

2024-07-23 Thread Zhao Liu
(+Xiaoyao, whose TDX work may also be related to this.)

Hi Manish,

Thanks for your patch! Some comments below.

On Mon, Jul 22, 2024 at 10:18:59AM +0000, manish.mishra wrote:
> Date: Mon, 22 Jul 2024 10:18:59 +0000
> From: "manish.mishra" 
> Subject: [PATCH] target/i386: Always set leaf 0x1f
> X-Mailer: git-send-email 2.22.3
> 
> QEMU does not set leaf 0x1f when the VM does not have an extended CPU
> topology, and expects guests to fall back to leaf 0xb. Some versions of
> Windows, i.e. Windows 10 and 11, do not like this behavior and expect
> this leaf to be populated. This is observed with Windows VMs with Secure
> Boot, UEFI and the Hyper-V role enabled.
> 
> Leaf 0x1f is a superset of 0xb, so it makes sense to set 0x1f equivalent
> to 0xb by default and work around the Windows issue. This change adds a
> new property 'cpuid-0x1f-enforce' to set leaf 0x1f equivalent to 0xb in
> case extended CPU topology is not present, and to behave as before otherwise.
> ---
>  hw/i386/pc.c  |  1 +
>  target/i386/cpu.c | 71 +++
>  target/i386/cpu.h |  5 +++
>  target/i386/kvm/kvm.c |  4 ++-
>  4 files changed, 53 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index c74931d577..4cab04e443 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -85,6 +85,7 @@ GlobalProperty pc_compat_9_0[] = {
>  { TYPE_X86_CPU, "guest-phys-bits", "0" },
>  { "sev-guest", "legacy-vm-type", "on" },
>  { TYPE_X86_CPU, "legacy-multi-node", "on" },
> +{ TYPE_X86_CPU, "cpuid-0x1f-enforce", "false" },
>  };
>  const size_t pc_compat_9_0_len = G_N_ELEMENTS(pc_compat_9_0);

Yes, this is needed, but the 9.1 soft freeze is coming soon, so
you may have to add pc_compat_9_1[] if it doesn't get accepted before
the soft freeze.
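The compat entry matters because an existing machine type must keep exposing its old CPUID after the default changes. Below is a minimal model of that mechanism, assuming a single boolean property; `struct compat_prop`, `apply_compat()` and `SKETCH_CPU_TYPE` are hypothetical, and QEMU's real global-property machinery is more general. If the patch lands after the 9.1 soft freeze, the same "false" entry would belong in the 9.1 compat array instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type name; QEMU's real entries use TYPE_X86_CPU. */
#define SKETCH_CPU_TYPE "sketch-x86-cpu"

/* Mirrors the shape of a GlobalProperty entry: {type, property, value}. */
struct compat_prop {
    const char *type;
    const char *property;
    const char *value;
};

/* What an older machine type's compat array would carry for this patch. */
static const struct compat_prop sketch_compat_9_0[] = {
    { SKETCH_CPU_TYPE, "cpuid-0x1f-enforce", "false" },
};

struct cpu_props {
    bool cpuid_0x1f_enforce;
};

/*
 * Start from the new default, then let the machine type's compat table
 * restore the old value, so an old machine type exposes the same CPUID
 * on the source and destination of a migration.
 */
static void apply_compat(struct cpu_props *p,
                         const struct compat_prop *tbl, size_t n)
{
    p->cpuid_0x1f_enforce = true;  /* default for the newest machine type */

    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].type, SKETCH_CPU_TYPE) == 0 &&
            strcmp(tbl[i].property, "cpuid-0x1f-enforce") == 0) {
            p->cpuid_0x1f_enforce = strcmp(tbl[i].value, "true") == 0;
        }
    }
}
```

The newest machine type would call `apply_compat(&p, NULL, 0)` and get the new behaviour, while a 9.0 machine type would pass `sketch_compat_9_0` and keep the old one.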

> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 4688d140c2..f89b2ef335 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -416,6 +416,43 @@ static void encode_topo_cpuid1f(CPUX86State *env, uint32_t count,
>  assert(!(*eax & ~0x1f));
>  }
>  
> +static void encode_topo_cpuid_b(CPUX86State *env, uint32_t count,
> +X86CPUTopoInfo *topo_info,
> +uint32_t threads_per_pkg,
> +uint32_t *eax, uint32_t *ebx,
> +uint32_t *ecx, uint32_t *edx)
> +{
> +X86CPU *cpu = env_archcpu(env);
> +
> +if (!cpu->enable_cpuid_0xb) {
> +*eax = *ebx = *ecx = *edx = 0;
> +return;
> +}
> +
> +*ecx = count & 0xff;
> +*edx = cpu->apic_id;
> +
> +switch (count) {
> +case 0:
> +*eax = apicid_core_offset(topo_info);
> +*ebx = topo_info->threads_per_core;
> +*ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
> +break;
> +case 1:
> +*eax = apicid_pkg_offset(topo_info);
> +*ebx = threads_per_pkg;
> +*ecx |= CPUID_B_ECX_TOPO_LEVEL_CORE << 8;
> +break;
> +default:
> +*eax = 0;
> +*ebx = 0;
> +*ecx |= CPUID_B_ECX_TOPO_LEVEL_INVALID << 8;
> +}
> +
> +assert(!(*eax & ~0x1f));
> +*ebx &= 0xffff; /* The count doesn't need to be reliable. */
> +}
> +
>  /* Encode cache info for CPUID[0x80000005].ECX or CPUID[0x80000005].EDX */
>  static uint32_t encode_cache_cpuid80000005(CPUCacheInfo *cache)
>  {
> @@ -6601,33 +6638,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>  break;
>  case 0xB:
>  /* Extended Topology Enumeration Leaf */
> -if (!cpu->enable_cpuid_0xb) {
> -*eax = *ebx = *ecx = *edx = 0;
> -break;
> -}
> -
> -*ecx = count & 0xff;
> -*edx = cpu->apic_id;
> -
> -switch (count) {
> -case 0:
> -*eax = apicid_core_offset(&topo_info);
> -*ebx = topo_info.threads_per_core;
> -*ecx |= CPUID_B_ECX_TOPO_LEVEL_SMT << 8;
> -break;
> -case 1:
> -*eax = apicid_pkg_offset(&topo_info);
> -*ebx = threads_per_pkg;
> -*ecx |= CPUID_B_ECX_TOPO_LEVEL_CORE << 8;
> -break;
> -default:
> -*eax = 0;
> -*ebx = 0;
> -*ecx |= CPUID_B_ECX_TOPO_LEVEL_INVALID << 8;
> -}
> -
> -assert(!(*eax & ~0x1f));
> -*ebx &= 0xffff; /* The count doesn't need to be reliable. */
> +encode_topo_cpuid_b(env, count, &topo_info, threads_per_pkg,
> +eax, ebx, ecx, edx);
>  break;
>  case 0x1C:
>  if (cpu->enable_pmu && (env->features[FEAT_7_0_EDX] & CPUID_7_0_EDX_ARCH_LBR)) {
> @@ -6639,6 +6651,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>  /* V2 Extended Topology Enumeration Leaf */
>  if (!x86_has_extended_topo(env->avail_cpu_topo)) {
>  *eax = *ebx = *ecx = *edx = 0;

Re: [PATCH 1/8] hw/core: Make CPU topology enumeration arch-agnostic

2024-07-23 Thread Zhao Liu
On Tue, Jul 23, 2024 at 12:14:30PM +0200, Markus Armbruster wrote:
> Date: Tue, 23 Jul 2024 12:14:30 +0200
> From: Markus Armbruster 
> Subject: Re: [PATCH 1/8] hw/core: Make CPU topology enumeration
>  arch-agnostic
> 
> Zhao Liu  writes:
> 
> > Hi Markus,
> >
> > On Mon, Jul 22, 2024 at 03:24:24PM +0200, Markus Armbruster wrote:
> >> Date: Mon, 22 Jul 2024 15:24:24 +0200
> >> From: Markus Armbruster 
> >> Subject: Re: [PATCH 1/8] hw/core: Make CPU topology enumeration
> >>  arch-agnostic
> >> 
> >> One little thing...
> >> 
> >> Zhao Liu  writes:
> >> 
> >> > Cache topology needs to be defined based on CPU topology levels. Thus,
> >> > define CPU topology enumeration in qapi/machine.json to make it generic
> >> > for all architectures.
> >> >
> >> > To match the general topology naming style, rename CPU_TOPO_LEVEL_SMT
> >> > and CPU_TOPO_LEVEL_PACKAGE to CPU_TOPO_LEVEL_THREAD and
> >> > CPU_TOPO_LEVEL_SOCKET.
> >> >
> >> > Also, enumerate additional topology levels for non-i386 arches, and add
> >> > a CPU_TOPO_LEVEL_DEFAULT to help future smp-cache object de-compatibilize
> >> > arch-specific cache topology settings.
> >> >
> >> > Signed-off-by: Zhao Liu 
> >> 
> >> [...]
> >> 
> >> > diff --git a/qapi/machine-common.json b/qapi/machine-common.json
> >> > index fa6bd71d1280..82413c668bdb 100644
> >> > --- a/qapi/machine-common.json
> >> > +++ b/qapi/machine-common.json
> >> > @@ -5,7 +5,7 @@
> >> >  # See the COPYING file in the top-level directory.
> >> >  
> >> >  ##
> >> > -# = Machines S390 data types
> >> > +# = Common machine types
> >> >  ##
> >> >  
> >> >  ##
> >> > @@ -19,3 +19,48 @@
> >> >  { 'enum': 'CpuS390Entitlement',
> >> >    'prefix': 'S390_CPU_ENTITLEMENT',
> >> >    'data': [ 'auto', 'low', 'medium', 'high' ] }
> >> > +
> >> > +##
> >> > +# @CpuTopologyLevel:
> >> > +#
> >> > +# An enumeration of CPU topology levels.
> >> > +#
> >> > +# @invalid: Invalid topology level.
> >> > +#
> >> > +# @thread: thread level, which would also be called SMT level or
> >> > +# logical processor level.  The @threads option in
> >> > +# SMPConfiguration is used to configure the topology of this
> >> > +# level.
> >> > +#
> >> > +# @core: core level.  The @cores option in SMPConfiguration is used
> >> > +# to configure the topology of this level.
> >> > +#
> >> > +# @module: module level.  The @modules option in SMPConfiguration is
> >> > +# used to configure the topology of this level.
> >> > +#
> >> > +# @cluster: cluster level.  The @clusters option in SMPConfiguration
> >> > +# is used to configure the topology of this level.
> >> > +#
> >> > +# @die: die level.  The @dies option in SMPConfiguration is used to
> >> > +# configure the topology of this level.
> >> > +#
> >> > +# @socket: socket level, which would also be called package level.
> >> > +# The @sockets option in SMPConfiguration is used to configure
> >> > +# the topology of this level.
> >> > +#
> >> > +# @book: book level.  The @books option in SMPConfiguration is used
> >> > +# to configure the topology of this level.
> >> > +#
> >> > +# @drawer: drawer level.  The @drawers option in SMPConfiguration is
> >> > +# used to configure the topology of this level.
> >> > +#
> >> > +# @default: default level.  Some architectures will have default
> >> > +# topology settings (e.g., cache topology), and this special
> >> > +# level means following the architecture-specific settings.
> >> > +#
> >> > +# Since: 9.1
> >> > +##
> >> > +{ 'enum': 'CpuTopologyLevel',
> >> > +  'prefix': 'CPU_TOPO_LEVEL',
> >> 
> >> Why set a 'prefix'?
> >> 
> >
> > Because my previous i386 commit 6ddeb0ec8c29 ("i386/cpu: Introduce bitmap
> > to cache available CPU topology levels") introduced the level
> > enumeration with such prefix. For naming consistency, and to shorten the
> > length of the name, I've used the same prefix here as well.
> >
> > I've sensed that you don't like the TOPO abbreviation and I'll remove the
> > prefix :-).
> 
> Consistency is good, but I'd rather achieve it by consistently using
> "topology".
> 
> I never liked the 'prefix' feature much.  We have it because the mapping
> from camel case to upper case with underscores is heuristical, and can
> result in something undesirable.  See commit 351d36e454c (qapi: allow
> override of default enum prefix naming).  Using it just to shorten
> generated identifiers is a bad idea.

Thanks for your clarification! I see, I will drop the prefix.
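Markus's point that the camel-case to upper-case mapping is heuristic can be seen with a deliberately naive version of it. The sketch below is hypothetical and is not QAPI's actual algorithm (which lives in QEMU's Python QAPI generator and handles more cases); it only shows how a type name becomes an upper-case prefix, and how acronyms can go wrong.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Naive CamelCase -> UPPER_SNAKE conversion: insert '_' before an
 * upper-case letter that follows a lower-case letter or a digit.
 * Deliberately simplified; not QAPI's actual algorithm.
 */
static void camel_to_upper(const char *in, char *out, size_t outlen)
{
    size_t j = 0;

    assert(outlen > 0);
    for (size_t i = 0; in[i] != '\0' && j + 2 < outlen; i++) {
        unsigned char c = (unsigned char)in[i];
        unsigned char prev = i > 0 ? (unsigned char)in[i - 1] : 0;

        if (isupper(c) && (islower(prev) || isdigit(prev))) {
            out[j++] = '_';
        }
        out[j++] = (char)toupper(c);
    }
    out[j] = '\0';
}
```

Here "CpuTopologyLevel" maps to "CPU_TOPOLOGY_LEVEL", so without a 'prefix' the members would presumably be named like CPU_TOPOLOGY_LEVEL_THREAD. By contrast, this naive rule turns "SMPCacheName" into "SMPCACHE_NAME", the kind of undesirable result the 'prefix' override exists to fix.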

Regards,
Zhao





Re: [RFC PATCH v5 5/8] .gitattributes: add Rust diff and merge attributes

2024-07-23 Thread Zhao Liu
On Mon, Jul 22, 2024 at 02:43:35PM +0300, Manos Pitsidianakis wrote:
> Date: Mon, 22 Jul 2024 14:43:35 +0300
> From: Manos Pitsidianakis 
> Subject: [RFC PATCH v5 5/8] .gitattributes: add Rust diff and merge
>  attributes
> X-Mailer: git-send-email 2.44.0
> 
> Set rust source code to diff=rust (built-in with new git versions)
> and merge=binary for Cargo.lock files (they should not be merged but
> auto-generated by cargo)
> 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Manos Pitsidianakis 
> ---
>  .gitattributes | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Zhao Liu 




Re: [RFC PATCH v5 8/8] rust/pl011: vendor dependencies

2024-07-23 Thread Zhao Liu
Hi Manos,

(This patch contains so much code that the mailing list archive refuses to
display it at https://lore.kernel.org/qemu-devel.)

Please correct me if I'm wrong...

Is the reason for not using git submodules here that v5 abandoned
compilation through Cargo, so a meson.build has to be added to each
dependency's source tree, and consequently all of the code must be
vendored into QEMU?

It looks like it will be difficult to keep the dependencies in sync in
the future.

Best Regards,
Zhao

On Mon, Jul 22, 2024 at 02:43:38PM +0300, Manos Pitsidianakis wrote:
> Date: Mon, 22 Jul 2024 14:43:38 +0300
> From: Manos Pitsidianakis 
> Subject: [RFC PATCH v5 8/8] rust/pl011: vendor dependencies
> X-Mailer: git-send-email 2.44.0
> 
> Signed-off-by: Manos Pitsidianakis 
> ---
>  rust/hw/char/pl011/vendor/either/README.rst   |  185 +
>  .../vendor/arbitrary-int/.cargo-checksum.json |1 +
>  .../pl011/vendor/arbitrary-int/CHANGELOG.md   |   47 +
>  .../pl011/vendor/arbitrary-int/Cargo.toml |   54 +
>  .../pl011/vendor/arbitrary-int/LICENSE.txt|   21 +
>  .../char/pl011/vendor/arbitrary-int/README.md |   72 +
>  .../pl011/vendor/arbitrary-int/meson.build|   14 +
>  .../pl011/vendor/arbitrary-int/src/lib.rs | 1489 +
>  .../pl011/vendor/arbitrary-int/tests/tests.rs | 1913 ++
>  .../vendor/bilge-impl/.cargo-checksum.json|1 +
>  .../char/pl011/vendor/bilge-impl/Cargo.toml   |   54 +
>  .../hw/char/pl011/vendor/bilge-impl/README.md |  327 ++
>  .../char/pl011/vendor/bilge-impl/meson.build  |   24 +
>  .../pl011/vendor/bilge-impl/src/bitsize.rs|  187 +
>  .../vendor/bilge-impl/src/bitsize/split.rs|  185 +
>  .../vendor/bilge-impl/src/bitsize_internal.rs |  235 +
>  .../src/bitsize_internal/struct_gen.rs|  402 ++
>  .../pl011/vendor/bilge-impl/src/debug_bits.rs |   55 +
>  .../vendor/bilge-impl/src/default_bits.rs |   92 +
>  .../pl011/vendor/bilge-impl/src/fmt_bits.rs   |  112 +
>  .../pl011/vendor/bilge-impl/src/from_bits.rs  |  222 +
>  .../char/pl011/vendor/bilge-impl/src/lib.rs   |   79 +
>  .../pl011/vendor/bilge-impl/src/shared.rs |  196 +
>  .../src/shared/discriminant_assigner.rs   |   56 +
>  .../vendor/bilge-impl/src/shared/fallback.rs  |   92 +
>  .../vendor/bilge-impl/src/shared/util.rs  |   91 +
>  .../vendor/bilge-impl/src/try_from_bits.rs|  143 +
>  .../pl011/vendor/bilge/.cargo-checksum.json   |1 +
>  rust/hw/char/pl011/vendor/bilge/Cargo.toml|   69 +
>  .../hw/char/pl011/vendor/bilge/LICENSE-APACHE |  176 +
>  rust/hw/char/pl011/vendor/bilge/LICENSE-MIT   |   17 +
>  rust/hw/char/pl011/vendor/bilge/README.md |  327 ++
>  rust/hw/char/pl011/vendor/bilge/meson.build   |   17 +
>  rust/hw/char/pl011/vendor/bilge/src/lib.rs|   80 +
>  .../pl011/vendor/either/.cargo-checksum.json  |1 +
>  rust/hw/char/pl011/vendor/either/Cargo.toml   |   54 +
>  .../char/pl011/vendor/either/LICENSE-APACHE   |  201 +
>  rust/hw/char/pl011/vendor/either/LICENSE-MIT  |   25 +
>  .../pl011/vendor/either/README-crates.io.md   |   10 +
>  rust/hw/char/pl011/vendor/either/meson.build  |   16 +
>  .../pl011/vendor/either/src/into_either.rs|   64 +
>  .../char/pl011/vendor/either/src/iterator.rs  |  315 +
>  rust/hw/char/pl011/vendor/either/src/lib.rs   | 1519 +
>  .../pl011/vendor/either/src/serde_untagged.rs |   69 +
>  .../either/src/serde_untagged_optional.rs |   74 +
>  .../vendor/itertools/.cargo-checksum.json |1 +
>  .../char/pl011/vendor/itertools/CHANGELOG.md  |  409 ++
>  .../hw/char/pl011/vendor/itertools/Cargo.lock |  681 +++
>  .../hw/char/pl011/vendor/itertools/Cargo.toml |  101 +
>  .../pl011/vendor/itertools/LICENSE-APACHE |  201 +
>  .../char/pl011/vendor/itertools/LICENSE-MIT   |   25 +
>  rust/hw/char/pl011/vendor/itertools/README.md |   44 +
>  .../pl011/vendor/itertools/benches/bench1.rs  |  877 +++
>  .../vendor/itertools/benches/combinations.rs  |  125 +
>  .../benches/combinations_with_replacement.rs  |   40 +
>  .../vendor/itertools/benches/extra/mod.rs |2 +
>  .../itertools/benches/extra/zipslices.rs  |  188 +
>  .../itertools/benches/fold_specialization.rs  |   73 +
>  .../vendor/itertools/benches/powerset.rs  |   44 +
>  .../vendor/itertools/benches/tree_fold1.rs|  144 +
>  .../itertools/benches/tuple_combinations.rs   |  113 +
>  .../pl011/vendor/itertools/benches/tuples.rs  |  213 +
>  .../pl011/vendor/itertools/examples/iris.data |  150 +
>  .../pl011/vendor/itertools/examples/iris.rs   |  137 +
>  .../char/pl011/vendor/itertools/meson.build   |   18 +
>  .../vendor/itertools/src/adaptors/coalesce.rs |  235 +
>  .../vendor/itertools/src/adaptors/map.rs  |  124 +
>  .../vendor/itertools/src/adaptors/mod.rs  | 1151 
>  .../itertools/src/adaptors/multi_product.rs   |  230 +
>  .../vendor/itertools/src/combinations.rs  |  128 +
>  .../src/combinations_with_replacement.rs  |  109 +
>  .../pl011/vendor/itertools/src/concat_impl.rs |   23 +
>  .../vendor/iterto
