date:20240528

Re: [PATCH] qemu/bitops.h: Locate changed bits

2024-05-28 Thread Wang, Lei

On 5/29/2024 12:59, Tong Ho wrote:
> Add inlined functions to obtain a mask of changed bits.  3 flavors
> are added: toggled, changed to 1, changed to 0.
> 
> These newly added utilities aid common device behaviors where
> actions are taken only when a register's bit(s) are changed.
> 
> Signed-off-by: Tong Ho 
> ---
>  include/qemu/bitops.h | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
> index 2c0a2fe751..7a701474ea 100644
> --- a/include/qemu/bitops.h
> +++ b/include/qemu/bitops.h
> @@ -148,6 +148,39 @@ static inline int test_bit(long nr, const unsigned long *addr)
>  return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
>  }
>  
> +/**
> + * find_bits_changed - Returns a mask of bits changed.
> + * @ref_bits: the reference bits against which the test is made.
> + * @chk_bits: the bits to be checked.
> + */
> +static inline unsigned long find_bits_changed(unsigned long ref_bits,
> +  unsigned long chk_bits)
> +{
> +return ref_bits ^ chk_bits;
> +}
> +
> +/**
> + * find_bits_to_1 - Returns a mask of bits changed from 0 to 1.
> + * @ref_bits: the reference bits against which the test is made.
> + * @chk_bits: the bits to be checked.
> + */
> +static inline unsigned long find_bits_to_1(unsigned long ref_bits,
> +   unsigned long chk_bits)
> +{
> +return find_bits_changed(ref_bits, chk_bits) & chk_bits;
> +}
> +
> +/**
> + * find_bits_to_0 - Returns a mask of bits changed from 1 to 0.
> + * @ref_bits: the reference bits against which the test is made.
> + * @chk_bits: the bits to be checked.
> + */
> +static inline unsigned long find_bits_to_0(unsigned long ref_bits,
> +   unsigned long chk_bits)
> +{
> +return find_bits_to_1(chk_bits, ref_bits);
> +}
> +
>  /**
>   * find_last_bit - find the last set bit in a memory region
>   * @addr: The address to start the search at

Reviewed-by: Lei Wang
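
For readers following the thread, here is a minimal sketch of how the three helpers might be used in a register write handler. The helper bodies are copied from the patch so the sketch compiles on its own; DemoDev, DEMO_CTRL_ENABLE, DEMO_CTRL_IRQ_EN and demo_ctrl_write() are hypothetical names invented for illustration and are not part of QEMU.

```c
#include <assert.h>

/* Local copies of the helpers proposed in the patch. */
static inline unsigned long find_bits_changed(unsigned long ref_bits,
                                              unsigned long chk_bits)
{
    return ref_bits ^ chk_bits;
}

static inline unsigned long find_bits_to_1(unsigned long ref_bits,
                                           unsigned long chk_bits)
{
    return find_bits_changed(ref_bits, chk_bits) & chk_bits;
}

static inline unsigned long find_bits_to_0(unsigned long ref_bits,
                                           unsigned long chk_bits)
{
    return find_bits_to_1(chk_bits, ref_bits);
}

/* Hypothetical control register layout (assumption, not from QEMU). */
#define DEMO_CTRL_ENABLE (1UL << 0)
#define DEMO_CTRL_IRQ_EN (1UL << 1)

typedef struct DemoDev {
    unsigned long ctrl;   /* last value written to the control register */
    unsigned starts;      /* number of 0 -> 1 transitions of ENABLE */
    unsigned stops;       /* number of 1 -> 0 transitions of ENABLE */
} DemoDev;

/* Act only on bits that actually changed, then latch the new value. */
static void demo_ctrl_write(DemoDev *d, unsigned long val)
{
    unsigned long rose = find_bits_to_1(d->ctrl, val);
    unsigned long fell = find_bits_to_0(d->ctrl, val);

    /* A single bit cannot both rise and fall in one write. */
    assert((rose & fell) == 0);

    if (rose & DEMO_CTRL_ENABLE) {
        d->starts++;
    }
    if (fell & DEMO_CTRL_ENABLE) {
        d->stops++;
    }
    d->ctrl = val;
}
```

Writing the same value twice leaves both counters untouched, which is the "act only on change" behavior the commit message describes.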

[PATCH v5 18/23] hw/i386/pc: Remove PCMachineClass::rsdp_in_ram

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::rsdp_in_ram was only used by the
pc-i440fx-2.2 machine, which got removed. It is
now always true. Remove it, simplifying acpi_setup().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 include/hw/i386/pc.h |  1 -
 hw/i386/acpi-build.c | 35 ---
 hw/i386/pc.c |  1 -
 3 files changed, 4 insertions(+), 33 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 808de4eca7..63568eb9e9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -100,7 +100,6 @@ struct PCMachineClass {
 
 /* ACPI compat: */
 bool has_acpi_build;
-bool rsdp_in_ram;
 unsigned acpi_data_size;
 int pci_root_uid;
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ab2d4d8dcb..ed0adb0e82 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2495,7 +2495,6 @@ static
 void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 {
 PCMachineState *pcms = PC_MACHINE(machine);
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(machine);
 DeviceState *iommu = pcms->iommu;
 GArray *table_offsets;
@@ -2667,16 +2666,6 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 .rsdt_tbl_offset = ,
 };
 build_rsdp(tables->rsdp, tables->linker, _data);
-if (!pcmc->rsdp_in_ram) {
-/* We used to allocate some extra space for RSDP revision 2 but
- * only used the RSDP revision 0 space. The extra bytes were
- * zeroed out and not used.
- * Here we continue wasting those extra 16 bytes to make sure we
- * don't break migration for machine types 2.2 and older due to
- * RSDP blob size mismatch.
- */
-build_append_int_noprefix(tables->rsdp, 0, 16);
-}
 }
 
 /* We'll expose it all to Guest so we want to reduce
@@ -2755,7 +2744,6 @@ static const VMStateDescription vmstate_acpi_build = {
 void acpi_setup(void)
 {
 PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
 AcpiBuildTables tables;
 AcpiBuildState *build_state;
@@ -2817,25 +2805,10 @@ void acpi_setup(void)
tables.vmgenid);
 }
 
-if (!pcmc->rsdp_in_ram) {
-/*
- * Keep for compatibility with old machine types.
- * Though RSDP is small, its contents isn't immutable, so
- * we'll update it along with the rest of tables on guest access.
- */
-uint32_t rsdp_size = acpi_data_len(tables.rsdp);
-
-build_state->rsdp = g_memdup(tables.rsdp->data, rsdp_size);
-fw_cfg_add_file_callback(x86ms->fw_cfg, ACPI_BUILD_RSDP_FILE,
- acpi_build_update, NULL, build_state,
- build_state->rsdp, rsdp_size, true);
-build_state->rsdp_mr = NULL;
-} else {
-build_state->rsdp = NULL;
-build_state->rsdp_mr = acpi_add_rom_blob(acpi_build_update,
- build_state, tables.rsdp,
- ACPI_BUILD_RSDP_FILE);
-}
+build_state->rsdp = NULL;
+build_state->rsdp_mr = acpi_add_rom_blob(acpi_build_update,
+ build_state, tables.rsdp,
+ ACPI_BUILD_RSDP_FILE);
 
 qemu_register_reset(acpi_build_reset, build_state);
 acpi_build_reset(build_state);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index fae21f75aa..8e51d1f1bb 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1757,7 +1757,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 
 pcmc->pci_enabled = true;
 pcmc->has_acpi_build = true;
-pcmc->rsdp_in_ram = true;
 pcmc->smbios_defaults = true;
 pcmc->gigabyte_align = true;
 pcmc->has_reserved_memory = true;
-- 
2.41.0

[PATCH v5 08/23] hw/i386/pc: Remove deprecated pc-i440fx-2.1 machine

2024-05-28 Thread Philippe Mathieu-Daudé

The pc-i440fx-2.1 machine was deprecated for the 8.2
release (see commit c7437f0ddb "docs/about: Mark the
old pc-i440fx-2.0 - 2.3 machine types as deprecated"),
time to remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 docs/about/deprecated.rst   |  2 +-
 docs/about/removed-features.rst |  2 +-
 include/hw/i386/pc.h|  3 ---
 hw/i386/pc.c|  7 ---
 hw/i386/pc_piix.c   | 23 ---
 5 files changed, 2 insertions(+), 35 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 629f6a1566..5b4753e5dc 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -228,7 +228,7 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
-``pc-i440fx-2.1`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
+``pc-i440fx-2.2`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
 

 
 These old machine types are quite neglected nowadays and thus might have
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 5f0c2d8ec2..9b0e2f11de 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -925,7 +925,7 @@ mips ``fulong2e`` machine alias (removed in 6.0)
 
 This machine has been renamed ``fuloong2e``.
 
-``pc-0.10`` up to ``pc-i440fx-2.0`` (removed in 4.0 up to 9.0)
+``pc-0.10`` up to ``pc-i440fx-2.1`` (removed in 4.0 up to 9.0)
 ''
 
 These machine types were very old and likely could not be used for live
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 01fdcfaeb6..db0f8e0e36 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -286,9 +286,6 @@ extern const size_t pc_compat_2_3_len;
 extern GlobalProperty pc_compat_2_2[];
 extern const size_t pc_compat_2_2_len;
 
-extern GlobalProperty pc_compat_2_1[];
-extern const size_t pc_compat_2_1_len;
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 11182e09ce..f27c9fd98c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -312,13 +312,6 @@ GlobalProperty pc_compat_2_2[] = {
 };
 const size_t pc_compat_2_2_len = G_N_ELEMENTS(pc_compat_2_2);
 
-GlobalProperty pc_compat_2_1[] = {
-PC_CPU_MODEL_IDS("2.1.0")
-{ "coreduo" "-" TYPE_X86_CPU, "vmx", "on" },
-{ "core2duo" "-" TYPE_X86_CPU, "vmx", "on" },
-};
-const size_t pc_compat_2_1_len = G_N_ELEMENTS(pc_compat_2_1);
-
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
 GSIState *s;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a750a0e6ab..e0b421dd51 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -66,7 +66,6 @@
 #include "hw/hyperv/vmbus-bridge.h"
 #include "hw/mem/nvdimm.h"
 #include "hw/i386/acpi-build.h"
-#include "kvm/kvm-cpu.h"
 #include "target/i386/cpu.h"
 
 #define XEN_IOAPIC_NUM_PIRQS 128ULL
@@ -435,12 +434,6 @@ static void pc_compat_2_2_fn(MachineState *machine)
 pc_compat_2_3_fn(machine);
 }
 
-static void pc_compat_2_1_fn(MachineState *machine)
-{
-pc_compat_2_2_fn(machine);
-x86_cpu_change_kvm_default("svm", NULL);
-}
-
 #ifdef CONFIG_ISAPC
 static void pc_init_isa(MachineState *machine)
 {
@@ -866,22 +859,6 @@ static void pc_i440fx_2_2_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v2_2, "pc-i440fx-2.2", pc_compat_2_2_fn,
   pc_i440fx_2_2_machine_options);
 
-static void pc_i440fx_2_1_machine_options(MachineClass *m)
-{
-PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
-
-pc_i440fx_2_2_machine_options(m);
-m->hw_version = "2.1.0";
-m->default_display = NULL;
-compat_props_add(m->compat_props, hw_compat_2_1, hw_compat_2_1_len);
-compat_props_add(m->compat_props, pc_compat_2_1, pc_compat_2_1_len);
-pcmc->smbios_uuid_encoded = false;
-pcmc->enforce_aligned_dimm = false;
-}
-
-DEFINE_I440FX_MACHINE(v2_1, "pc-i440fx-2.1", pc_compat_2_1_fn,
-  pc_i440fx_2_1_machine_options);
-
 #ifdef CONFIG_ISAPC
 static void isapc_machine_options(MachineClass *m)
 {
-- 
2.41.0

[PATCH v5 03/23] hw/usb/hcd-xhci: Remove XHCI_FLAG_FORCE_PCIE_ENDCAP flag

2024-05-28 Thread Philippe Mathieu-Daudé

XHCI_FLAG_FORCE_PCIE_ENDCAP was only used by the
pc-i440fx-2.0 machine, which got removed. Remove it
and simplify usb_xhci_pci_realize().

Reviewed-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 hw/usb/hcd-xhci.h | 1 -
 hw/usb/hcd-xhci-nec.c | 2 --
 hw/usb/hcd-xhci-pci.c | 3 +--
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index 98f598382a..1efa4858fb 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -37,7 +37,6 @@ typedef struct XHCIEPContext XHCIEPContext;
 
 enum xhci_flags {
 XHCI_FLAG_SS_FIRST = 1,
-XHCI_FLAG_FORCE_PCIE_ENDCAP,
 XHCI_FLAG_ENABLE_STREAMS,
 };
 
diff --git a/hw/usb/hcd-xhci-nec.c b/hw/usb/hcd-xhci-nec.c
index 328e5bfe7c..5d5b069cf9 100644
--- a/hw/usb/hcd-xhci-nec.c
+++ b/hw/usb/hcd-xhci-nec.c
@@ -43,8 +43,6 @@ static Property nec_xhci_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("msix", XHCIPciState, msix, ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BIT("superspeed-ports-first", XHCINecState, flags,
 XHCI_FLAG_SS_FIRST, true),
-DEFINE_PROP_BIT("force-pcie-endcap", XHCINecState, flags,
-XHCI_FLAG_FORCE_PCIE_ENDCAP, false),
 DEFINE_PROP_UINT32("intrs", XHCINecState, intrs, XHCI_MAXINTRS),
 DEFINE_PROP_UINT32("slots", XHCINecState, slots, XHCI_MAXSLOTS),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index 4423983308..cbad96f393 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -148,8 +148,7 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, Error **errp)
  PCI_BASE_ADDRESS_MEM_TYPE_64,
  >xhci.mem);
 
-if (pci_bus_is_express(pci_get_bus(dev)) ||
-xhci_get_flag(>xhci, XHCI_FLAG_FORCE_PCIE_ENDCAP)) {
+if (pci_bus_is_express(pci_get_bus(dev))) {
 ret = pcie_endpoint_cap_init(dev, 0xa0);
 assert(ret > 0);
 }
-- 
2.41.0
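
One C detail worth keeping in mind when reading this patch: the enumerators after XHCI_FLAG_SS_FIRST have no explicit values, so removing the middle one renumbers the one that follows. The sketch below illustrates only that language rule. The demo_* and OLD_*/NEW_* names are hypothetical stand-ins, and the helpers assume the flag value is used as a bit position. Whether the renumbering matters in practice depends on whether the raw flags word is ever stored or exchanged outside the process, which this sketch does not establish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the enum before the patch: values are 1, 2, 3. */
enum demo_flags_before {
    OLD_SS_FIRST = 1,
    OLD_FORCE_PCIE_ENDCAP,
    OLD_ENABLE_STREAMS,
};

/* Stand-in for the enum after the patch: values are 1, 2. */
enum demo_flags_after {
    NEW_SS_FIRST = 1,
    NEW_ENABLE_STREAMS,
};

/* Implicit enumerators continue from the previous value plus one. */
static_assert(OLD_ENABLE_STREAMS == 3, "third enumerator is 3");
static_assert(NEW_ENABLE_STREAMS == 2, "renumbered after removal");

/* Assumed usage: the enumerator is a bit position in a flags word. */
static inline uint32_t demo_set_flag(uint32_t flags, int bit)
{
    return flags | (1u << bit);
}

static inline bool demo_get_flag(uint32_t flags, int bit)
{
    return (flags & (1u << bit)) != 0;
}
```

A flags word built with the old numbering therefore does not test as "streams enabled" under the new numbering, which is harmless as long as the word is always rebuilt from named properties.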

[PATCH v5 17/23] hw/i386/pc: Remove PCMachineClass::resizable_acpi_blob

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::resizable_acpi_blob was only used by the
pc-i440fx-2.2 machine, which got removed. It is now always
true. Remove it, simplifying acpi_build().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 include/hw/i386/pc.h |  3 ---
 hw/i386/acpi-build.c | 10 --
 hw/i386/pc.c |  1 -
 3 files changed, 14 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 996495985e..808de4eca7 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -125,9 +125,6 @@ struct PCMachineClass {
 /* create kvmclock device even when KVM PV features are not exposed */
 bool kvmclock_create_always;
 
-/* resizable acpi blob compat */
-bool resizable_acpi_blob;
-
 /*
  * whether the machine type implements broken 32-bit address space bound
  * check for memory.
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a6f8203460..ab2d4d8dcb 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2688,16 +2688,6 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
  * keep the table size stable for all (max_cpus, max_memory_slots)
  * combinations.
  */
-/* Make sure we have a buffer in case we need to resize the tables. */
-if ((tables_blob->len > ACPI_BUILD_TABLE_SIZE / 2) &&
-!pcmc->resizable_acpi_blob) {
-/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
-warn_report("ACPI table size %u exceeds %d bytes,"
-" migration may not work",
-tables_blob->len, ACPI_BUILD_TABLE_SIZE / 2);
-error_printf("Try removing CPUs, NUMA nodes, memory slots"
- " or PCI bridges.\n");
-}
 acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE);
 
 acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ccfcb92605..fae21f75aa 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1768,7 +1768,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->acpi_data_size = 0x2 + 0x8000;
 pcmc->pvh_enabled = true;
 pcmc->kvmclock_create_always = true;
-pcmc->resizable_acpi_blob = true;
 x86mc->apic_xrupt_override = true;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = pc_get_hotplug_handler;
-- 
2.41.0

[PATCH v5 19/23] hw/i386/acpi: Remove AcpiBuildState::rsdp field

2024-05-28 Thread Philippe Mathieu-Daudé

AcpiBuildState::rsdp is always NULL, remove it,
simplifying acpi_build_update().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 hw/i386/acpi-build.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index ed0adb0e82..6f9925d176 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2459,7 +2459,6 @@ struct AcpiBuildState {
 MemoryRegion *table_mr;
 /* Is table patched? */
 uint8_t patched;
-void *rsdp;
 MemoryRegion *rsdp_mr;
 MemoryRegion *linker_mr;
 } AcpiBuildState;
@@ -2715,11 +2714,7 @@ static void acpi_build_update(void *build_opaque)
 
 acpi_ram_update(build_state->table_mr, tables.table_data);
 
-if (build_state->rsdp) {
-memcpy(build_state->rsdp, tables.rsdp->data, acpi_data_len(tables.rsdp));
-} else {
-acpi_ram_update(build_state->rsdp_mr, tables.rsdp);
-}
+acpi_ram_update(build_state->rsdp_mr, tables.rsdp);
 
 acpi_ram_update(build_state->linker_mr, tables.linker->cmd_blob);
 acpi_build_tables_cleanup(, true);
@@ -2805,7 +2800,6 @@ void acpi_setup(void)
tables.vmgenid);
 }
 
-build_state->rsdp = NULL;
 build_state->rsdp_mr = acpi_add_rom_blob(acpi_build_update,
  build_state, tables.rsdp,
  ACPI_BUILD_RSDP_FILE);
-- 
2.41.0

[PATCH v5 14/23] hw/mem/pc-dimm: Remove legacy_align argument from pc_dimm_pre_plug()

2024-05-28 Thread Philippe Mathieu-Daudé

'legacy_align' is always NULL, remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: David Hildenbrand 
Reviewed-by: Zhao Liu 
---
 include/hw/mem/pc-dimm.h | 3 +--
 hw/arm/virt.c| 2 +-
 hw/i386/pc.c | 2 +-
 hw/loongarch/virt.c  | 2 +-
 hw/mem/pc-dimm.c | 6 ++
 hw/ppc/spapr.c   | 2 +-
 6 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 322bebe555..fe0f3ea963 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -66,8 +66,7 @@ struct PCDIMMDeviceClass {
 void (*unrealize)(PCDIMMDevice *dimm);
 };
 
-void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine,
-  const uint64_t *legacy_align, Error **errp);
+void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine, Error **errp);
 void pc_dimm_plug(PCDIMMDevice *dimm, MachineState *machine);
 void pc_dimm_unplug(PCDIMMDevice *dimm, MachineState *machine);
 #endif
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 268b25e332..c7a1f754e7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2763,7 +2763,7 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 return;
 }
 
-pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
+pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), errp);
 }
 
 static void virt_memory_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9cb5083f8f..08d38a1dcc 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1321,7 +1321,7 @@ static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 return;
 }
 
-pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
+pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), errp);
 }
 
 static void pc_memory_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 6a12659583..c8f16d9d6c 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -1133,7 +1133,7 @@ static bool memhp_type_supported(DeviceState *dev)
 static void virt_mem_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
  Error **errp)
 {
-pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
+pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), errp);
 }
 
 static void virt_device_pre_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 37f1f4ccfd..836384a90f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -44,8 +44,7 @@ static MemoryRegion *pc_dimm_get_memory_region(PCDIMMDevice *dimm, Error **errp)
 return host_memory_backend_get_memory(dimm->hostmem);
 }
 
-void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine,
-  const uint64_t *legacy_align, Error **errp)
+void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine, Error **errp)
 {
 Error *local_err = NULL;
 int slot;
@@ -70,8 +69,7 @@ void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine,
 _abort);
 trace_mhp_pc_dimm_assigned_slot(slot);
 
-memory_device_pre_plug(MEMORY_DEVICE(dimm), machine, legacy_align,
-   errp);
+memory_device_pre_plug(MEMORY_DEVICE(dimm), machine, NULL, errp);
 }
 
 void pc_dimm_plug(PCDIMMDevice *dimm, MachineState *machine)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4345764bce..e7a5b7ce8c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3700,7 +3700,7 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 return;
 }
 
-pc_dimm_pre_plug(dimm, MACHINE(hotplug_dev), NULL, errp);
+pc_dimm_pre_plug(dimm, MACHINE(hotplug_dev), errp);
 }
 
 struct SpaprDimmState {
-- 
2.41.0

[PATCH v5 06/23] hw/acpi/ich9: Remove 'memory-hotplug-support' property

2024-05-28 Thread Philippe Mathieu-Daudé

No external code sets the 'memory-hotplug-support'
property, remove it.

Suggested-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 hw/acpi/ich9.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 573d032e8e..9b605af21a 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -351,21 +351,6 @@ static void ich9_pm_get_gpe0_blk(Object *obj, Visitor *v, const char *name,
 visit_type_uint32(v, name, , errp);
 }
 
-static bool ich9_pm_get_memory_hotplug_support(Object *obj, Error **errp)
-{
-ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
-
-return s->pm.acpi_memory_hotplug.is_enabled;
-}
-
-static void ich9_pm_set_memory_hotplug_support(Object *obj, bool value,
-   Error **errp)
-{
-ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
-
-s->pm.acpi_memory_hotplug.is_enabled = value;
-}
-
 static bool ich9_pm_get_cpu_hotplug_legacy(Object *obj, Error **errp)
 {
 ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
@@ -445,9 +430,6 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm)
 NULL, NULL, pm);
 object_property_add_uint32_ptr(obj, ACPI_PM_PROP_GPE0_BLK_LEN,
_len, OBJ_PROP_FLAG_READ);
-object_property_add_bool(obj, "memory-hotplug-support",
- ich9_pm_get_memory_hotplug_support,
- ich9_pm_set_memory_hotplug_support);
 object_property_add_bool(obj, "cpu-hotplug-legacy",
  ich9_pm_get_cpu_hotplug_legacy,
  ich9_pm_set_cpu_hotplug_legacy);
-- 
2.41.0

[PATCH v5 12/23] hw/smbios: Remove 'smbios_uuid_encoded', simplify smbios_encode_uuid()

2024-05-28 Thread Philippe Mathieu-Daudé

'smbios_uuid_encoded' is always true, remove it,
simplifying smbios_encode_uuid().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 hw/smbios/smbios.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 8261eb716f..3b7703489d 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -30,7 +30,6 @@
 #include "hw/pci/pci_device.h"
 #include "smbios_build.h"
 
-static const bool smbios_uuid_encoded = true;
 /*
  * SMBIOS tables provided by user with '-smbios file=' option
  */
@@ -600,11 +599,9 @@ static void smbios_build_type_0_table(void)
 static void smbios_encode_uuid(struct smbios_uuid *uuid, QemuUUID *in)
 {
 memcpy(uuid, in, 16);
-if (smbios_uuid_encoded) {
-uuid->time_low = bswap32(uuid->time_low);
-uuid->time_mid = bswap16(uuid->time_mid);
-uuid->time_hi_and_version = bswap16(uuid->time_hi_and_version);
-}
+uuid->time_low = bswap32(uuid->time_low);
+uuid->time_mid = bswap16(uuid->time_mid);
+uuid->time_hi_and_version = bswap16(uuid->time_hi_and_version);
 }
 
 static void smbios_build_type_1_table(void)
-- 
2.41.0
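
To make the effect of the now-unconditional swaps concrete, here is a small standalone sketch. It assumes the input UUID is in RFC 4122 (big-endian) byte order and that the encoding wants the first three fields little-endian. demo_smbios_uuid, demo_bswap16(), demo_bswap32() and demo_encode_uuid() are hypothetical stand-ins written for this example, not the QEMU definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 16-byte UUID layout used by the patch. */
struct demo_smbios_uuid {
    uint32_t time_low;
    uint16_t time_mid;
    uint16_t time_hi_and_version;
    uint8_t clock_seq_hi_and_reserved;
    uint8_t clock_seq_low;
    uint8_t node[6];
};
static_assert(sizeof(struct demo_smbios_uuid) == 16, "UUID must be 16 bytes");

/* Portable byte swaps standing in for QEMU's bswap16()/bswap32(). */
static inline uint16_t demo_bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

static inline uint32_t demo_bswap32(uint32_t x)
{
    return (x << 24) | ((x & 0xff00u) << 8) |
           ((x >> 8) & 0xff00u) | (x >> 24);
}

/*
 * Same shape as the simplified function: copy the raw bytes, then
 * reverse the byte order of the first three fields. Swapping a value
 * reverses its bytes in memory on any host endianness, so the
 * resulting byte layout does not depend on the host.
 */
static void demo_encode_uuid(struct demo_smbios_uuid *uuid,
                             const uint8_t in[16])
{
    memcpy(uuid, in, 16);
    uuid->time_low = demo_bswap32(uuid->time_low);
    uuid->time_mid = demo_bswap16(uuid->time_mid);
    uuid->time_hi_and_version = demo_bswap16(uuid->time_hi_and_version);
}
```

For input bytes 00 11 22 33 44 55 66 77 the first eight output bytes are 33 22 11 00 55 44 77 66; the remaining eight bytes are copied unchanged.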

[PATCH v5 20/23] hw/i386/pc: Remove deprecated pc-i440fx-2.3 machine

2024-05-28 Thread Philippe Mathieu-Daudé

The pc-i440fx-2.3 machine was deprecated for the 8.2
release (see commit c7437f0ddb "docs/about: Mark the
old pc-i440fx-2.0 - 2.3 machine types as deprecated"),
time to remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/about/deprecated.rst   |  4 ++--
 docs/about/removed-features.rst |  2 +-
 hw/i386/pc.c| 25 -
 hw/i386/pc_piix.c   | 19 ---
 4 files changed, 3 insertions(+), 47 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 0fa45aba8b..3d004a0818 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -228,8 +228,8 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
-``pc-i440fx-2.3`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
-
+``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
+''
 
 These old machine types are quite neglected nowadays and thus might have
 various pitfalls with regards to live migration. Use a newer machine type
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 5d7bb4354b..2cbbd03cfd 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -925,7 +925,7 @@ mips ``fulong2e`` machine alias (removed in 6.0)
 
 This machine has been renamed ``fuloong2e``.
 
-``pc-0.10`` up to ``pc-i440fx-2.2`` (removed in 4.0 up to 9.0)
+``pc-0.10`` up to ``pc-i440fx-2.3`` (removed in 4.0 up to 9.0)
 ''
 
 These machine types were very old and likely could not be used for live
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 8e51d1f1bb..b84c8ddba0 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -264,31 +264,6 @@ GlobalProperty pc_compat_2_4[] = {
 };
 const size_t pc_compat_2_4_len = G_N_ELEMENTS(pc_compat_2_4);
 
-GlobalProperty pc_compat_2_3[] = {
-PC_CPU_MODEL_IDS("2.3.0")
-{ TYPE_X86_CPU, "arat", "off" },
-{ "qemu64" "-" TYPE_X86_CPU, "min-level", "4" },
-{ "kvm64" "-" TYPE_X86_CPU, "min-level", "5" },
-{ "pentium3" "-" TYPE_X86_CPU, "min-level", "2" },
-{ "n270" "-" TYPE_X86_CPU, "min-level", "5" },
-{ "Conroe" "-" TYPE_X86_CPU, "min-level", "4" },
-{ "Penryn" "-" TYPE_X86_CPU, "min-level", "4" },
-{ "Nehalem" "-" TYPE_X86_CPU, "min-level", "4" },
-{ "n270" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Penryn" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Conroe" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Nehalem" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Westmere" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "SandyBridge" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "IvyBridge" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Haswell" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Haswell-noTSX" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Broadwell" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ "Broadwell-noTSX" "-" TYPE_X86_CPU, "min-xlevel", "0x800a" },
-{ TYPE_X86_CPU, "kvm-no-smi-migration", "on" },
-};
-const size_t pc_compat_2_3_len = G_N_ELEMENTS(pc_compat_2_3);
-
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
 GSIState *s;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1343fd93e7..217c749705 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -421,14 +421,6 @@ static void pc_set_south_bridge(Object *obj, int value, Error **errp)
  * hw_compat_*, pc_compat_*, or * pc_*_machine_options().
  */
 
-static void pc_compat_2_3_fn(MachineState *machine)
-{
-X86MachineState *x86ms = X86_MACHINE(machine);
-if (kvm_enabled()) {
-x86ms->smm = ON_OFF_AUTO_OFF;
-}
-}
-
 #ifdef CONFIG_ISAPC
 static void pc_init_isa(MachineState *machine)
 {
@@ -827,17 +819,6 @@ static void pc_i440fx_2_4_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v2_4, "pc-i440fx-2.4", NULL,
   pc_i440fx_2_4_machine_options)
 
-static void pc_i440fx_2_3_machine_options(MachineClass *m)
-{
-pc_i440fx_2_4_machine_options(m);
-m->hw_version = "2.3.0";
-compat_props_add(m->compat_props, hw_compat_2_3, hw_compat_2_3_len);
-compat_props_add(m->compat_props, pc_compat_2_3, pc_compat_2_3_len);
-}
-
-DEFINE_I440FX_MACHINE(v2_3, "pc-i440fx-2.3", pc_compat_2_3_fn,
-  pc_i440fx_2_3_machine_options);
-
 #ifdef CONFIG_ISAPC
 static void isapc_machine_options(MachineClass *m)
 {
-- 
2.41.0

[PATCH v5 21/23] hw/i386/pc: Simplify DEFINE_I440FX_MACHINE() macro

2024-05-28 Thread Philippe Mathieu-Daudé

The previous commit removed the last non-NULL use of the third
parameter of DEFINE_I440FX_MACHINE(). 'compatfn' is now obsolete;
remove it.

Suggested-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/i386/pc_piix.c | 62 ++-
 1 file changed, 29 insertions(+), 33 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 217c749705..e7f51a5f2c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -452,13 +452,9 @@ static void pc_xen_hvm_init(MachineState *machine)
 }
 #endif
 
-#define DEFINE_I440FX_MACHINE(suffix, name, compatfn, optionfn) \
+#define DEFINE_I440FX_MACHINE(suffix, name, optionfn) \
 static void pc_init_##suffix(MachineState *machine) \
 { \
-void (*compat)(MachineState *m) = (compatfn); \
-if (compat) { \
-compat(machine); \
-} \
 pc_init1(machine, TYPE_I440FX_PCI_DEVICE); \
 } \
 DEFINE_PC_MACHINE(suffix, name, pc_init_##suffix, optionfn)
@@ -496,7 +492,7 @@ static void pc_i440fx_9_1_machine_options(MachineClass *m)
 m->is_default = true;
 }
 
-DEFINE_I440FX_MACHINE(v9_1, "pc-i440fx-9.1", NULL,
+DEFINE_I440FX_MACHINE(v9_1, "pc-i440fx-9.1",
   pc_i440fx_9_1_machine_options);
 
 static void pc_i440fx_9_0_machine_options(MachineClass *m)
@@ -512,7 +508,7 @@ static void pc_i440fx_9_0_machine_options(MachineClass *m)
 pcmc->isa_bios_alias = false;
 }
 
-DEFINE_I440FX_MACHINE(v9_0, "pc-i440fx-9.0", NULL,
+DEFINE_I440FX_MACHINE(v9_0, "pc-i440fx-9.0",
   pc_i440fx_9_0_machine_options);
 
 static void pc_i440fx_8_2_machine_options(MachineClass *m)
@@ -527,7 +523,7 @@ static void pc_i440fx_8_2_machine_options(MachineClass *m)
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_64;
 }
 
-DEFINE_I440FX_MACHINE(v8_2, "pc-i440fx-8.2", NULL,
+DEFINE_I440FX_MACHINE(v8_2, "pc-i440fx-8.2",
   pc_i440fx_8_2_machine_options);
 
 static void pc_i440fx_8_1_machine_options(MachineClass *m)
@@ -541,7 +537,7 @@ static void pc_i440fx_8_1_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_8_1, pc_compat_8_1_len);
 }
 
-DEFINE_I440FX_MACHINE(v8_1, "pc-i440fx-8.1", NULL,
+DEFINE_I440FX_MACHINE(v8_1, "pc-i440fx-8.1",
   pc_i440fx_8_1_machine_options);
 
 static void pc_i440fx_8_0_machine_options(MachineClass *m)
@@ -556,7 +552,7 @@ static void pc_i440fx_8_0_machine_options(MachineClass *m)
 pcmc->default_smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
 }
 
-DEFINE_I440FX_MACHINE(v8_0, "pc-i440fx-8.0", NULL,
+DEFINE_I440FX_MACHINE(v8_0, "pc-i440fx-8.0",
   pc_i440fx_8_0_machine_options);
 
 static void pc_i440fx_7_2_machine_options(MachineClass *m)
@@ -566,7 +562,7 @@ static void pc_i440fx_7_2_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_7_2, pc_compat_7_2_len);
 }
 
-DEFINE_I440FX_MACHINE(v7_2, "pc-i440fx-7.2", NULL,
+DEFINE_I440FX_MACHINE(v7_2, "pc-i440fx-7.2",
   pc_i440fx_7_2_machine_options);
 
 static void pc_i440fx_7_1_machine_options(MachineClass *m)
@@ -576,7 +572,7 @@ static void pc_i440fx_7_1_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_7_1, pc_compat_7_1_len);
 }
 
-DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1", NULL,
+DEFINE_I440FX_MACHINE(v7_1, "pc-i440fx-7.1",
   pc_i440fx_7_1_machine_options);
 
 static void pc_i440fx_7_0_machine_options(MachineClass *m)
@@ -588,7 +584,7 @@ static void pc_i440fx_7_0_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
 
-DEFINE_I440FX_MACHINE(v7_0, "pc-i440fx-7.0", NULL,
+DEFINE_I440FX_MACHINE(v7_0, "pc-i440fx-7.0",
   pc_i440fx_7_0_machine_options);
 
 static void pc_i440fx_6_2_machine_options(MachineClass *m)
@@ -598,7 +594,7 @@ static void pc_i440fx_6_2_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_6_2, pc_compat_6_2_len);
 }
 
-DEFINE_I440FX_MACHINE(v6_2, "pc-i440fx-6.2", NULL,
+DEFINE_I440FX_MACHINE(v6_2, "pc-i440fx-6.2",
   pc_i440fx_6_2_machine_options);
 
 static void pc_i440fx_6_1_machine_options(MachineClass *m)
@@ -609,7 +605,7 @@ static void pc_i440fx_6_1_machine_options(MachineClass *m)
 m->smp_props.prefer_sockets = true;
 }
 
-DEFINE_I440FX_MACHINE(v6_1, "pc-i440fx-6.1", NULL,
+DEFINE_I440FX_MACHINE(v6_1, "pc-i440fx-6.1",
   pc_i440fx_6_1_machine_options);
 
 static void pc_i440fx_6_0_machine_options(MachineClass *m)
@@ -619,7 +615,7 @@ static void pc_i440fx_6_0_machine_options(MachineClass *m)
 compat_props_add(m->compat_props, pc_compat_6_0, pc_compat_6_0_len);
 }
 
-DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0", NULL,
+DEFINE_I440FX_MACHINE(v6_0, "pc-i440fx-6.0",
   pc_i440fx_6_0_machine_options);
 
 static void pc_i440fx_5_2_machine_options(MachineClass *m)
@@ -629,7 +625,7 @@ static void
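
The macro change can be checked in isolation with a reduced model. The sketch below mirrors the before and after shapes of the generated init function. DemoMachine, demo_init1(), demo_compat_2_3() and the DEMO_DEFINE_MACHINE_* macros are hypothetical stand-ins and not the QEMU definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for machine state: just counts calls. */
typedef struct DemoMachine {
    int compat_calls;
    int init_calls;
} DemoMachine;

/* Stand-in for the common init path. */
static void demo_init1(DemoMachine *machine)
{
    assert(machine != NULL);
    machine->init_calls++;
}

/* Shape before the patch: optional compat hook, then common init. */
#define DEMO_DEFINE_MACHINE_OLD(suffix, compatfn) \
    static void demo_init_old_##suffix(DemoMachine *machine) \
    { \
        void (*compat)(DemoMachine *m) = (compatfn); \
        if (compat) { \
            compat(machine); \
        } \
        demo_init1(machine); \
    }

/* Shape after the patch: common init only. */
#define DEMO_DEFINE_MACHINE_NEW(suffix) \
    static void demo_init_new_##suffix(DemoMachine *machine) \
    { \
        demo_init1(machine); \
    }

/* Stand-in for a removed compat hook. */
static void demo_compat_2_3(DemoMachine *m)
{
    m->compat_calls++;
}

DEMO_DEFINE_MACHINE_OLD(v2_3, demo_compat_2_3)
DEMO_DEFINE_MACHINE_OLD(v9_1, NULL)
DEMO_DEFINE_MACHINE_NEW(v9_1)
```

With a NULL hook the old and new forms do the same work, which is why the parameter can be dropped once no caller passes a non-NULL function.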

[PATCH v5 16/23] hw/i386/pc: Remove deprecated pc-i440fx-2.2 machine

2024-05-28 Thread Philippe Mathieu-Daudé

The pc-i440fx-2.2 machine was deprecated for the 8.2
release (see commit c7437f0ddb "docs/about: Mark the
old pc-i440fx-2.0 - 2.3 machine types as deprecated"),
time to remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 docs/about/deprecated.rst   |  2 +-
 docs/about/removed-features.rst |  2 +-
 include/hw/i386/pc.h|  3 ---
 hw/i386/pc.c| 23 ---
 hw/i386/pc_piix.c   | 21 -
 5 files changed, 2 insertions(+), 49 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 5b4753e5dc..0fa45aba8b 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -228,7 +228,7 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
-``pc-i440fx-2.2`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
+``pc-i440fx-2.3`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
 

 
 These old machine types are quite neglected nowadays and thus might have
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 9b0e2f11de..5d7bb4354b 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -925,7 +925,7 @@ mips ``fulong2e`` machine alias (removed in 6.0)
 
 This machine has been renamed ``fuloong2e``.
 
-``pc-0.10`` up to ``pc-i440fx-2.1`` (removed in 4.0 up to 9.0)
+``pc-0.10`` up to ``pc-i440fx-2.2`` (removed in 4.0 up to 9.0)
 ''
 
 These machine types were very old and likely could not be used for live
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 1351e73ee0..996495985e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -279,9 +279,6 @@ extern const size_t pc_compat_2_4_len;
 extern GlobalProperty pc_compat_2_3[];
 extern const size_t pc_compat_2_3_len;
 
-extern GlobalProperty pc_compat_2_2[];
-extern const size_t pc_compat_2_2_len;
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c7d44420a5..ccfcb92605 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -289,29 +289,6 @@ GlobalProperty pc_compat_2_3[] = {
 };
 const size_t pc_compat_2_3_len = G_N_ELEMENTS(pc_compat_2_3);
 
-GlobalProperty pc_compat_2_2[] = {
-PC_CPU_MODEL_IDS("2.2.0")
-{ "kvm64" "-" TYPE_X86_CPU, "vme", "off" },
-{ "kvm32" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Conroe" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Penryn" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Nehalem" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Westmere" "-" TYPE_X86_CPU, "vme", "off" },
-{ "SandyBridge" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Haswell" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Broadwell" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Opteron_G1" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Opteron_G2" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Opteron_G3" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Opteron_G4" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Opteron_G5" "-" TYPE_X86_CPU, "vme", "off" },
-{ "Haswell" "-" TYPE_X86_CPU, "f16c", "off" },
-{ "Haswell" "-" TYPE_X86_CPU, "rdrand", "off" },
-{ "Broadwell" "-" TYPE_X86_CPU, "f16c", "off" },
-{ "Broadwell" "-" TYPE_X86_CPU, "rdrand", "off" },
-};
-const size_t pc_compat_2_2_len = G_N_ELEMENTS(pc_compat_2_2);
-
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
 GSIState *s;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index e0b421dd51..1343fd93e7 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -429,11 +429,6 @@ static void pc_compat_2_3_fn(MachineState *machine)
 }
 }
 
-static void pc_compat_2_2_fn(MachineState *machine)
-{
-pc_compat_2_3_fn(machine);
-}
-
 #ifdef CONFIG_ISAPC
 static void pc_init_isa(MachineState *machine)
 {
@@ -843,22 +838,6 @@ static void pc_i440fx_2_3_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v2_3, "pc-i440fx-2.3", pc_compat_2_3_fn,
   pc_i440fx_2_3_machine_options);
 
-static void pc_i440fx_2_2_machine_options(MachineClass *m)
-{
-PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
-
-pc_i440fx_2_3_machine_options(m);
-m->hw_version = "2.2.0";
-m->default_machine_opts = "firmware=bios-256k.bin,suppress-vmdesc=on";
-compat_props_add(m->compat_props, hw_compat_2_2, hw_compat_2_2_len);
-compat_props_add(m->compat_props, pc_compat_2_2, pc_compat_2_2_len);
-pcmc->rsdp_in_ram = false;
-pcmc->resizable_acpi_blob = false;
-}
-
-DEFINE_I440FX_MACHINE(v2_2, "pc-i440fx-2.2",

[PATCH v5 09/23] target/i386/kvm: Remove x86_cpu_change_kvm_default() and 'kvm-cpu.h'

2024-05-28 Thread Philippe Mathieu-Daudé

x86_cpu_change_kvm_default() was only used out of kvm-cpu.c by
the pc-i440fx-2.1 machine, which got removed. Make it static,
and remove its declaration. "kvm-cpu.h" is now empty, remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 target/i386/kvm/kvm-cpu.h | 41 ---
 target/i386/kvm/kvm-cpu.c |  3 +--
 2 files changed, 1 insertion(+), 43 deletions(-)
 delete mode 100644 target/i386/kvm/kvm-cpu.h

diff --git a/target/i386/kvm/kvm-cpu.h b/target/i386/kvm/kvm-cpu.h
deleted file mode 100644
index e858ca21e5..0000000000
--- a/target/i386/kvm/kvm-cpu.h
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * i386 KVM CPU type and functions
- *
- *  Copyright (c) 2003 Fabrice Bellard
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef KVM_CPU_H
-#define KVM_CPU_H
-
-#ifdef CONFIG_KVM
-/*
- * Change the value of a KVM-specific default
- *
- * If value is NULL, no default will be set and the original
- * value from the CPU model table will be kept.
- *
- * It is valid to call this function only for properties that
- * are already present in the kvm_default_props table.
- */
-void x86_cpu_change_kvm_default(const char *prop, const char *value);
-
-#else /* !CONFIG_KVM */
-
-#define x86_cpu_change_kvm_default(a, b)
-
-#endif /* CONFIG_KVM */
-
-#endif /* KVM_CPU_H */
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index f76972e47e..f9b99b5f50 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -10,7 +10,6 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "host-cpu.h"
-#include "kvm-cpu.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "hw/boards.h"
@@ -178,7 +177,7 @@ static PropValue kvm_default_props[] = {
 /*
  * Only for builtin_x86_defs models initialized with x86_register_cpudef_types.
  */
-void x86_cpu_change_kvm_default(const char *prop, const char *value)
+static void x86_cpu_change_kvm_default(const char *prop, const char *value)
 {
 PropValue *pv;
 for (pv = kvm_default_props; pv->prop; pv++) {
-- 
2.41.0

[PATCH v5 02/23] hw/i386/pc: Remove deprecated pc-i440fx-2.0 machine

2024-05-28 Thread Philippe Mathieu-Daudé

The pc-i440fx-2.0 machine was deprecated for the 8.2
release (see commit c7437f0ddb "docs/about: Mark the
old pc-i440fx-2.0 - 2.3 machine types as deprecated"),
time to remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 docs/about/deprecated.rst   |  2 +-
 docs/about/removed-features.rst |  2 +-
 include/hw/i386/pc.h|  3 ---
 hw/i386/pc.c| 15 -
 hw/i386/pc_piix.c   | 37 -
 5 files changed, 2 insertions(+), 57 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 7ff52bdd8e..629f6a1566 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -228,7 +228,7 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
-``pc-i440fx-2.0`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
+``pc-i440fx-2.1`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
 

 
 These old machine types are quite neglected nowadays and thus might have
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index fba0cfb0b0..5f0c2d8ec2 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -925,7 +925,7 @@ mips ``fulong2e`` machine alias (removed in 6.0)
 
 This machine has been renamed ``fuloong2e``.
 
-``pc-0.10`` up to ``pc-i440fx-1.7`` (removed in 4.0 up to 8.2)
+``pc-0.10`` up to ``pc-i440fx-2.0`` (removed in 4.0 up to 9.0)
 ''
 
 These machine types were very old and likely could not be used for live
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index ad9c3d9ba8..7347636d47 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -290,9 +290,6 @@ extern const size_t pc_compat_2_2_len;
 extern GlobalProperty pc_compat_2_1[];
 extern const size_t pc_compat_2_1_len;
 
-extern GlobalProperty pc_compat_2_0[];
-extern const size_t pc_compat_2_0_len;
-
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 7b638da7aa..11182e09ce 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -319,21 +319,6 @@ GlobalProperty pc_compat_2_1[] = {
 };
 const size_t pc_compat_2_1_len = G_N_ELEMENTS(pc_compat_2_1);
 
-GlobalProperty pc_compat_2_0[] = {
-PC_CPU_MODEL_IDS("2.0.0")
-{ "virtio-scsi-pci", "any_layout", "off" },
-{ "PIIX4_PM", "memory-hotplug-support", "off" },
-{ "apic", "version", "0x11" },
-{ "nec-usb-xhci", "superspeed-ports-first", "off" },
-{ "nec-usb-xhci", "force-pcie-endcap", "on" },
-{ "pci-serial", "prog_if", "0" },
-{ "pci-serial-2x", "prog_if", "0" },
-{ "pci-serial-4x", "prog_if", "0" },
-{ "virtio-net-pci", "guest_announce", "off" },
-{ "ICH9-LPC", "memory-hotplug-support", "off" },
-};
-const size_t pc_compat_2_0_len = G_N_ELEMENTS(pc_compat_2_0);
-
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
 GSIState *s;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 02878060d0..a750a0e6ab 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -441,11 +441,6 @@ static void pc_compat_2_1_fn(MachineState *machine)
 x86_cpu_change_kvm_default("svm", NULL);
 }
 
-static void pc_compat_2_0_fn(MachineState *machine)
-{
-pc_compat_2_1_fn(machine);
-}
-
 #ifdef CONFIG_ISAPC
 static void pc_init_isa(MachineState *machine)
 {
@@ -887,38 +882,6 @@ static void pc_i440fx_2_1_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v2_1, "pc-i440fx-2.1", pc_compat_2_1_fn,
   pc_i440fx_2_1_machine_options);
 
-static void pc_i440fx_2_0_machine_options(MachineClass *m)
-{
-PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
-
-pc_i440fx_2_1_machine_options(m);
-m->hw_version = "2.0.0";
-compat_props_add(m->compat_props, pc_compat_2_0, pc_compat_2_0_len);
-pcmc->smbios_legacy_mode = true;
-pcmc->has_reserved_memory = false;
-/* This value depends on the actual DSDT and SSDT compiled into
- * the source QEMU; unfortunately it depends on the binary and
- * not on the machine type, so we cannot make pc-i440fx-1.7 work on
- * both QEMU 1.7 and QEMU 2.0.
- *
- * Large variations cause migration to fail for more than one
- * consecutive value of the "-smp" maxcpus option.
- *
- * For small variations of the kind caused by different iasl versions,
- * the 4k rounding usually leaves slack.  However, there could be still
- * one or two values that break.  For QEMU 1.7 and QEMU 2.0 the
- * slack is

[PATCH v5 22/23] target/i386: Remove X86CPU::kvm_no_smi_migration field

2024-05-28 Thread Philippe Mathieu-Daudé

X86CPU::kvm_no_smi_migration was only used by the
pc-i440fx-2.3 machine, which got removed. Remove it
and simplify kvm_put_vcpu_events().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 target/i386/cpu.h | 3 ---
 target/i386/cpu.c | 2 --
 target/i386/kvm/kvm.c | 7 +--
 3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c64ef0c1a2..6951f48f86 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2059,9 +2059,6 @@ struct ArchCPU {
 /* if set, limit maximum value for phys_bits when host_phys_bits is true */
 uint8_t host_phys_bits_limit;
 
-/* Stop SMI delivery for migration compatibility with old machines */
-bool kvm_no_smi_migration;
-
 /* Forcefully disable KVM PV features not exposed in guest CPUIDs */
 bool kvm_pv_enforce_cpuid;
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647..2d972def64 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8253,8 +8253,6 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("x-vendor-cpuid-only", X86CPU, vendor_cpuid_only, true),
 DEFINE_PROP_BOOL("lmce", X86CPU, enable_lmce, false),
 DEFINE_PROP_BOOL("l3-cache", X86CPU, enable_l3_cache, true),
-DEFINE_PROP_BOOL("kvm-no-smi-migration", X86CPU, kvm_no_smi_migration,
- false),
 DEFINE_PROP_BOOL("kvm-pv-enforce-cpuid", X86CPU, kvm_pv_enforce_cpuid,
  false),
 DEFINE_PROP_BOOL("vmware-cpuid-freq", X86CPU, vmware_cpuid_freq, true),
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 6c864e4611..51bd9556f6 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4391,6 +4391,7 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
 events.sipi_vector = env->sipi_vector;
 
 if (has_msr_smbase) {
+events.flags |= KVM_VCPUEVENT_VALID_SMM;
 events.smi.smm = !!(env->hflags & HF_SMM_MASK);
 events.smi.smm_inside_nmi = !!(env->hflags2 & HF2_SMM_INSIDE_NMI_MASK);
 if (kvm_irqchip_in_kernel()) {
@@ -4405,12 +4406,6 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level)
 events.smi.pending = 0;
 events.smi.latched_init = 0;
 }
-/* Stop SMI delivery on old machine types to avoid a reboot
- * on an inward migration of an old VM.
- */
-if (!cpu->kvm_no_smi_migration) {
-events.flags |= KVM_VCPUEVENT_VALID_SMM;
-}
 }
 
 if (level >= KVM_PUT_RESET_STATE) {
-- 
2.41.0
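
Note on the patch above: the simplification in kvm_put_vcpu_events() relies on the removed property having defaulted to false, so the flag was already being set on every machine type other than pc-i440fx-2.3. The following is only a minimal standalone sketch of that reasoning, not QEMU code. The `sketch_` names and the flag value are assumptions made for illustration; the real KVM_VCPUEVENT_VALID_SMM is defined in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; the real KVM_VCPUEVENT_VALID_SMM comes from <linux/kvm.h>. */
#define SKETCH_VCPUEVENT_VALID_SMM (1u << 3)

/* Before the patch: the flag was skipped when the compat property was set. */
static uint32_t sketch_smm_flags_old(bool has_msr_smbase,
                                     bool kvm_no_smi_migration)
{
    uint32_t flags = 0;

    if (has_msr_smbase && !kvm_no_smi_migration) {
        flags |= SKETCH_VCPUEVENT_VALID_SMM;
    }
    return flags;
}

/* After the patch: the flag is set whenever SMBASE is available. */
static uint32_t sketch_smm_flags_new(bool has_msr_smbase)
{
    uint32_t flags = 0;

    if (has_msr_smbase) {
        flags |= SKETCH_VCPUEVENT_VALID_SMM;
    }
    return flags;
}

/* With the property at its default (false), both variants agree. */
static void sketch_check_equivalence(void)
{
    assert(sketch_smm_flags_old(true, false) == sketch_smm_flags_new(true));
    assert(sketch_smm_flags_old(false, false) == sketch_smm_flags_new(false));
}
```

Calling sketch_check_equivalence() from a test harness exercises both SMBASE cases; the only input on which the old and new helpers differ is the compat setting that no remaining machine type enables.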

[PATCH v5 07/23] hw/acpi/ich9: Remove dead code related to 'acpi_memory_hotplug'

2024-05-28 Thread Philippe Mathieu-Daudé

acpi_memory_hotplug::is_enabled is set to %true once via
ich9_lpc_initfn() -> ich9_pm_add_properties(). No need to
check it, so remove now dead code.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 hw/acpi/ich9.c | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 9b605af21a..02d8546bd3 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -153,17 +153,10 @@ static int ich9_pm_post_load(void *opaque, int version_id)
  .offset = vmstate_offset_pointer(_state, _field, uint8_t),  \
  }
 
-static bool vmstate_test_use_memhp(void *opaque)
-{
-ICH9LPCPMRegs *s = opaque;
-return s->acpi_memory_hotplug.is_enabled;
-}
-
 static const VMStateDescription vmstate_memhp_state = {
 .name = "ich9_pm/memhp",
 .version_id = 1,
 .minimum_version_id = 1,
-.needed = vmstate_test_use_memhp,
 .fields = (const VMStateField[]) {
 VMSTATE_MEMORY_HOTPLUG(acpi_memory_hotplug, ICH9LPCPMRegs),
 VMSTATE_END_OF_LIST()
@@ -335,11 +328,9 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm, qemu_irq sci_irq)
 legacy_acpi_cpu_hotplug_init(pci_address_space_io(lpc_pci),
 OBJECT(lpc_pci), &pm->gpe_cpu, ICH9_CPU_HOTPLUG_IO_BASE);
 
-if (pm->acpi_memory_hotplug.is_enabled) {
-acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
- &pm->acpi_memory_hotplug,
- ACPI_MEMORY_HOTPLUG_BASE);
-}
+acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
+ &pm->acpi_memory_hotplug,
+ ACPI_MEMORY_HOTPLUG_BASE);
 }
 
 static void ich9_pm_get_gpe0_blk(Object *obj, Visitor *v, const char *name,
@@ -460,12 +451,7 @@ void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
 return;
 }
 
-if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
-!lpc->pm.acpi_memory_hotplug.is_enabled) {
-error_setg(errp,
-   "memory hotplug is not enabled: %s.memory-hotplug-support "
-   "is not set", object_get_typename(OBJECT(lpc)));
-} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
 uint64_t negotiated = lpc->smi_negotiated_features;
 
 if (negotiated & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) &&
@@ -509,8 +495,7 @@ void ich9_pm_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 {
 ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
 
-if (lpc->pm.acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 acpi_memory_unplug_request_cb(hotplug_dev,
   &lpc->pm.acpi_memory_hotplug, dev,
   errp);
@@ -545,8 +530,7 @@ void ich9_pm_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
 {
 ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
 
-if (lpc->pm.acpi_memory_hotplug.is_enabled &&
-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 acpi_memory_unplug_cb(&lpc->pm.acpi_memory_hotplug, dev, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) &&
!lpc->pm.cpu_hotplug_legacy) {
-- 
2.41.0

[PATCH v5 23/23] hw/i386/pc: Replace PCMachineClass::acpi_data_size by PC_ACPI_DATA_SIZE

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::acpi_data_size was only used by the pc-i440fx-2.0
machine, which got removed. Since it is constant, replace the class
field by a definition (local to hw/i386/pc.c, since not used
elsewhere).

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 include/hw/i386/pc.h |  4 
 hw/i386/pc.c | 19 ---
 hw/i386/pc_piix.c|  7 ---
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 63568eb9e9..db26368ace 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -74,9 +74,6 @@ typedef struct PCMachineState {
  *
  * Compat fields:
  *
- * @acpi_data_size: Size of the chunk of memory at the top of RAM
- *  for the BIOS ACPI tables and other BIOS
- *  datastructures.
  * @gigabyte_align: Make sure that guest addresses aligned at
  *  1Gbyte boundaries get mapped to host
  *  addresses aligned at 1Gbyte boundaries. This
@@ -100,7 +97,6 @@ struct PCMachineClass {
 
 /* ACPI compat: */
 bool has_acpi_build;
-unsigned acpi_data_size;
 int pci_root_uid;
 
 /* SMBIOS compat: */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b84c8ddba0..9dca3f0354 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -264,6 +264,16 @@ GlobalProperty pc_compat_2_4[] = {
 };
 const size_t pc_compat_2_4_len = G_N_ELEMENTS(pc_compat_2_4);
 
+/*
+ * @PC_ACPI_DATA_SIZE:
+ * Size of the chunk of memory at the top of RAM for the BIOS ACPI tables
+ * and other BIOS datastructures.
+ *
+ * BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K
+ * reported to be used at the moment, 32K should be enough for a while.
+ */
+#define PC_ACPI_DATA_SIZE (0x20000 + 0x8000)
+
 GSIState *pc_gsi_create(qemu_irq **irqs, bool pci_enabled)
 {
 GSIState *s;
@@ -645,8 +655,7 @@ void xen_load_linux(PCMachineState *pcms)
 fw_cfg_add_i16(fw_cfg, FW_CFG_NB_CPUS, x86ms->boot_cpus);
 rom_set_fw(fw_cfg);
 
-x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled);
+x86_load_linux(x86ms, fw_cfg, PC_ACPI_DATA_SIZE, pcmc->pvh_enabled);
 for (i = 0; i < nb_option_roms; i++) {
 assert(!strcmp(option_rom[i].name, "linuxboot.bin") ||
!strcmp(option_rom[i].name, "linuxboot_dma.bin") ||
@@ -980,8 +989,7 @@ void pc_memory_init(PCMachineState *pcms,
 }
 
 if (linux_boot) {
-x86_load_linux(x86ms, fw_cfg, pcmc->acpi_data_size,
-   pcmc->pvh_enabled);
+x86_load_linux(x86ms, fw_cfg, PC_ACPI_DATA_SIZE, pcmc->pvh_enabled);
 }
 
 for (i = 0; i < nb_option_roms; i++) {
@@ -1737,9 +1745,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->has_reserved_memory = true;
 pcmc->enforce_amd_1tb_hole = true;
 pcmc->isa_bios_alias = true;
-/* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
- * to be used at the moment, 32K should be enough for a while.  */
-pcmc->acpi_data_size = 0x20000 + 0x8000;
 pcmc->pvh_enabled = true;
 pcmc->kvmclock_create_always = true;
 x86mc->apic_xrupt_override = true;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index e7f51a5f2c..e4930b7f48 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -414,13 +414,6 @@ static void pc_set_south_bridge(Object *obj, int value, Error **errp)
 pcms->south_bridge = PCSouthBridgeOption_lookup.array[value];
 }
 
-/* Looking for a pc_compat_2_4() function? It doesn't exist.
- * pc_compat_*() functions that run on machine-init time and
- * change global QEMU state are deprecated. Please don't create
- * one, and implement any pc-*-2.4 (and newer) compat code in
- * hw_compat_*, pc_compat_*, or * pc_*_machine_options().
- */
-
 #ifdef CONFIG_ISAPC
 static void pc_init_isa(MachineState *machine)
 {
-- 
2.41.0
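
Note on the patch above: the comment attached to PC_ACPI_DATA_SIZE gives the two components as 128K for the BIOS ACPI tables and 32K for other BIOS data structures. The short sketch below just works through that arithmetic, assuming the constant is 0x20000 + 0x8000 as those sizes imply. `SKETCH_KIB` and `sketch_acpi_data_size_kib()` are illustrative names, not QEMU identifiers.

```c
#include <assert.h>

#define SKETCH_KIB 1024u

/* Assumed value: 128 KiB (0x20000) of ACPI tables plus 32 KiB (0x8000) of slack. */
#define PC_ACPI_DATA_SIZE (0x20000 + 0x8000)

/* Compile-time check that the hex constants match the sizes in the comment. */
static_assert(PC_ACPI_DATA_SIZE == 128 * SKETCH_KIB + 32 * SKETCH_KIB,
              "PC_ACPI_DATA_SIZE should be 128 KiB + 32 KiB");

/* Total reserved size expressed in KiB. */
static unsigned sketch_acpi_data_size_kib(void)
{
    return PC_ACPI_DATA_SIZE / SKETCH_KIB;
}
```

Under that assumption the reserved chunk at the top of RAM comes to 0x28000 bytes, i.e. 160 KiB.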

[PATCH v5 04/23] hw/usb/hcd-xhci: Remove XHCI_FLAG_SS_FIRST flag

2024-05-28 Thread Philippe Mathieu-Daudé

XHCI_FLAG_SS_FIRST was only used by the pc-i440fx-2.0 machine,
which got removed. Remove it and simplify various functions in
hcd-xhci.c.

Reviewed-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 hw/usb/hcd-xhci.h |  3 +--
 hw/usb/hcd-xhci-nec.c |  2 --
 hw/usb/hcd-xhci-pci.c |  1 -
 hw/usb/hcd-xhci.c | 42 --
 4 files changed, 9 insertions(+), 39 deletions(-)

diff --git a/hw/usb/hcd-xhci.h b/hw/usb/hcd-xhci.h
index 1efa4858fb..fe16d7ad05 100644
--- a/hw/usb/hcd-xhci.h
+++ b/hw/usb/hcd-xhci.h
@@ -36,8 +36,7 @@ typedef struct XHCIStreamContext XHCIStreamContext;
 typedef struct XHCIEPContext XHCIEPContext;
 
 enum xhci_flags {
-XHCI_FLAG_SS_FIRST = 1,
-XHCI_FLAG_ENABLE_STREAMS,
+XHCI_FLAG_ENABLE_STREAMS = 1,
 };
 
 typedef enum TRBType {
diff --git a/hw/usb/hcd-xhci-nec.c b/hw/usb/hcd-xhci-nec.c
index 5d5b069cf9..0c063b3697 100644
--- a/hw/usb/hcd-xhci-nec.c
+++ b/hw/usb/hcd-xhci-nec.c
@@ -41,8 +41,6 @@ struct XHCINecState {
 static Property nec_xhci_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("msi", XHCIPciState, msi, ON_OFF_AUTO_AUTO),
 DEFINE_PROP_ON_OFF_AUTO("msix", XHCIPciState, msix, ON_OFF_AUTO_AUTO),
-DEFINE_PROP_BIT("superspeed-ports-first", XHCINecState, flags,
-XHCI_FLAG_SS_FIRST, true),
 DEFINE_PROP_UINT32("intrs", XHCINecState, intrs, XHCI_MAXINTRS),
 DEFINE_PROP_UINT32("slots", XHCINecState, slots, XHCI_MAXSLOTS),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index cbad96f393..264d7ebb77 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -242,7 +242,6 @@ static void qemu_xhci_instance_init(Object *obj)
 s->msix = ON_OFF_AUTO_AUTO;
 xhci->numintrs = XHCI_MAXINTRS;
 xhci->numslots = XHCI_MAXSLOTS;
-xhci_set_flag(xhci, XHCI_FLAG_SS_FIRST);
 }
 
 static const TypeInfo qemu_xhci_info = {
diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index ad40232eb6..b6411f0bda 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -541,18 +541,10 @@ static XHCIPort *xhci_lookup_port(XHCIState *xhci, struct USBPort *uport)
 case USB_SPEED_LOW:
 case USB_SPEED_FULL:
 case USB_SPEED_HIGH:
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-index = uport->index + xhci->numports_3;
-} else {
-index = uport->index;
-}
+index = uport->index + xhci->numports_3;
 break;
 case USB_SPEED_SUPER:
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-index = uport->index;
-} else {
-index = uport->index + xhci->numports_2;
-}
+index = uport->index;
 break;
 default:
 return NULL;
@@ -2779,11 +2771,7 @@ static uint64_t xhci_cap_read(void *ptr, hwaddr reg, unsigned size)
 ret = 0x20425355; /* "USB " */
 break;
 case 0x28: /* Supported Protocol:08 */
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-ret = (xhci->numports_2<<8) | (xhci->numports_3+1);
-} else {
-ret = (xhci->numports_2<<8) | 1;
-}
+ret = (xhci->numports_2 << 8) | (xhci->numports_3 + 1);
 break;
 case 0x2c: /* Supported Protocol:0c */
 ret = 0x00000000; /* reserved */
@@ -2795,11 +2783,7 @@ static uint64_t xhci_cap_read(void *ptr, hwaddr reg, unsigned size)
 ret = 0x20425355; /* "USB " */
 break;
 case 0x38: /* Supported Protocol:08 */
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-ret = (xhci->numports_3<<8) | 1;
-} else {
-ret = (xhci->numports_3<<8) | (xhci->numports_2+1);
-}
+ret = (xhci->numports_3 << 8) | 1;
 break;
 case 0x3c: /* Supported Protocol:0c */
 ret = 0x00000000; /* reserved */
@@ -3349,13 +3333,8 @@ static void usb_xhci_init(XHCIState *xhci)
 for (i = 0; i < usbports; i++) {
 speedmask = 0;
 if (i < xhci->numports_2) {
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-port = &xhci->ports[i + xhci->numports_3];
-port->portnr = i + 1 + xhci->numports_3;
-} else {
-port = &xhci->ports[i];
-port->portnr = i + 1;
-}
+port = &xhci->ports[i + xhci->numports_3];
+port->portnr = i + 1 + xhci->numports_3;
 port->uport = &xhci->uports[i];
 port->speedmask =
 USB_SPEED_MASK_LOW  |
@@ -3366,13 +3345,8 @@ static void usb_xhci_init(XHCIState *xhci)
 speedmask |= port->speedmask;
 }
 if (i < xhci->numports_3) {
-if (xhci_get_flag(xhci, XHCI_FLAG_SS_FIRST)) {
-port = &xhci->ports[i];
-port->portnr = i + 1;
-} else {
-port = &xhci->ports[i + xhci->numports_2];
-port->portnr = i + 1 + xhci->numports_2;
-}
+port = &xhci->ports[i];
+
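
Note on the xHCI patch above: with XHCI_FLAG_SS_FIRST gone, the layout is always "SuperSpeed ports first", so USB3 ports occupy the low indices and USB2 ports follow them. The sketch below illustrates that mapping as it appears in the xhci_lookup_port() and usb_xhci_init() hunks. The enum and the `sketch_` helpers are hypothetical names for illustration and do not exist in hcd-xhci.c.

```c
#include <assert.h>

/* Illustrative stand-in for the USB2 (low/full/high) vs USB3 (super) split. */
enum sketch_speed_class {
    SKETCH_USB2,
    SKETCH_USB3,
};

/* Index into the combined ports[] array: USB3 ports come first. */
static int sketch_port_index(enum sketch_speed_class cls, int uport_index,
                             int numports_3)
{
    if (cls == SKETCH_USB3) {
        return uport_index;
    }
    return uport_index + numports_3;
}

/* Port numbers are 1-based, so they are the array index plus one. */
static int sketch_port_number(enum sketch_speed_class cls, int uport_index,
                              int numports_3)
{
    return sketch_port_index(cls, uport_index, numports_3) + 1;
}

/* Small self-check using an assumed controller with 4 USB3 ports. */
static void sketch_check_layout(void)
{
    assert(sketch_port_index(SKETCH_USB3, 0, 4) == 0);
    assert(sketch_port_index(SKETCH_USB2, 0, 4) == 4);
}
```

With 4 USB3 ports assumed, the first USB2 port gets port number 5, which agrees with the "Supported Protocol" hunk reporting a USB2 port offset of numports_3 + 1.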

[PATCH v5 05/23] hw/i386/acpi: Remove PCMachineClass::legacy_acpi_table_size

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::legacy_acpi_table_size was only used by the
pc-i440fx-2.0 machine, which got removed. Remove it and simplify
acpi_build().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 include/hw/i386/pc.h |  1 -
 hw/i386/acpi-build.c | 62 +---
 2 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7347636d47..01fdcfaeb6 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -103,7 +103,6 @@ struct PCMachineClass {
 /* ACPI compat: */
 bool has_acpi_build;
 bool rsdp_in_ram;
-int legacy_acpi_table_size;
 unsigned acpi_data_size;
 int pci_root_uid;
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 53f804ac16..a6f8203460 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2499,13 +2499,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 X86MachineState *x86ms = X86_MACHINE(machine);
 DeviceState *iommu = pcms->iommu;
 GArray *table_offsets;
-unsigned facs, dsdt, rsdt, fadt;
+unsigned facs, dsdt, rsdt;
 AcpiPmInfo pm;
 AcpiMiscInfo misc;
 AcpiMcfgInfo mcfg;
 Range pci_hole = {}, pci_hole64 = {};
 uint8_t *u;
-size_t aml_len = 0;
 GArray *tables_blob = tables->table_data;
 AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
 Object *vmgenid_dev;
@@ -2551,19 +2550,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 build_dsdt(tables_blob, tables->linker, &pm, &misc,
 &pci_hole, &pci_hole64, machine);
 
-/* Count the size of the DSDT and SSDT, we will need it for legacy
- * sizing of ACPI tables.
- */
-aml_len += tables_blob->len - dsdt;
-
 /* ACPI tables pointed to by RSDT */
-fadt = tables_blob->len;
 acpi_add_table(table_offsets, tables_blob);
 pm.fadt.facs_tbl_offset = &facs;
 pm.fadt.dsdt_tbl_offset = &dsdt;
 pm.fadt.xdsdt_tbl_offset = &dsdt;
 build_fadt(tables_blob, tables->linker, &pm.fadt, oem_id, oem_table_id);
-aml_len += tables_blob->len - fadt;
 
 acpi_add_table(table_offsets, tables_blob);
 acpi_build_madt(tables_blob, tables->linker, x86ms,
@@ -2694,49 +2686,19 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
  * too simple to be enough.  4k turned out to be too small an
  * alignment very soon, and in fact it is almost impossible to
  * keep the table size stable for all (max_cpus, max_memory_slots)
- * combinations.  So the table size is always 64k for pc-i440fx-2.1
- * and we give an error if the table grows beyond that limit.
- *
- * We still have the problem of migrating from "-M pc-i440fx-2.0".  For
- * that, we exploit the fact that QEMU 2.1 generates _smaller_ tables
- * than 2.0 and we can always pad the smaller tables with zeros.  We can
- * then use the exact size of the 2.0 tables.
- *
- * All this is for PIIX4, since QEMU 2.0 didn't support Q35 migration.
+ * combinations.
  */
-if (pcmc->legacy_acpi_table_size) {
-/* Subtracting aml_len gives the size of fixed tables.  Then add the
- * size of the PIIX4 DSDT/SSDT in QEMU 2.0.
- */
-int legacy_aml_len =
-pcmc->legacy_acpi_table_size +
-ACPI_BUILD_LEGACY_CPU_AML_SIZE * x86ms->apic_id_limit;
-int legacy_table_size =
-ROUND_UP(tables_blob->len - aml_len + legacy_aml_len,
- ACPI_BUILD_ALIGN_SIZE);
-if ((tables_blob->len > legacy_table_size) &&
-!pcmc->resizable_acpi_blob) {
-/* Should happen only with PCI bridges and -M pc-i440fx-2.0.  */
-warn_report("ACPI table size %u exceeds %d bytes,"
-" migration may not work",
-tables_blob->len, legacy_table_size);
-error_printf("Try removing CPUs, NUMA nodes, memory slots"
- " or PCI bridges.\n");
-}
-g_array_set_size(tables_blob, legacy_table_size);
-} else {
-/* Make sure we have a buffer in case we need to resize the tables. */
-if ((tables_blob->len > ACPI_BUILD_TABLE_SIZE / 2) &&
-!pcmc->resizable_acpi_blob) {
-/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
-warn_report("ACPI table size %u exceeds %d bytes,"
-" migration may not work",
-tables_blob->len, ACPI_BUILD_TABLE_SIZE / 2);
-error_printf("Try removing CPUs, NUMA nodes, memory slots"
- " or PCI bridges.\n");
-}
-acpi_align_size(tables_blob, ACPI_BUILD_TABLE_SIZE);
+/* Make sure we have a buffer in case we need to resize the tables. */
+if ((tables_blob->len > ACPI_BUILD_TABLE_SIZE / 2) &&
+!pcmc->resizable_acpi_blob) {
+/* As of QEMU 2.1, this fires with 160 VCPUs and 255 memory slots.  */
+

[PATCH v5 15/23] hw/mem/memory-device: Remove legacy_align from memory_device_pre_plug()

2024-05-28 Thread Philippe Mathieu-Daudé

'legacy_align' is always NULL, remove it, simplifying
memory_device_pre_plug().

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: David Hildenbrand 
Reviewed-by: Zhao Liu 
---
 include/hw/mem/memory-device.h |  2 +-
 hw/i386/pc.c   |  3 +--
 hw/mem/memory-device.c | 12 
 hw/mem/pc-dimm.c   |  2 +-
 hw/virtio/virtio-md-pci.c  |  2 +-
 5 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/include/hw/mem/memory-device.h b/include/hw/mem/memory-device.h
index e0571c8a31..c0a58087cc 100644
--- a/include/hw/mem/memory-device.h
+++ b/include/hw/mem/memory-device.h
@@ -169,7 +169,7 @@ uint64_t get_plugged_memory_size(void);
 unsigned int memory_devices_get_reserved_memslots(void);
 bool memory_devices_memslot_auto_decision_active(void);
 void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
-const uint64_t *legacy_align, Error **errp);
+Error **errp);
 void memory_device_plug(MemoryDeviceState *md, MachineState *ms);
 void memory_device_unplug(MemoryDeviceState *md, MachineState *ms);
 uint64_t memory_device_get_region_size(const MemoryDeviceState *md,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 08d38a1dcc..c7d44420a5 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1389,8 +1389,7 @@ static void pc_hv_balloon_pre_plug(HotplugHandler *hotplug_dev,
 {
 /* The vmbus handler has no hotplug handler; we should never end up here. */
 g_assert(!dev->hotplugged);
-memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev), NULL,
-   errp);
+memory_device_pre_plug(MEMORY_DEVICE(dev), MACHINE(hotplug_dev), errp);
 }
 
 static void pc_hv_balloon_plug(HotplugHandler *hotplug_dev,
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index e098585cda..a5f279adcc 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -345,7 +345,7 @@ uint64_t get_plugged_memory_size(void)
 }
 
 void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
-const uint64_t *legacy_align, Error **errp)
+Error **errp)
 {
 const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
 Error *local_err = NULL;
@@ -388,14 +388,10 @@ void memory_device_pre_plug(MemoryDeviceState *md, MachineState *ms,
 return;
 }
 
-if (legacy_align) {
-align = *legacy_align;
-} else {
-if (mdc->get_min_alignment) {
-align = mdc->get_min_alignment(md);
-}
-align = MAX(align, memory_region_get_alignment(mr));
+if (mdc->get_min_alignment) {
+align = mdc->get_min_alignment(md);
 }
+align = MAX(align, memory_region_get_alignment(mr));
 addr = mdc->get_addr(md);
 addr = memory_device_get_free_addr(ms, !addr ? NULL : &addr, align,
 memory_region_size(mr), &local_err);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 836384a90f..27919ca45d 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -69,7 +69,7 @@ void pc_dimm_pre_plug(PCDIMMDevice *dimm, MachineState *machine, Error **errp)
 &error_abort);
 trace_mhp_pc_dimm_assigned_slot(slot);
 
-memory_device_pre_plug(MEMORY_DEVICE(dimm), machine, NULL, errp);
+memory_device_pre_plug(MEMORY_DEVICE(dimm), machine, errp);
 }
 
 void pc_dimm_plug(PCDIMMDevice *dimm, MachineState *machine)
diff --git a/hw/virtio/virtio-md-pci.c b/hw/virtio/virtio-md-pci.c
index 62bfb7920b..9ec5067662 100644
--- a/hw/virtio/virtio-md-pci.c
+++ b/hw/virtio/virtio-md-pci.c
@@ -37,7 +37,7 @@ void virtio_md_pci_pre_plug(VirtIOMDPCI *vmd, MachineState *ms, Error **errp)
  * First, see if we can plug this memory device at all. If that
  * succeeds, branch of to the actual hotplug handler.
  */
-memory_device_pre_plug(md, ms, NULL, &local_err);
+memory_device_pre_plug(md, ms, &local_err);
 if (!local_err && bus_handler) {
 hotplug_handler_pre_plug(bus_handler, dev, &local_err);
 }
-- 
2.41.0

[PATCH v5 13/23] hw/i386/pc: Remove PCMachineClass::enforce_aligned_dimm

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::enforce_aligned_dimm was only used by the
pc-i440fx-2.1 machine, which got removed. It is now always
true. Remove it, simplifying pc_get_device_memory_range().
Update the comment in Avocado test_phybits_low_pse36().

Reviewed-by: Zhao Liu 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/i386/pc.h  |  3 ---
 hw/i386/pc.c  | 14 +++---
 tests/avocado/mem-addr-space-check.py |  9 -
 3 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index bbbf58bd42..1351e73ee0 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -74,8 +74,6 @@ typedef struct PCMachineState {
  *
  * Compat fields:
  *
- * @enforce_aligned_dimm: check that DIMM's address/size is aligned by
- *backend's alignment value if provided
  * @acpi_data_size: Size of the chunk of memory at the top of RAM
  *  for the BIOS ACPI tables and other BIOS
  *  datastructures.
@@ -114,7 +112,6 @@ struct PCMachineClass {
 /* RAM / address space compat: */
 bool gigabyte_align;
 bool has_reserved_memory;
-bool enforce_aligned_dimm;
 bool broken_reserved_end;
 bool enforce_amd_1tb_hole;
 bool isa_bios_alias;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 4b2a29bf08..9cb5083f8f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -727,7 +727,6 @@ static void pc_get_device_memory_range(PCMachineState *pcms,
hwaddr *base,
ram_addr_t *device_mem_size)
 {
-PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *machine = MACHINE(pcms);
 ram_addr_t size;
 hwaddr addr;
@@ -735,10 +734,8 @@ static void pc_get_device_memory_range(PCMachineState *pcms,
 size = machine->maxram_size - machine->ram_size;
 addr = ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
 
-if (pcmc->enforce_aligned_dimm) {
-/* size device region assuming 1G page max alignment per slot */
-size += (1 * GiB) * machine->ram_slots;
-}
+/* size device region assuming 1G page max alignment per slot */
+size += (1 * GiB) * machine->ram_slots;
 
 *base = addr;
 *device_mem_size = size;
@@ -1297,12 +1294,9 @@ void pc_i8259_create(ISABus *isa_bus, qemu_irq *i8259_irqs)
 static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
Error **errp)
 {
-const PCMachineState *pcms = PC_MACHINE(hotplug_dev);
 const X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
-const PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 const MachineState *ms = MACHINE(hotplug_dev);
 const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
-const uint64_t legacy_align = TARGET_PAGE_SIZE;
 Error *local_err = NULL;
 
 /*
@@ -1327,8 +1321,7 @@ static void pc_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 return;
 }
 
-pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev),
- pcmc->enforce_aligned_dimm ? NULL : &legacy_align, errp);
+pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
 }
 
 static void pc_memory_plug(HotplugHandler *hotplug_dev,
@@ -1792,7 +1785,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->smbios_defaults = true;
 pcmc->gigabyte_align = true;
 pcmc->has_reserved_memory = true;
-pcmc->enforce_aligned_dimm = true;
 pcmc->enforce_amd_1tb_hole = true;
 pcmc->isa_bios_alias = true;
 /* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
diff --git a/tests/avocado/mem-addr-space-check.py b/tests/avocado/mem-addr-space-check.py
index af019969c0..85541ea051 100644
--- a/tests/avocado/mem-addr-space-check.py
+++ b/tests/avocado/mem-addr-space-check.py
@@ -31,11 +31,10 @@ def test_phybits_low_pse36(self):
 at 4 GiB boundary when "above_4g_mem_size" is 0 (this would be true when
 we have 0.5 GiB of VM memory, see pc_q35_init()). This means total
 hotpluggable memory size is 60 GiB. Per slot, we reserve 1 GiB of memory
-for dimm alignment for all newer machines (see enforce_aligned_dimm
-property for pc machines and pc_get_device_memory_range()). That leaves
-total hotpluggable actual memory size of 59 GiB. If the VM is started
-with 0.5 GiB of memory, maxmem should be set to a maximum value of
-59.5 GiB to ensure that the processor can address all memory directly.
+for dimm alignment for all machines. That leaves total hotpluggable
+actual memory size of 59 GiB. If the VM is started with 0.5 GiB of
+memory, maxmem should be set to a maximum value of 59.5 GiB to ensure
+that the processor can address all memory directly.
 Note that 64-bit pci hole size is 0 in this case. If maxmem is set to
 59.6G, QEMU should fail to

[PATCH v5 10/23] hw/i386/pc: Remove PCMachineClass::smbios_uuid_encoded

2024-05-28 Thread Philippe Mathieu-Daudé

PCMachineClass::smbios_uuid_encoded was only used by the
pc-i440fx-2.1 machine, which got removed. It is now always
true, remove it.

Reviewed-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 include/hw/i386/pc.h | 1 -
 hw/i386/fw_cfg.c | 3 +--
 hw/i386/pc.c | 1 -
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index db0f8e0e36..bbbf58bd42 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -109,7 +109,6 @@ struct PCMachineClass {
 /* SMBIOS compat: */
 bool smbios_defaults;
 bool smbios_legacy_mode;
-bool smbios_uuid_encoded;
 SmbiosEntryPointType default_smbios_ep_type;
 
 /* RAM / address space compat: */
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index 6e0d9945d0..f9e8af3bf5 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -63,8 +63,7 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState *fw_cfg,
 
 if (pcmc->smbios_defaults) {
 /* These values are guest ABI, do not change */
-smbios_set_defaults("QEMU", mc->desc, mc->name,
-pcmc->smbios_uuid_encoded);
+smbios_set_defaults("QEMU", mc->desc, mc->name, true);
 }
 
 /* tell smbios about cpuid version and features */
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f27c9fd98c..4b2a29bf08 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1790,7 +1790,6 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->has_acpi_build = true;
 pcmc->rsdp_in_ram = true;
 pcmc->smbios_defaults = true;
-pcmc->smbios_uuid_encoded = true;
 pcmc->gigabyte_align = true;
 pcmc->has_reserved_memory = true;
 pcmc->enforce_aligned_dimm = true;
-- 
2.41.0

[PATCH v5 01/23] hw/i386/pc: Deprecate 2.4 to 2.12 pc-i440fx machines

2024-05-28 Thread Philippe Mathieu-Daudé

Similarly to the commit c7437f0ddb "docs/about: Mark the
old pc-i440fx-2.0 - 2.3 machine types as deprecated",
deprecate the 2.4 to 2.12 machines.

Suggested-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Reviewed-by: Zhao Liu 
---
 docs/about/deprecated.rst | 4 ++--
 hw/i386/pc_piix.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 40585ca7d5..7ff52bdd8e 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -228,8 +228,8 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
 better reflects the way this property affects all random data within
 the device tree blob, not just the ``kaslr-seed`` node.
 
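-``pc-i440fx-2.0`` up to ``pc-i440fx-2.3`` (since 8.2)
-'''''''''''''''''''''''''''''''''''''''''''''''''''''
+``pc-i440fx-2.0`` up to ``pc-i440fx-2.3`` (since 8.2) and ``pc-i440fx-2.4`` up to ``pc-i440fx-2.12`` (since 9.1)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''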
 
 These old machine types are quite neglected nowadays and thus might have
 various pitfalls with regards to live migration. Use a newer machine type
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index ebb51de380..02878060d0 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -742,6 +742,7 @@ DEFINE_I440FX_MACHINE(v3_0, "pc-i440fx-3.0", NULL,
 static void pc_i440fx_2_12_machine_options(MachineClass *m)
 {
 pc_i440fx_3_0_machine_options(m);
+m->deprecation_reason = "old and unattended - use a newer version instead";
 compat_props_add(m->compat_props, hw_compat_2_12, hw_compat_2_12_len);
 compat_props_add(m->compat_props, pc_compat_2_12, pc_compat_2_12_len);
 }
@@ -847,7 +848,6 @@ static void pc_i440fx_2_3_machine_options(MachineClass *m)
 {
 pc_i440fx_2_4_machine_options(m);
 m->hw_version = "2.3.0";
-m->deprecation_reason = "old and unattended - use a newer version instead";
 compat_props_add(m->compat_props, hw_compat_2_3, hw_compat_2_3_len);
 compat_props_add(m->compat_props, pc_compat_2_3, pc_compat_2_3_len);
 }
-- 
2.41.0

[PATCH v5 11/23] hw/smbios: Remove 'uuid_encoded' argument from smbios_set_defaults()

2024-05-28 Thread Philippe Mathieu-Daudé

'uuid_encoded' is always true, remove it.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Zhao Liu 
---
 include/hw/firmware/smbios.h | 3 +--
 hw/arm/virt.c| 3 +--
 hw/i386/fw_cfg.c | 2 +-
 hw/loongarch/virt.c  | 2 +-
 hw/riscv/virt.c  | 2 +-
 hw/smbios/smbios.c   | 6 ++
 6 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
index 8d3fb2fb3b..f066ab7262 100644
--- a/include/hw/firmware/smbios.h
+++ b/include/hw/firmware/smbios.h
@@ -331,8 +331,7 @@ void smbios_add_usr_blob_size(size_t size);
 void smbios_entry_add(QemuOpts *opts, Error **errp);
 void smbios_set_cpuid(uint32_t version, uint32_t features);
 void smbios_set_defaults(const char *manufacturer, const char *product,
- const char *version,
- bool uuid_encoded);
+ const char *version);
 void smbios_set_default_processor_family(uint16_t processor_family);
 uint8_t *smbios_get_table_legacy(size_t *length, Error **errp);
 void smbios_get_tables(MachineState *ms,
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3c93c0c0a6..268b25e332 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1677,8 +1677,7 @@ static void virt_build_smbios(VirtMachineState *vms)
 }
 
 smbios_set_defaults("QEMU", product,
-vmc->smbios_old_sys_ver ? "1.0" : mc->name,
-true);
+vmc->smbios_old_sys_ver ? "1.0" : mc->name);
 
 /* build the array of physical mem area from base_memmap */
 mem_array.address = vms->memmap[VIRT_MEM].base;
diff --git a/hw/i386/fw_cfg.c b/hw/i386/fw_cfg.c
index f9e8af3bf5..7c43c325ef 100644
--- a/hw/i386/fw_cfg.c
+++ b/hw/i386/fw_cfg.c
@@ -63,7 +63,7 @@ void fw_cfg_build_smbios(PCMachineState *pcms, FWCfgState *fw_cfg,
 
 if (pcmc->smbios_defaults) {
 /* These values are guest ABI, do not change */
-smbios_set_defaults("QEMU", mc->desc, mc->name, true);
+smbios_set_defaults("QEMU", mc->desc, mc->name);
 }
 
 /* tell smbios about cpuid version and features */
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 3e6e93edf3..6a12659583 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -529,7 +529,7 @@ static void virt_build_smbios(LoongArchVirtMachineState *lvms)
 return;
 }
 
-smbios_set_defaults("QEMU", product, mc->name, true);
+smbios_set_defaults("QEMU", product, mc->name);
 
 smbios_get_tables(ms, SMBIOS_ENTRY_POINT_TYPE_64,
   NULL, 0,
diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4fdb660525..5676d66d12 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1277,7 +1277,7 @@ static void virt_build_smbios(RISCVVirtState *s)
 product = "KVM Virtual Machine";
 }
 
-smbios_set_defaults("QEMU", product, mc->name, true);
+smbios_set_defaults("QEMU", product, mc->name);
 
 if (riscv_is_32bit(&s->soc[0])) {
 smbios_set_default_processor_family(0x200);
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index eed5787b15..8261eb716f 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -30,7 +30,7 @@
 #include "hw/pci/pci_device.h"
 #include "smbios_build.h"
 
-static bool smbios_uuid_encoded = true;
+static const bool smbios_uuid_encoded = true;
 /*
  * SMBIOS tables provided by user with '-smbios file=' option
  */
@@ -1017,11 +1017,9 @@ void smbios_set_default_processor_family(uint16_t processor_family)
 }
 
 void smbios_set_defaults(const char *manufacturer, const char *product,
- const char *version,
- bool uuid_encoded)
+ const char *version)
 {
 smbios_have_defaults = true;
-smbios_uuid_encoded = uuid_encoded;
 
 SMBIOS_SET_DEFAULT(smbios_type1.manufacturer, manufacturer);
 SMBIOS_SET_DEFAULT(smbios_type1.product, product);
-- 
2.41.0

[PATCH v5 00/23] hw/i386: Remove deprecated pc-i440fx-2.0 -> 2.3 machines

2024-05-28 Thread Philippe Mathieu-Daudé

Missing review: #20

Paolo, Michael, should I merge this myself? Ack-by welcome ;)

Since v4:
- Rebased on top of 7b68a5fe2f ("Merge tag 'for-upstream')
- Removed obsolete comment (Daniel)
- Clean DEFINE_I440FX_MACHINE (Daniel, new patch).

Since v3:
- Deprecate up to 2.12 (Thomas)

Since v2:
- Addressed Zhao review comments

Since v1:
- Addressed Zhao and Thomas review comments

Kill legacy code, because we need to evolve.

I ended there via dynamic machine -> ICH9 -> legacy ACPI...

This should also help Igor's cleanups:
http://lore.kernel.org/qemu-devel/20240326171632.3cc75...@imammedo.users.ipa.redhat.com/

Clashes a bit with Daniel's general deprecation policy for all
versioned machines:
https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg00084.html

Philippe Mathieu-Daudé (23):
  hw/i386/pc: Deprecate 2.4 to 2.12 pc-i440fx machines
  hw/i386/pc: Remove deprecated pc-i440fx-2.0 machine
  hw/usb/hcd-xhci: Remove XHCI_FLAG_FORCE_PCIE_ENDCAP flag
  hw/usb/hcd-xhci: Remove XHCI_FLAG_SS_FIRST flag
  hw/i386/acpi: Remove PCMachineClass::legacy_acpi_table_size
  hw/acpi/ich9: Remove 'memory-hotplug-support' property
  hw/acpi/ich9: Remove dead code related to 'acpi_memory_hotplug'
  hw/i386/pc: Remove deprecated pc-i440fx-2.1 machine
  target/i386/kvm: Remove x86_cpu_change_kvm_default() and 'kvm-cpu.h'
  hw/i386/pc: Remove PCMachineClass::smbios_uuid_encoded
  hw/smbios: Remove 'uuid_encoded' argument from smbios_set_defaults()
  hw/smbios: Remove 'smbios_uuid_encoded', simplify smbios_encode_uuid()
  hw/i386/pc: Remove PCMachineClass::enforce_aligned_dimm
  hw/mem/pc-dimm: Remove legacy_align argument from pc_dimm_pre_plug()
  hw/mem/memory-device: Remove legacy_align from
memory_device_pre_plug()
  hw/i386/pc: Remove deprecated pc-i440fx-2.2 machine
  hw/i386/pc: Remove PCMachineClass::resizable_acpi_blob
  hw/i386/pc: Remove PCMachineClass::rsdp_in_ram
  hw/i386/acpi: Remove AcpiBuildState::rsdp field
  hw/i386/pc: Remove deprecated pc-i440fx-2.3 machine
  hw/i386/pc: Simplify DEFINE_I440FX_MACHINE() macro
  target/i386: Remove X86CPU::kvm_no_smi_migration field
  hw/i386/pc: Replace PCMachineClass::acpi_data_size by
PC_ACPI_DATA_SIZE

 docs/about/deprecated.rst |   4 +-
 docs/about/removed-features.rst   |   2 +-
 hw/usb/hcd-xhci.h |   4 +-
 include/hw/firmware/smbios.h  |   3 +-
 include/hw/i386/pc.h  |  22 
 include/hw/mem/memory-device.h|   2 +-
 include/hw/mem/pc-dimm.h  |   3 +-
 target/i386/cpu.h |   3 -
 target/i386/kvm/kvm-cpu.h |  41 --
 hw/acpi/ich9.c|  46 +--
 hw/arm/virt.c |   5 +-
 hw/i386/acpi-build.c  |  95 ++
 hw/i386/fw_cfg.c  |   3 +-
 hw/i386/pc.c  | 107 +++-
 hw/i386/pc_piix.c | 171 +-
 hw/loongarch/virt.c   |   4 +-
 hw/mem/memory-device.c|  12 +-
 hw/mem/pc-dimm.c  |   6 +-
 hw/ppc/spapr.c|   2 +-
 hw/riscv/virt.c   |   2 +-
 hw/smbios/smbios.c|  13 +-
 hw/usb/hcd-xhci-nec.c |   4 -
 hw/usb/hcd-xhci-pci.c |   4 +-
 hw/usb/hcd-xhci.c |  42 ++-
 hw/virtio/virtio-md-pci.c |   2 +-
 target/i386/cpu.c |   2 -
 target/i386/kvm/kvm-cpu.c |   3 +-
 target/i386/kvm/kvm.c |   7 +-
 tests/avocado/mem-addr-space-check.py |   9 +-
 29 files changed, 98 insertions(+), 525 deletions(-)
 delete mode 100644 target/i386/kvm/kvm-cpu.h

-- 
2.41.0

[PATCH] qemu/bitops.h: Locate changed bits

2024-05-28 Thread Tong Ho

Add inlined functions to obtain a mask of changed bits.  3 flavors
are added: toggled, changed to 1, changed to 0.

These newly added utilities aid common device behaviors where
actions are taken only when a register's bit(s) are changed.

Signed-off-by: Tong Ho 
---
 include/qemu/bitops.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 2c0a2fe751..7a701474ea 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -148,6 +148,39 @@ static inline int test_bit(long nr, const unsigned long *addr)
 return 1UL & (addr[BIT_WORD(nr)] >> (nr & (BITS_PER_LONG-1)));
 }
 
+/**
+ * find_bits_changed - Returns a mask of bits changed.
+ * @ref_bits: the reference bits against which the test is made.
+ * @chk_bits: the bits to be checked.
+ */
+static inline unsigned long find_bits_changed(unsigned long ref_bits,
+  unsigned long chk_bits)
+{
+return ref_bits ^ chk_bits;
+}
+
+/**
+ * find_bits_to_1 - Returns a mask of bits changed from 0 to 1.
+ * @ref_bits: the reference bits against which the test is made.
+ * @chk_bits: the bits to be checked.
+ */
+static inline unsigned long find_bits_to_1(unsigned long ref_bits,
+   unsigned long chk_bits)
+{
+return find_bits_changed(ref_bits, chk_bits) & chk_bits;
+}
+
+/**
+ * find_bits_to_0 - Returns a mask of bits changed from 1 to 0.
+ * @ref_bits: the reference bits against which the test is made.
+ * @chk_bits: the bits to be checked.
+ */
+static inline unsigned long find_bits_to_0(unsigned long ref_bits,
+   unsigned long chk_bits)
+{
+return find_bits_to_1(chk_bits, ref_bits);
+}
+
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
-- 
2.25.1

Re: [PATCH] target/riscv: Use get_address() to get address with Zicbom extensions

2024-05-28 Thread Philippe Mathieu-Daudé


ping?

On 19/4/24 13:05, Philippe Mathieu-Daudé wrote:

We need to use get_address() to get an address from cpu_gpr[],
since $zero is "special" (NULL).

Fixes: e05da09b7c ("target/riscv: implement Zicbom extension")
Reported-by: Zhiwei Jiang (姜智伟) 
Signed-off-by: Philippe Mathieu-Daudé 
---
  target/riscv/insn_trans/trans_rvzicbo.c.inc | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvzicbo.c.inc b/target/riscv/insn_trans/trans_rvzicbo.c.inc
index d5d7095903..6f6b29598d 100644
--- a/target/riscv/insn_trans/trans_rvzicbo.c.inc
+++ b/target/riscv/insn_trans/trans_rvzicbo.c.inc
@@ -31,27 +31,27 @@
  static bool trans_cbo_clean(DisasContext *ctx, arg_cbo_clean *a)
  {
  REQUIRE_ZICBOM(ctx);
-gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+gen_helper_cbo_clean_flush(tcg_env, get_address(ctx, a->rs1, 0));
  return true;
  }
  
  static bool trans_cbo_flush(DisasContext *ctx, arg_cbo_flush *a)

  {
  REQUIRE_ZICBOM(ctx);
-gen_helper_cbo_clean_flush(tcg_env, cpu_gpr[a->rs1]);
+gen_helper_cbo_clean_flush(tcg_env, get_address(ctx, a->rs1, 0));
  return true;
  }
  
  static bool trans_cbo_inval(DisasContext *ctx, arg_cbo_inval *a)

  {
  REQUIRE_ZICBOM(ctx);
-gen_helper_cbo_inval(tcg_env, cpu_gpr[a->rs1]);
+gen_helper_cbo_inval(tcg_env, get_address(ctx, a->rs1, 0));
  return true;
  }
  
  static bool trans_cbo_zero(DisasContext *ctx, arg_cbo_zero *a)

  {
  REQUIRE_ZICBOZ(ctx);
-gen_helper_cbo_zero(tcg_env, cpu_gpr[a->rs1]);
+gen_helper_cbo_zero(tcg_env, get_address(ctx, a->rs1, 0));
  return true;
  }

Re: [PATCH v2] hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()

2024-05-28 Thread Philippe Mathieu-Daudé


On 20/5/24 14:49, Peter Maydell wrote:

On Wed, 8 May 2024 at 15:35, Philippe Mathieu-Daudé  wrote:


Check the function index is in range and use an unsigned
variable to avoid the following warning with GCC 13.2.0:

   [666/5358] Compiling C object libcommon.fa.p/hw_input_tsc2005.c.o
   hw/input/tsc2005.c: In function 'tsc2005_timer_tick':
   hw/input/tsc2005.c:416:26: warning: array subscript has type 'char' 
[-Wchar-subscripts]
 416 | s->dav |= mode_regs[s->function];
 | ~^~

Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Use Peter suggestion
---
  hw/input/tsc2005.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
index 941f163d36..8d35892c09 100644
--- a/hw/input/tsc2005.c
+++ b/hw/input/tsc2005.c
@@ -406,6 +406,9 @@ uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len)
  static void tsc2005_timer_tick(void *opaque)
  {
  TSC2005State *s = opaque;
+unsigned int function = s->function;
+
+assert(function < ARRAY_SIZE(mode_regs);


Missing ')' -- this doesn't compile ;-)


Oops I apologize for not even build-testing :/


Applied to target-arm.next with the typo fixed, thanks.


Thanks!

Re: [PATCH v2 0/7] hw/xen: Simplify legacy backends handling

2024-05-28 Thread Philippe Mathieu-Daudé


ping?

On 10/5/24 12:49, Philippe Mathieu-Daudé wrote:

Respin of Paolo's Xen patches from
https://lore.kernel.org/qemu-devel/20240509170044.190795-1-pbonz...@redhat.com/
rebased on one of my cleanup branches making backend
structures const. Treat xenfb as other backends.

Paolo Bonzini (2):
   hw/xen: initialize legacy backends from xen_bus_init()
   hw/xen: register legacy backends via xen_backend_init

Philippe Mathieu-Daudé (5):
   hw/xen: Remove declarations left over in 'xen-legacy-backend.h'
   hw/xen: Constify XenLegacyDevice::XenDevOps
   hw/xen: Constify xenstore_be::XenDevOps
   hw/xen: Make XenDevOps structures const
   hw/xen: Register framebuffer backend via xen_backend_init()

  include/hw/xen/xen-legacy-backend.h | 15 +--
  include/hw/xen/xen_pvdev.h  |  3 +--
  hw/9pfs/xen-9p-backend.c|  8 +++-
  hw/display/xenfb.c  | 15 +--
  hw/i386/pc.c|  1 -
  hw/usb/xen-usb.c| 14 --
  hw/xen/xen-bus.c|  4 
  hw/xen/xen-hvm-common.c |  2 --
  hw/xen/xen-legacy-backend.c | 24 
  hw/xenpv/xen_machine_pv.c   |  7 +--
  10 files changed, 35 insertions(+), 58 deletions(-)

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-28 Thread Jinpu Wang

On Wed, May 29, 2024 at 4:43 AM Gonglei (Arei)  wrote:
>
> Hi,
>
> > -Original Message-
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Tuesday, May 28, 2024 11:55 PM
> > > > > Exactly, not so compelling, as I did it first only on servers
> > > > > widely used for production in our data center. The network
> > > > > adapters are
> > > > >
> > > > > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
> > > > > BCM5720 2-port Gigabit Ethernet PCIe
> > > >
> > > > Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6 looks more
> > reasonable.
> > > >
> > > >
> > https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15
> > > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > > >
> > > > Appreciate a lot for everyone helping on the testings.
> > > >
> > > > > InfiniBand controller: Mellanox Technologies MT27800 Family
> > > > > [ConnectX-5]
> > > > >
> > > > > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > > > > migration. RDMA traffic is through InfiniBand and TCP through
> > > > > Ethernet on these two hosts. One is standby while the other is active.
> > > > >
> > > > > Now I'll try on a server with more recent Ethernet and InfiniBand
> > > > > network adapters. One of them has:
> > > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> > > > >
> > > > > The comparison between RDMA and TCP on the same NIC could make
> > > > > more
> > > > sense.
> > > >
> > > > It looks to me NICs are powerful now, but again as I mentioned I
> > > > don't think it's a reason we need to deprecate rdma, especially if
> > > > QEMU's rdma migration has the chance to be refactored using rsocket.
> > > >
> > > > Is there anyone who started looking into that direction?  Would it
> > > > make sense we start some PoC now?
> > > >
> > >
> > > My team has finished the PoC refactoring which works well.
> > >
> > > Progress:
> > > 1.  Implement io/channel-rdma.c,
> > > 2.  Add unit test tests/unit/test-io-channel-rdma.c and verifying it
> > >     is successful,
> > > 3.  Remove the original code from migration/rdma.c,
> > > 4.  Rewrite the rdma_start_outgoing_migration and
> > >     rdma_start_incoming_migration logic,
> > > 5.  Remove all rdma_xxx functions from migration/ram.c. (to prevent
> > >     RDMA live migration from polluting the core logic of live migration),
> > > 6.  The soft-RoCE implemented by software is used to test the RDMA
> > >     live migration. It's successful.
> > >
> > > We will be submit the patchset later.
> >
> > That's great news, thank you!
> >
> > --
> > Peter Xu
>
> For rdma programming, the current mainstream implementation is to use
> rdma_cm to establish a connection, and then use verbs to transmit data.
>
> rdma_cm and ibverbs each create an FD. The two FDs have different
> responsibilities: the rdma_cm fd is used to notify connection
> establishment events, and the verbs fd is used to notify new CQEs. When
> poll/epoll monitoring is performed directly on the rdma_cm fd, only a
> pollin event can be monitored, which means that an rdma_cm event has
> occurred. When the verbs fd is polled/epolled directly, only the pollin
> event can be listened for, which indicates that a new CQE has been
> generated.
>
> Rsocket is a sub-module attached to the rdma_cm library and provides rdma
> calls that closely mirror the socket interfaces. However, this library
> returns only the rdma_cm fd for listening to link setup-related events and
> does not expose the verbs fd (readable and writable events for listening
> to data). Only the rpoll interface provided by RSocket can be used to
> listen to related events. However, QEMU uses the ppoll interface to listen
> to the rdma_cm fd (obtained via the raccept API) and cannot listen to the
> verbs fd events. Only some hacking methods can be used to address this
> problem.
>
> Do you guys have any ideas? Thanks.
+cc linux-rdma
+cc Sean



>
>
> Regards,
> -Gonglei

[PATCH] tests/qtest/migrate-test: Add a postcopy memfile test

2024-05-28 Thread Nicholas Piggin

Postcopy requires userfaultfd support, which requires tmpfs if a memory
file is used.

This adds back support for /dev/shm memory files, but adds preallocation
to skip environments where that mount is limited in size.

Signed-off-by: Nicholas Piggin 
---

How about this? This goes on top of the rest of the patches
(I'll re-send them all as a series if we can get to some agreement).

This adds back the /dev/shm option with preallocation and adds a test
case that requires tmpfs.

Thanks,
Nick

 tests/qtest/migration-test.c | 63 +++-
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 86eace354e..7fd9bbdc18 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -11,6 +11,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
 
 #include "libqtest.h"
 #include "qapi/qmp/qdict.h"
@@ -553,6 +554,7 @@ typedef struct {
  */
 bool hide_stderr;
 bool use_memfile;
+bool use_uffd_memfile;
 /* only launch the target process */
 bool only_target;
 /* Use dirty ring if true; dirty logging otherwise */
@@ -739,7 +741,48 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 ignore_stderr = "";
 }
 
-if (args->use_memfile) {
+if (!qtest_has_machine(machine_alias)) {
+g_autofree char *msg = g_strdup_printf("machine %s not supported",
+   machine_alias);
+g_test_skip(msg);
+return -1;
+}
+
+if (args->use_uffd_memfile) {
+#if defined(__NR_userfaultfd) && defined(__linux__)
+int fd;
+uint64_t size;
+
+if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
+g_test_skip("/dev/shm does not exist or is not a directory");
+return -1;
+}
+
+/*
+ * Pre-create and allocate the file here, because /dev/shm/
+ * is known to be limited in size in some places (e.g., Gitlab CI).
+ */
+memfile_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
+fd = open(memfile_path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
+if (fd == -1) {
+g_test_skip("/dev/shm file could not be created");
+return -1;
+}
+
+g_assert(qemu_strtosz(memory_size, NULL, &size) == 0);
+size += 64*1024; /* QEMU may map a bit more memory for a guard page */
+
+if (fallocate(fd, 0, 0, size) == -1) {
+unlink(memfile_path);
+perror("could not alloc"); exit(1);
+g_test_skip("Could not allocate machine memory in /dev/shm");
+return -1;
+}
+close(fd);
+#else
+g_test_skip("userfaultfd is not supported");
+#endif
+} else if (args->use_memfile) {
 memfile_path = g_strdup_printf("/%s/qemu-%d", tmpfs, getpid());
 memfile_opts = g_strdup_printf(
 "-object memory-backend-file,id=mem0,size=%s"
@@ -751,12 +794,6 @@ static int test_migrate_start(QTestState **from, QTestState **to,
 kvm_opts = ",dirty-ring-size=4096";
 }
 
-if (!qtest_has_machine(machine_alias)) {
-g_autofree char *msg = g_strdup_printf("machine %s not supported", machine_alias);
-g_test_skip(msg);
-return -1;
-}
-
 machine = resolve_machine_version(machine_alias, QEMU_ENV_SRC,
   QEMU_ENV_DST);
 
@@ -807,7 +844,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
  * Remove shmem file immediately to avoid memory leak in test failed case.
  * It's valid because QEMU has already opened this file
  */
-if (args->use_memfile) {
+if (args->use_memfile || args->use_uffd_memfile) {
 unlink(memfile_path);
 }
 
@@ -1275,6 +1312,15 @@ static void test_postcopy(void)
 test_postcopy_common(&args);
 }
 
+static void test_postcopy_memfile(void)
+{
+MigrateCommon args = {
+.start.use_uffd_memfile = true,
+};
+
+test_postcopy_common(&args);
+}
+
 static void test_postcopy_suspend(void)
 {
 MigrateCommon args = {
@@ -3441,6 +3487,7 @@ int main(int argc, char **argv)
 
 if (has_uffd) {
 migration_test_add("/migration/postcopy/plain", test_postcopy);
+migration_test_add("/migration/postcopy/memfile", test_postcopy_memfile);
 migration_test_add("/migration/postcopy/recovery/plain",
test_postcopy_recovery);
 migration_test_add("/migration/postcopy/preempt/plain",
-- 
2.43.0

RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-05-28 Thread Gonglei (Arei)

Hi,

> -Original Message-
> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Tuesday, May 28, 2024 11:55 PM
> > > > Exactly, not so compelling, as I did it first only on servers
> > > > widely used for production in our data center. The network
> > > > adapters are
> > > >
> > > > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
> > > > BCM5720 2-port Gigabit Ethernet PCIe
> > >
> > > Hmm... I definitely thinks Jinpu's Mellanox ConnectX-6 looks more
> reasonable.
> > >
> > >
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15
> > > wvaqk81vxtkzx-l...@mail.gmail.com/
> > >
> > > Appreciate a lot for everyone helping on the testings.
> > >
> > > > InfiniBand controller: Mellanox Technologies MT27800 Family
> > > > [ConnectX-5]
> > > >
> > > > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > > > migration. RDMA traffic is through InfiniBand and TCP through
> > > > Ethernet on these two hosts. One is standby while the other is active.
> > > >
> > > > Now I'll try on a server with more recent Ethernet and InfiniBand
> > > > network adapters. One of them has:
> > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> > > >
> > > > The comparison between RDMA and TCP on the same NIC could make
> > > > more
> > > sense.
> > >
> > > It looks to me NICs are powerful now, but again as I mentioned I
> > > don't think it's a reason we need to deprecate rdma, especially if
> > > QEMU's rdma migration has the chance to be refactored using rsocket.
> > >
> > > Is there anyone who started looking into that direction?  Would it
> > > make sense we start some PoC now?
> > >
> >
> > My team has finished the PoC refactoring which works well.
> >
> > Progress:
> > 1.  Implement io/channel-rdma.c,
> > 2.  Add unit test tests/unit/test-io-channel-rdma.c and verifying it
> >     is successful,
> > 3.  Remove the original code from migration/rdma.c,
> > 4.  Rewrite the rdma_start_outgoing_migration and
> >     rdma_start_incoming_migration logic,
> > 5.  Remove all rdma_xxx functions from migration/ram.c. (to prevent
> >     RDMA live migration from polluting the core logic of live
> >     migration),
> > 6.  The soft-RoCE implemented by software is used to test the RDMA
> >     live migration. It's successful.
> >
> > We will submit the patchset later.
> 
> That's great news, thank you!
> 
> --
> Peter Xu

For RDMA programming, the current mainstream approach is to use rdma_cm to
establish a connection, and then use verbs to transmit data.

rdma_cm and ibverbs each create their own fd, and the two fds have different
responsibilities. The rdma_cm fd is used to notify connection establishment
events, and the verbs fd is used to notify new CQEs. When poll/epoll is used
directly on the rdma_cm fd, only a POLLIN event can be observed, which means
that an rdma_cm event has occurred. When the verbs fd is polled/epolled
directly, again only a POLLIN event can be observed, which indicates that a
new CQE has been generated.

Rsocket is a sub-module of the rdma_cm library that provides RDMA calls
closely mirroring the socket interfaces. However, this library returns only
the rdma_cm fd, for listening to link setup-related events, and does not
expose the verbs fd (for listening to readable and writable data events).
Only the rpoll interface provided by rsocket can be used to listen to those
events. QEMU, however, uses the ppoll interface to listen to the rdma_cm fd
(obtained from the raccept API), and so cannot listen to verbs fd events.
Only some hacking methods can be used to address this problem.

Do you guys have any ideas? Thanks.


Regards,
-Gonglei
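
To make the two-fd model above concrete, here is a minimal sketch in plain
POSIX C. It assumes nothing about librdmacm: two ordinary file descriptors
stand in for the rdma_cm event fd and the verbs completion fd, and the
names poll_ready_mask, READY_CM and READY_VERBS are made up for this
illustration. It only shows why a poller that is handed a single fd cannot
notice activity on the other one.

```c
#include <poll.h>

/* Hypothetical result bits, named for this sketch only. */
enum {
    READY_CM    = 1 << 0,  /* stand-in for "an rdma_cm event occurred" */
    READY_VERBS = 1 << 1,  /* stand-in for "a new CQE was generated" */
};

/*
 * Poll both descriptors for POLLIN and report which of them are readable.
 * Returns a mask of READY_* bits, or -1 if poll() itself fails.
 */
static int poll_ready_mask(int cm_fd, int verbs_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = cm_fd,    .events = POLLIN },
        { .fd = verbs_fd, .events = POLLIN },
    };
    int mask = 0;

    if (poll(fds, 2, timeout_ms) < 0) {
        return -1;
    }
    if (fds[0].revents & POLLIN) {
        mask |= READY_CM;
    }
    if (fds[1].revents & POLLIN) {
        mask |= READY_VERBS;
    }
    return mask;
}
```

If only cm_fd were handed to the poller, as is the case with the fd that
rsocket returns, the READY_VERBS condition would never be reported. That is
the gap described above.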

Re: [PATCH 1/1] prealloc: add truncate mode for prealloc filter

2024-05-28 Thread Wang, Lei

On 5/1/2024 1:05, Denis V. Lunev via wrote:
> Preallocate filter allows to implement really interesting setups.
> 
> Assume that we have
> * shared block device, f.e. iSCSI LUN, implemented with some HW device
> * clustered LVM on top of it
> * QCOW2 image stored inside LVM volume
> 
> This allows very cheap clustered setups with all QCOW2 features intact.
> Currently supported setups using QCOW2 with data_file option are not
> so cool as snapshots are not allowed, QCOW2 should be placed into some
> additional distributed storage and so on.
> 
> Though QCOW2 inside LVM volume has a drawback. The image is growing and
> in order to accommodate that image LVM volume is to be resized. This
> could be done externally using ENOSPACE event/condition but this is
> cumbersome.
> 
> This patch introduces native implementation for such a setup. We should
> just put prealloc filter in between QCOW2 format and file nodes. In that
> case LVM will be resized at proper moment and that is done effectively
> as resizing is done in chunks.
> 
> The patch adds allocation mode for this purpose in order to distinguish
> 'fallocate' for ordinary file system and 'truncate'.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Alexander Ivanov 
> CC: Kevin Wolf 
> CC: Hanna Reitz 
> CC: Vladimir Sementsov-Ogievskiy 
> ---
>  block/preallocate.c | 50 +++--
>  1 file changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/block/preallocate.c b/block/preallocate.c
> index 4d82125036..6d31627325 100644
> --- a/block/preallocate.c
> +++ b/block/preallocate.c
> @@ -33,10 +33,24 @@
>  #include "block/block-io.h"
>  #include "block/block_int.h"
>  
> +typedef enum PreallocateMode {
> +PREALLOCATE_MODE_FALLOCATE = 0,
> +PREALLOCATE_MODE_TRUNCATE = 1,
> +PREALLOCATE_MODE__MAX = 2,
> +} PreallocateMode;
> +
> +static QEnumLookup prealloc_mode_lookup = {
> +.array = (const char *const[]) {
> +"falloc",
> +"truncate",
> +},
> +.size = PREALLOCATE_MODE__MAX,
> +};
>  
>  typedef struct PreallocateOpts {
>  int64_t prealloc_size;
>  int64_t prealloc_align;
> +PreallocateMode prealloc_mode;
>  } PreallocateOpts;
>  
>  typedef struct BDRVPreallocateState {
> @@ -79,6 +93,7 @@ typedef struct BDRVPreallocateState {
>  
>  #define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align"
>  #define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size"
> +#define PREALLOCATE_OPT_MODE "mode"

Why not keep the names consistent? I mean:

#define PREALLOCATE_OPT_PREALLOC_MODE "prealloc-mode"

>  static QemuOptsList runtime_opts = {
>  .name = "preallocate",
>  .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
> @@ -94,7 +109,14 @@ static QemuOptsList runtime_opts = {
>  .type = QEMU_OPT_SIZE,
>  .help = "how much to preallocate, default 128M",
>  },
> -{ /* end of list */ }
> +{
> +.name = PREALLOCATE_OPT_MODE,
> +.type = QEMU_OPT_STRING,
> +.help = "Preallocation mode on image expansion "
> +"(allowed values: falloc, truncate)",
> +.def_value_str = "falloc",
> +},
> +{ /* end of list */ },
>  },
>  };
>  
> @@ -102,6 +124,8 @@ static bool preallocate_absorb_opts(PreallocateOpts 
> *dest, QDict *options,
>  BlockDriverState *child_bs, Error **errp)
>  {
>  QemuOpts *opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
> +Error *local_err = NULL;
> +char *buf;
>  
>  if (!qemu_opts_absorb_qdict(opts, options, errp)) {
>  return false;
> @@ -112,6 +136,17 @@ static bool preallocate_absorb_opts(PreallocateOpts 
> *dest, QDict *options,
>  dest->prealloc_size =
>  qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_SIZE, 128 * MiB);
>  
> +buf = qemu_opt_get_del(opts, PREALLOCATE_OPT_MODE);
> +/* prealloc_mode can be downgraded later during allocate_clusters */
> +dest->prealloc_mode = qapi_enum_parse(&prealloc_mode_lookup, buf,
> +  PREALLOCATE_MODE_FALLOCATE,
> +  &local_err);
> +g_free(buf);
> +if (local_err != NULL) {
> +error_propagate(errp, local_err);
> +return false;
> +}
> +
>  qemu_opts_del(opts);
>  
>  if (!QEMU_IS_ALIGNED(dest->prealloc_align, BDRV_SECTOR_SIZE)) {
> @@ -335,9 +370,20 @@ handle_write(BlockDriverState *bs, int64_t offset, 
> int64_t bytes,
>  
>  want_merge_zero = want_merge_zero && (prealloc_start <= offset);
>  
> -ret = bdrv_co_pwrite_zeroes(
> +switch (s->opts.prealloc_mode) {
> +case PREALLOCATE_MODE_FALLOCATE:
> +ret = bdrv_co_pwrite_zeroes(
>  bs->file, prealloc_start, prealloc_end - prealloc_start,
>  BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
> +break;
> +case PREALLOCATE_MODE_TRUNCATE:
> +ret = bdrv_co_truncate(bs->file, prealloc_end, false,
> +

RE: [PATCH v4 00/16] Add AST2700 support

2024-05-28 Thread Jamin Lin

Hi Cedric,

> From: Cédric Le Goater 
> On 5/28/24 12:02, Jamin Lin wrote:
> > Hi Cedric,
> >
> >> -Original Message-
> >> From: Cédric Le Goater 
> >> Sent: Tuesday, May 28, 2024 5:56 PM
> >> To: Jamin Lin ; Peter Maydell
> >> ; Andrew Jeffery
> >> ; Joel Stanley ;
> >> Alistair Francis ; Cleber Rosa
> >> ; Philippe Mathieu-Daudé ;
> >> Wainer dos Santos Moschetta ; Beraldo Leal
> >> ; open list:ASPEED BMCs ;
> open
> >> list:All patches CC here 
> >> Cc: Troy Lee ; Yunlin Tang
> >> 
> >> Subject: Re: [PATCH v4 00/16] Add AST2700 support
> >>
> >> Jamin,
> >>
> >> I think you should add yourself as a Reviewer to the ASPEED BMCs
> >> machine in the MAINTAINERS files. Would you agree ?
> >>
> > Agree.
> >
> > Could you please add me, Troy and Steven in the MAINTAINERS files?
> > steven_...@aspeedtech.com
> > troy_...@aspeedtech.com
> > jamin_...@aspeedtech.com
> 
> You should send a patch updating the MAINTAINERS file with new names and
> those promoted should reply that they agree, or not.
> 
> See https://qemu.readthedocs.io/en/v9.0.0/devel/maintainers.html for more
> info and the git history of MAINTAINERS also.
> 
I will send a patch updating the MAINTAINERS file.
Thanks,
Jamin
> Thanks,
> 
> C.
>

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Peter Xu

On Wed, May 29, 2024 at 10:05:32AM +1000, Nicholas Piggin wrote:
> I think that's good if you _need_ shm (e.g., for a uffd test), but
> we should permit tests that only require a memory file.

Yes, there's no reason to forbid that. It's just that we're not adding new
tests here, and this change could turn any future use_shmem test into one
that no longer tests shmem; instead we'd be testing something we don't
suggest users use.

The only concern is a small /dev/shm mount, am I right?  Would it work if we
switch to memory-backend-memfd,shared=on?

Thanks,

-- 
Peter Xu

Re: [RFC PATCH 07/10] target/ppc: Add helpers to check for SMT sibling threads

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 7:16 PM AEST, Harsh Prateek Bora wrote:
>
>
> On 5/26/24 17:56, Nicholas Piggin wrote:
> > Add helpers for TCG code to determine if there are SMT siblings
> > sharing per-core and per-lpar registers. This simplifies the
> > callers and makes SMT register topology simpler to modify with
> > later changes.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   target/ppc/cpu.h |  7 +++
> >   target/ppc/cpu_init.c|  2 +-
> >   target/ppc/excp_helper.c | 16 +++-
> >   target/ppc/misc_helper.c | 27 ++-
> >   target/ppc/timebase_helper.c | 20 +++-
> >   5 files changed, 28 insertions(+), 44 deletions(-)
> > 
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index 9a89083932..8fd6ade471 100644
> > --- a/target/ppc/cpu.h
> > +++ b/target/ppc/cpu.h
> > @@ -1406,6 +1406,13 @@ struct CPUArchState {
> >   uint64_t pmu_base_time;
> >   };
> >   
> > +#define PPC_CPU_HAS_CORE_SIBLINGS(cs)   \
> > +(cs->nr_threads > 1)
> > +
> > +#define PPC_CPU_HAS_LPAR_SIBLINGS(cs)   \
> > +((POWERPC_CPU(cs)->env.flags & POWERPC_FLAG_SMT_1LPAR) &&   \
> > + PPC_CPU_HAS_CORE_SIBLINGS(cs))
> > +
> >   #define _CORE_ID(cs)\
> >   (POWERPC_CPU(cs)->env.core_index)
> >   
> > diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
> > index ae483e20c4..e71ee008ed 100644
> > --- a/target/ppc/cpu_init.c
> > +++ b/target/ppc/cpu_init.c
> > @@ -6975,7 +6975,7 @@ static void ppc_cpu_realize(DeviceState *dev, Error 
> > **errp)
> >   
> >   pcc->parent_realize(dev, errp);
> >   
> > -if (env_cpu(env)->nr_threads > 1) {
> > +if (PPC_CPU_HAS_CORE_SIBLINGS(cs)) {
> >   env->flags |= POWERPC_FLAG_SMT;
> >   }
> >   
> > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > index 0cd542675f..fd45da0f2b 100644
> > --- a/target/ppc/excp_helper.c
> > +++ b/target/ppc/excp_helper.c
> > @@ -3029,7 +3029,7 @@ void helper_book3s_msgsnd(CPUPPCState *env, 
> > target_ulong rb)
> >   brdcast = true;
> >   }
> >   
> > -if (cs->nr_threads == 1 || !brdcast) {
> > +if (!PPC_CPU_HAS_CORE_SIBLINGS(cs) || !brdcast) {
>
> Since there are multiple usage of above macro in negation below as well, 
> we may probably want to introduce another macro PPC_CPU_HAS_SINGLE_CORE

Ah, you mean SINGLE_THREAD. Yes it would read a bit better.

Thanks,
Nick
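
A minimal sketch of what the suggested companion macro could look like.
This is an assumption, not the posted patch: the name
PPC_CPU_HAS_SINGLE_THREAD follows the "SINGLE_THREAD" wording above, the
CPUState struct is a stand-in that models only nr_threads, and
msgsnd_is_local() is a hypothetical helper that restates the condition from
helper_book3s_msgsnd().

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU's CPUState; only the field used here is modelled. */
typedef struct CPUState {
    uint32_t nr_threads;
} CPUState;

/* As in the patch under review (macro argument parenthesised here). */
#define PPC_CPU_HAS_CORE_SIBLINGS(cs) ((cs)->nr_threads > 1)

/* Hypothetical companion macro for the negated uses. */
#define PPC_CPU_HAS_SINGLE_THREAD(cs) ((cs)->nr_threads == 1)

/*
 * Hypothetical helper: true when the doorbell is raised on the sending
 * thread only, i.e. there are no siblings or no broadcast was requested.
 */
static bool msgsnd_is_local(const CPUState *cs, bool brdcast)
{
    return PPC_CPU_HAS_SINGLE_THREAD(cs) || !brdcast;
}
```

With this, "if (!PPC_CPU_HAS_CORE_SIBLINGS(cs) || !brdcast)" would read as
"if (PPC_CPU_HAS_SINGLE_THREAD(cs) || !brdcast)".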

> which checks only for nr_threads == 1. Anyways,
>
> Reviewed-by: Harsh Prateek Bora 
>
>
> >   ppc_set_irq(cpu, PPC_INTERRUPT_HDOORBELL, 1);
> >   return;
> >   }
> > @@ -3067,21 +3067,19 @@ void helper_book3s_msgsndp(CPUPPCState *env, 
> > target_ulong rb)
> >   CPUState *cs = env_cpu(env);
> >   PowerPCCPU *cpu = env_archcpu(env);
> >   CPUState *ccs;
> > -uint32_t nr_threads = cs->nr_threads;
> >   int ttir = rb & PPC_BITMASK(57, 63);
> >   
> >   helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", 
> > HFSCR_IC_MSGP);
> >   
> > -if (!(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
> > -nr_threads = 1; /* msgsndp behaves as 1-thread in LPAR-per-thread 
> > mode*/
> > -}
> > -
> > -if (!dbell_type_server(rb) || ttir >= nr_threads) {
> > +if (!dbell_type_server(rb)) {
> >   return;
> >   }
> >   
> > -if (nr_threads == 1) {
> > -ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
> > +/* msgsndp behaves as 1-thread in LPAR-per-thread mode*/
> > +if (!PPC_CPU_HAS_LPAR_SIBLINGS(cs)) {
> > +if (ttir == 0) {
> > +ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
> > +}
> >   return;
> >   }
> >   
> > diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
> > index 46ba3a5584..598c956cdd 100644
> > --- a/target/ppc/misc_helper.c
> > +++ b/target/ppc/misc_helper.c
> > @@ -49,9 +49,8 @@ void helper_spr_core_write_generic(CPUPPCState *env, 
> > uint32_t sprn,
> >   {
> >   CPUState *cs = env_cpu(env);
> >   CPUState *ccs;
> > -uint32_t nr_threads = cs->nr_threads;
> >   
> > -if (nr_threads == 1) {
> > +if (!PPC_CPU_HAS_CORE_SIBLINGS(cs)) {
> >   env->spr[sprn] = val;
> >   return;
> >   }
> > @@ -196,7 +195,7 @@ void helper_store_ptcr(CPUPPCState *env, target_ulong 
> > val)
> >   return;
> >   }
> >   
> > -if (cs->nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) 
> > {
> > +if (!PPC_CPU_HAS_LPAR_SIBLINGS(cs)) {
> >   env->spr[SPR_PTCR] = val;
> >   tlb_flush(cs);
> >   } else {
> > @@ -243,16 +242,12 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
> >   {
> >   CPUState *cs = env_cpu(env);
> >   CPUState *ccs;
> > -uint32_t nr_threads = cs->nr_threads;
> >   target_ulong dpdes = 0;
> >   
> >   helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", 
> > HFSCR_IC_MSGP);
>

Re: [RFC PATCH 06/10] ppc: Add a core_index to CPUPPCState for SMT vCPUs

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 6:52 PM AEST, Harsh Prateek Bora wrote:
> corrected typo, it's bitwise.
>
> On 5/28/24 14:18, Harsh Prateek Bora wrote:
> >> -    (POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & 
> >> ~(cs->nr_threads - 1))
> >> +    (POWERPC_CPU(cs)->env.core_index)
> > 
> > Dont we want to keep the bitwise & with ~(cs->nr_threads - 1) ?
> > How's it taken care ?

For these accessors it actually just wants to have something that
compares if a CPU belongs to the same core or not, so exact value
doesn't really matter.

Maybe the helpers should do that comparison. It could possibly even
be a class method to be really clean, although that's more costly
to call (but writing to a SMT shared register is pretty costly anyway
so maybe doesn't matter).

I'll think a bit more.

Thanks,
Nick
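
A small sketch of the "same core or not" comparison discussed above, under
stated assumptions: the ThreadId struct and both function names are made up
for illustration, and the PIR layout (thread number in the low bits,
nr_threads a power of 2) is a simplification rather than the real per-chip
encoding. It shows that masking the PIR and comparing a stored core_index
answer the same question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread identity, for illustration only. */
typedef struct ThreadId {
    uint32_t pir;        /* simplified PIR: thread number in the low bits */
    uint32_t core_index; /* index of the core this thread belongs to */
} ThreadId;

/* Old style: mask the thread bits off the PIR (nr_threads is a power of 2). */
static bool same_core_by_pir(const ThreadId *a, const ThreadId *b,
                             uint32_t nr_threads)
{
    uint32_t mask = ~(nr_threads - 1);

    return (a->pir & mask) == (b->pir & mask);
}

/* New style: compare the core index directly; the exact value is irrelevant. */
static bool same_core_by_index(const ThreadId *a, const ThreadId *b)
{
    return a->core_index == b->core_index;
}
```

A comparison helper of this shape would also hide from callers which
representation is used, which is the direction suggested in the reply.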

Re: [RFC PATCH 05/10] ppc/pnv: Extend chip_pir class method to TIR as well

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 6:32 PM AEST, Harsh Prateek Bora wrote:
>
>
> On 5/26/24 17:56, Nicholas Piggin wrote:
> > The chip_pir chip class method allows the platform to set the PIR
> > processor identification register. Extend this to a more general
> > ID function which also allows the TIR to be set. This is in
> > preparation for "big core", which is a more complicated topology
> > of cores and threads.
> > 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >   include/hw/ppc/pnv_chip.h |  3 +-
> >   hw/ppc/pnv.c  | 61 ---
> >   hw/ppc/pnv_core.c | 10 ---
> >   3 files changed, 45 insertions(+), 29 deletions(-)
> > 
> > diff --git a/include/hw/ppc/pnv_chip.h b/include/hw/ppc/pnv_chip.h
> > index 8589f3291e..679723926a 100644
> > --- a/include/hw/ppc/pnv_chip.h
> > +++ b/include/hw/ppc/pnv_chip.h
> > @@ -147,7 +147,8 @@ struct PnvChipClass {
> >   
> >   DeviceRealize parent_realize;
> >   
> > -uint32_t (*chip_pir)(PnvChip *chip, uint32_t core_id, uint32_t 
> > thread_id);
> > +void (*processor_id)(PnvChip *chip, uint32_t core_id, uint32_t 
> > thread_id,
> > + uint32_t *pir, uint32_t *tir);
>
> Should it be named get_chip_core_thread_regs() ?

Yeah, the name isn't great. It is getting the regs, but the regs are the
"pervasive id" used as well... but maybe that's not too relevant here.
What about we drop chip_ since we have the chip and no other methods use
such prefix, then call it get_thread_pir_tir()?

> > @@ -155,7 +155,7 @@ static int pnv_dt_core(PnvChip *chip, PnvCore *pc, void 
> > *fdt)
> >   char *nodename;
> >   int cpus_offset = get_cpus_node(fdt);
> >   
> > -pir = pnv_cc->chip_pir(chip, pc->hwid, 0);
> > +pnv_cc->processor_id(chip, pc->hwid, 0, , );
>
> As a generic helper API and potentially expandable, it should allow 
> passing NULL for registers whose values are not really sought to avoid 
> having to create un-necessary local variables by the caller.

I'll do that.

Thanks,
Nick
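
A minimal sketch of the NULL-tolerant out-parameter convention agreed on
above. The function name follows the get_thread_pir_tir() proposal, but the
signature is simplified (no PnvChip argument) and the PIR encoding used
here, (core_id << 2) | thread_id, is only an illustrative assumption, not
the real per-chip layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fill in the PIR and/or TIR for a thread. Either output pointer may be
 * NULL when the caller is not interested in that register.
 * The encoding below is a placeholder chosen for this sketch.
 */
static void get_thread_pir_tir(uint32_t core_id, uint32_t thread_id,
                               uint32_t *pir, uint32_t *tir)
{
    if (pir != NULL) {
        *pir = (core_id << 2) | thread_id;
    }
    if (tir != NULL) {
        *tir = thread_id;
    }
}
```

A caller such as pnv_dt_core(), which only needs the PIR, could then pass
NULL for tir instead of declaring an unused local variable.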

Re: [RFC PATCH 02/10] ppc/pnv: Move timebase state into PnvCore

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 5:52 PM AEST, Cédric Le Goater wrote:
> On 5/28/24 08:28, Harsh Prateek Bora wrote:
> > 
> > 
> > On 5/26/24 17:56, Nicholas Piggin wrote:
> >> The timebase state machine is per-core state and can be driven
> >> by any thread in the core. It is currently implemented as a hack
> >> where the state is in a CPU structure and only thread 0's state is
> >> accessed by the chiptod, which limits programming the timebase
> >> side of the state machine to thread 0 of a core.
> >>
> >> Move the state out into PnvCore and share it among all threads.
> >>
> >> Signed-off-by: Nicholas Piggin 
> >> ---
> >>   include/hw/ppc/pnv_core.h    | 17 
> >>   target/ppc/cpu.h | 20 --
> >>   hw/ppc/pnv_chiptod.c |  6 ++--
> >>   target/ppc/timebase_helper.c | 53 
> >>   4 files changed, 49 insertions(+), 47 deletions(-)
> >>
> >> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> >> index 30c1e5b1a3..f434c71547 100644
> >> --- a/include/hw/ppc/pnv_core.h
> >> +++ b/include/hw/ppc/pnv_core.h
> >> @@ -25,6 +25,20 @@
> >>   #include "hw/ppc/pnv.h"
> >>   #include "qom/object.h"
> >> +/* ChipTOD and TimeBase State Machine */
> >> +struct pnv_tod_tbst {
> >> +    int tb_ready_for_tod; /* core TB ready to receive TOD from chiptod */
> >> +    int tod_sent_to_tb;   /* chiptod sent TOD to the core TB */
> >> +
> >> +    /*
> >> + * "Timers" for async TBST events are simulated by mfTFAC because TFAC
> >> + * is polled for such events. These are just used to ensure firmware
> >> + * performs the polling at least a few times.
> >> + */
> >> +    int tb_state_timer;
> >> +    int tb_sync_pulse_timer;
> >> +};
> >> +
> >>   #define TYPE_PNV_CORE "powernv-cpu-core"
> >>   OBJECT_DECLARE_TYPE(PnvCore, PnvCoreClass,
> >>   PNV_CORE)
> >> @@ -38,6 +52,9 @@ struct PnvCore {
> >>   uint32_t pir;
> >>   uint32_t hwid;
> >>   uint64_t hrmor;
> >> +
> >> +    struct pnv_tod_tbst pnv_tod_tbst;
> >> +
> > 
> > Now that it is part of struct PnvCore itself, we can drop pnv_ prefix
> > and just call the member variable as tod_tbst ?
>
> yes and rename pnv_tod_tbst using CamelCase please.

Okay will do. That'll look nicer.

Thanks,
Nick

Re: [RFC PATCH 04/10] ppc/pnv: specialise init for powernv8/9/10 machines

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 5:45 PM AEST, Cédric Le Goater wrote:
> On 5/28/24 09:10, Harsh Prateek Bora wrote:
> > Hi Nick,
> > 
> > On 5/26/24 17:56, Nicholas Piggin wrote:
> >> This will allow different settings and checks for different
> >> machine types with later changes.
> >>
> >> Signed-off-by: Nicholas Piggin 
> >> ---
> >>   hw/ppc/pnv.c | 35 ++-
> >>   1 file changed, 30 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index 6e3a5ccdec..a706de2e36 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -976,11 +976,6 @@ static void pnv_init(MachineState *machine)
> >>   pnv->num_chips =
> >>   machine->smp.max_cpus / (machine->smp.cores * 
> >> machine->smp.threads);
> >> -    if (machine->smp.threads > 8) {
> >> -    error_report("Cannot support more than 8 threads/core "
> >> - "on a powernv machine");
> >> -    exit(1);
> >> -    }
> >>   if (!is_power_of_2(machine->smp.threads)) {
> >>   error_report("Cannot support %d threads/core on a powernv"
> >>    "machine because it must be a power of 2",
> >> @@ -1076,6 +1071,33 @@ static void pnv_init(MachineState *machine)
> >>   }
> >>   }
> >> +static void pnv_power8_init(MachineState *machine)
> >> +{
> >> +    if (machine->smp.threads > 8) {
> >> +    error_report("Cannot support more than 8 threads/core "
> >> + "on a powernv POWER8 machine");
> > 
> > We could use mc->desc for machine name above, so that ..
> > 
> >> +    exit(1);
> >> +    }
> > 
> > with this patch, we can reuse p8 init for both p9 and p10 (and not just 
> > reuse p9 for p10 with hard coded string?).
>
> Good idea. You could add a 'max_smt' attribute to PnvMachineClass to limit
> POWER8 to one.

Okay I'll see how that goes. Good suggestions.

Thanks,
Nick
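
A rough sketch of how the two suggestions above could combine: a
per-machine 'max_smt' limit plus a description string, so that one check
serves every machine type. MachineLimits and smt_threads_valid() are
hypothetical names for this illustration, and the real code would use
error_report() and the QEMU class structures rather than fprintf() and a
plain struct.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant machine class fields. */
typedef struct MachineLimits {
    const char *desc;   /* machine description, as mc->desc would provide */
    unsigned max_smt;   /* maximum threads per core for this machine */
} MachineLimits;

static bool is_power_of_2(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Shared check: returns false (after reporting) if 'threads' is unsupported. */
static bool smt_threads_valid(const MachineLimits *mc, unsigned threads)
{
    if (threads > mc->max_smt) {
        fprintf(stderr, "Cannot support more than %u threads/core on %s\n",
                mc->max_smt, mc->desc);
        return false;
    }
    if (!is_power_of_2(threads)) {
        fprintf(stderr, "Cannot support %u threads/core on %s because it "
                "must be a power of 2\n", threads, mc->desc);
        return false;
    }
    return true;
}
```

Each machine type would then only set its own limit (8 for POWER8, per the
quoted patch) instead of carrying a separate init function with a
hard-coded string.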

Re: [PATCH v4 11/11] ppc/pnv: Update skiboot.lid to support Power11

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 5:15 PM AEST, Cédric Le Goater wrote:
> On 5/28/24 09:05, Aditya Gupta wrote:
> > Skiboot/OPAL patches are in discussion upstream [1], with corresponding
> > commits in github repository [2].
> > 
> > Update skiboot.lid, with binary built from 'upstream_power11' branch
> > of skiboot repository with Power11 enablement patches [2].
> > 
> > ---
> > This patch can be skipped for now, if need to wait for patches to be
> > merged in open-power/skiboot. Have updated the skiboot.lid to aid in
> > testing this patch series.
>
> When is the merge in skiboot planned ? QEMU 9.1 freeze is in ~2 months.

I think I will try to get spapr bits in for 9.1, but may just skip pnv
for this round since there's a bunch of other stuff including some pnv
churn I'd like to get in 9.1.

Thanks,
Nick

Re: [RFC PATCH 03/10] target/ppc: Improve SPR indirect registers

2024-05-28 Thread Nicholas Piggin

On Tue May 28, 2024 at 4:50 PM AEST, Harsh Prateek Bora wrote:
>
> Hi Nick,
>
> On 5/26/24 17:56, Nicholas Piggin wrote:
> > SPRC/SPRD were recently added to all BookS CPUs supported, but
> > they are only tested on POWER9 and POWER10, so restrict them to
> > those CPUs.
> > 
>
> Hope you mean to restrict to P9/10 for both spapr and pnv or just pnv ?

For pnv, but they are hypervisor registers so they can not be
accessed with spapr.

[...]

> > @@ -321,11 +322,25 @@ void helper_store_sprc(CPUPPCState *env, target_ulong 
> > val)
> >   
> >   target_ulong helper_load_sprd(CPUPPCState *env)
> >   {
> > +PowerPCCPU *cpu = env_archcpu(env);
> > +PnvCore *pc = pnv_cpu_state(cpu)->core;
>
> We may want to avoid creating local variable cpu here also like previous 
> patches.

Since we have a maze of pointers and types, sometimes I like to
write the types down, but maybe that's just me :P

> However, is this helper meant to be accessible for spapr as well ?

Right, it's not. I *think* it should be okay to do this since it
should never be reached by spapr.

Thanks,
Nick

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Nicholas Piggin

On Wed May 29, 2024 at 2:05 AM AEST, Peter Xu wrote:
> On Tue, May 28, 2024 at 09:35:22AM -0400, Peter Xu wrote:
> > On Tue, May 28, 2024 at 02:27:57PM +1000, Nicholas Piggin wrote:
> > > There is no need to use /dev/shm for file-backed memory devices, and
> > > it is too small to be usable in gitlab CI. Switch to using a regular
> > > file in /tmp/ which will usually have more space available.
> > > 
> > > Signed-off-by: Nicholas Piggin 
> > > ---
> > > Am I missing something? AFAIKS there is not even any point using
> > > /dev/shm aka tmpfs anyway, there is not much special about it as a
> > > filesystem. This applies on top of the series just sent, and passes
> > > gitlab CI qtests including aarch64.
> > 
> > I think it's just that /dev/shm guarantees shmem usage, while the var
> > "tmpfs" implies g_dir_make_tmp() which may be another non-ram based file
> > system, while that'll be slightly different comparing to what a real user
> > would use - we don't suggest user to put guest RAM on things like btrfs.

Right, these days I think /tmp usually is not tmpfs but just a regular
filesystem. For these tests that's okay though. And it gets us working
with gitlab CI. The ignore-shared test works and is verified to skip the
copy (according to counters and some tracing I did) so I think it's a
good step.

> > 
> > One real implication is if we add a postcopy test it'll fail with
> > g_dir_make_tmp() when it is not pointing to a shmem mount, as
> > UFFDIO_REGISTER will fail there.  But that test doesn't yet exist as the
> > QEMU paths should be the same even if Linux will trigger different paths
> > when different types of mem is used (anonymous v.s. shmem).

Ah okay userfault. I guess that would require real tmpfs. We could just
add a new option to the harness for require_uffd when it comes up?

> > If the goal here is to properly handle the case where tmpfs doesn't have
> > enough space, how about what I suggested in the other email?
> > 
> > https://lore.kernel.org/r/ZlSppKDE6wzjCF--@x1n
> > 
> > IOW, try populate the shmem region before starting the guest, skip if
> > population failed.  Would that work?

I think that's good if you _need_ shm (e.g., for a uffd test), but
we should permit tests that only require a memory file.

Thanks,
Nick
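
For reference, a minimal sketch of the "try to populate first, skip if that
fails" idea quoted above. It is an assumption about how such a probe might
look, not code from the test harness: can_populate() is a hypothetical
helper, and it uses posix_fallocate() on a temporary file to check that a
directory (for example /dev/shm) can actually back the requested amount of
memory.

```c
#define _POSIX_C_SOURCE 200809L

#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: return true if 'size' bytes can be allocated in a
 * temporary file under 'dir'. A test that needs shmem could call this with
 * "/dev/shm" and skip itself when it returns false.
 */
static bool can_populate(const char *dir, off_t size)
{
    char path[4096];
    int fd, err;

    snprintf(path, sizeof(path), "%s/populate-XXXXXX", dir);
    fd = mkstemp(path);
    if (fd < 0) {
        return false;
    }
    unlink(path);                        /* file disappears once fd is closed */
    err = posix_fallocate(fd, 0, size);  /* returns an errno value, 0 on success */
    close(fd);
    return err == 0;
}
```

Tests that only need a memory file could skip the probe entirely and use a
regular temporary directory, as proposed in the patch.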

Re: [PULL 2/2] hw/ufs: Add support MCQ of UFSHCI 4.0

2024-05-28 Thread Jeuk Kim




On 5/29/2024 2:06 AM, Richard Henderson wrote:

On 5/27/24 23:12, Jeuk Kim wrote:

From: Minwoo Im 

This patch adds support for MCQ defined in UFSHCI 4.0.  This patch
utilized the legacy I/O codes as much as possible to support MCQ.

MCQ operation & runtime register is placed at 0x1000 offset of UFSHCI
register statically with no spare space among four registers (48B):

UfsMcqSqReg, UfsMcqSqIntReg, UfsMcqCqReg, UfsMcqCqIntReg

The maximum number of queues is 32 as per spec, and the default
MAC (Multiple Active Commands) is 32 in the device.

Example:
-device ufs,serial=foo,id=ufs0,mcq=true,mcq-maxq=8

Signed-off-by: Minwoo Im 
Reviewed-by: Jeuk Kim 
Message-Id: <20240528023106.856777-3-minwoo...@samsung.com>
Signed-off-by: Jeuk Kim 
---
  hw/ufs/trace-events |  17 ++
  hw/ufs/ufs.c    | 475 ++--
  hw/ufs/ufs.h    |  98 -
  include/block/ufs.h |  23 ++-
  4 files changed, 593 insertions(+), 20 deletions(-)


Fails build:

https://gitlab.com/qemu-project/qemu/-/jobs/6960270722

In file included from trace/trace-hw_ufs.c:5:
../hw/ufs/trace-events:28:24: error: format specifies type 'unsigned 
char' but the argument has type 'uint32_t' (aka 'unsigned int') 
[-Werror,-Wformat]

 , cqid, addr);
   ^~~~
../hw/ufs/trace-events:25:112: error: format specifies type 'unsigned 
char' but the argument has type 'uint32_t' (aka 'unsigned int') 
[-Werror,-Wformat]
    qemu_log("ufs_err_dma_write_cq " "failed to write cq 
entry. cqid %"PRIu8", hwaddr %"PRIu64"" "\n", cqid, addr);

~~~  ^~~~
2 errors generated.



r~



Sorry about that.

I'll fix it and send it back to you.
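
For context, a small sketch of the mismatch the compiler reported: the
format string uses PRIu8 while cqid is a uint32_t. One possible fix, shown
here as an assumption rather than the change that was actually sent, is to
make the conversion specifier match the argument type. format_cq_error() is
a hypothetical stand-in for the generated trace function.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the generated trace helper. The specifiers
 * match the argument types: PRIu32 for uint32_t, PRIu64 for uint64_t.
 */
static int format_cq_error(char *buf, size_t len, uint32_t cqid, uint64_t addr)
{
    return snprintf(buf, len,
                    "failed to write cq entry. cqid %" PRIu32
                    ", hwaddr %" PRIu64, cqid, addr);
}
```

The alternative would be to keep PRIu8 in trace-events and narrow the
argument to uint8_t; either way the specifier and the argument type have to
agree for -Werror,-Wformat builds to pass.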

Re: [PATCH V1 19/26] physmem: preserve ram blocks for cpr

2024-05-28 Thread Peter Xu

On Mon, Apr 29, 2024 at 08:55:28AM -0700, Steve Sistare wrote:
> Preserve fields of RAMBlocks that allocate their host memory during CPR so
> the RAM allocation can be recovered.

This sentence itself did not explain much, IMHO.  QEMU can share memory
using fd based memory already of all kinds, as long as the memory backend
is path-based it can be shared by sharing the same paths to dst.

This reads as very confusing as a generic concept.  I mean, QEMU migration
relies on so many things to work right.  We mostly ask the users to "use
exactly the same cmdline for src/dst QEMU unless you know what you're
doing", otherwise many things can break.  That should also include ramblock
being matched between src/dst due to the same cmdlines provided on both
sides.  It'll be confusing to mention this when we thought the ramblocks
also rely on that fact.

So IIUC this sentence should be dropped in the real patch, and I'll try to
guess the real reason with below..

> Mirror the mr->align field in the RAMBlock to simplify the vmstate.
> Preserve the old host address, even though it is immediately discarded,
> as it will be needed in the future for CPR with iommufd.  Preserve
> guest_memfd, even though CPR does not yet support it, to maintain vmstate
> compatibility when it becomes supported.

.. It could be about the vfio vaddr update feature that you mentioned and
only for iommufd (as IIUC vfio still relies on iova ranges, then it won't
help here)?

If so, IMHO we should have this patch (or any variance form) to be there
for your upcoming vfio support.  Keeping this around like this will make
the series harder to review.  Or is it needed even before VFIO?

Another thing to ask: does this idea also need to rely on some future
iommufd kernel support?  If there's anything that's not merged in current
Linux upstream, this series needs to be marked as RFC, so it's not target
for merging.  This will also be true if this patch is "preparing" for that
work.  It means if this patch only services iommufd purpose, even if it
doesn't require any kernel header to be referenced, we should only merge it
together with the full iommufd support comes later (and that'll be after
iommufd kernel supports land).

Thanks,

-- 
Peter Xu

Re: [PATCH 3/4] usb/ohci-pci: deprecate, don't build by default

2024-05-28 Thread Mark Cave-Ayland


On 28/05/2024 11:35, Thomas Huth wrote:


On 28/05/2024 11.54, Gerd Hoffmann wrote:

The xhci host adapter is the much better choice.

Signed-off-by: Gerd Hoffmann 
---
  hw/usb/hcd-ohci-pci.c | 1 +
  hw/usb/Kconfig    | 1 -
  2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ohci-pci.c b/hw/usb/hcd-ohci-pci.c
index 33ed9b6f5a52..88de657def71 100644
--- a/hw/usb/hcd-ohci-pci.c
+++ b/hw/usb/hcd-ohci-pci.c
@@ -143,6 +143,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void 
*data)
  dc->hotpluggable = false;
  dc->vmsd = _ohci;
  dc->reset = usb_ohci_reset_pci;
+    klass->deprecation_note = "use qemu-xhci instead";
  }
  static const TypeInfo ohci_pci_info = {
diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
index 84bc7fbe36cd..c4a6ea5a687f 100644
--- a/hw/usb/Kconfig
+++ b/hw/usb/Kconfig
@@ -17,7 +17,6 @@ config USB_OHCI_SYSBUS
  config USB_OHCI_PCI
  bool
-    default y if PCI_DEVICES
  depends on PCI
  select USB_OHCI


Not sure whether we should disable it by default just because it is deprecated. We 
don't do that for any other devices as far as I know.


Anyway, you should add the device to docs/about/deprecated.rst to really mark it as 
deprecated, since that's our official list (AFAIK).


Also, there are still some machines that use this device:

$ grep -r USB_OHCI_PCI *
hw/hppa/Kconfig:    imply USB_OHCI_PCI
hw/mips/Kconfig:    imply USB_OHCI_PCI
hw/ppc/Kconfig:    imply USB_OHCI_PCI
hw/ppc/Kconfig:    imply USB_OHCI_PCI

pseries could certainly continue without OHCI AFAICT, but the others? Maybe this 
needs some discussion first... (thus putting some more people on CC:)


  Thomas


The mac99 machine has an in-built OHCI PCI interface so I don't think this device 
should be marked as deprecated. Normally in these cases isn't it just a matter of 
updating documentation to recommend XHCI over OHCI for particular uses?



ATB,

Mark.

Re: [PATCH v2 00/37] target/sparc: Implement VIS4

2024-05-28 Thread Mark Cave-Ayland


On 26/05/2024 20:42, Richard Henderson wrote:


Now tested with RISU, using a Solaris M8 host as reference.
This exposed a few bugs in the existing VIS1 support as well,
so fix those before anything else.  It also exposed a few bugs
in the implementation of VIS3, so fixes squashed there as well.


r~


Richard Henderson (37):
   target/sparc: Fix ARRAY8
   target/sparc: Rewrite gen_edge
   target/sparc: Fix do_dc
   target/sparc: Fix helper_fmul8ulx16
   target/sparc: Perform DFPREG/QFPREG in decodetree
   target/sparc: Remove gen_dest_fpr_D
   target/sparc: Remove cpu_fpr[]
   target/sparc: Use gvec for VIS1 parallel add/sub
   target/sparc: Implement FMAf extension
   target/sparc: Add feature bits for VIS 3
   target/sparc: Implement ADDXC, ADDXCcc
   target/sparc: Implement CMASK instructions
   target/sparc: Implement FCHKSM16
   target/sparc: Implement FHADD, FHSUB, FNHADD, FNADD, FNMUL
   target/sparc: Implement FLCMP
   target/sparc: Implement FMEAN16
   target/sparc: Implement FPADD64, FPSUB64
   target/sparc: Implement FPADDS, FPSUBS
   target/sparc: Implement FPCMPEQ8, FPCMPNE8, FPCMPULE8, FPCMPUGT8
   target/sparc: Implement FSLL, FSRL, FSRA, FSLAS
   target/sparc: Implement LDXEFSR
   target/sparc: Implement LZCNT
   target/sparc: Implement MOVsTOw, MOVdTOx, MOVwTOs, MOVxTOd
   target/sparc: Implement PDISTN
   target/sparc: Implement UMULXHI
   target/sparc: Implement XMULX
   target/sparc: Enable VIS3 feature bit
   target/sparc: Implement IMA extension
   target/sparc: Add feature bit for VIS4
   target/sparc: Implement FALIGNDATAi
   target/sparc: Implement 8-bit FPADD, FPADDS, and FPADDUS
   target/sparc: Implement VIS4 comparisons
   target/sparc: Implement FPMIN, FPMAX
   target/sparc: Implement SUBXC, SUBXCcc
   target/sparc: Implement MWAIT
   target/sparc: Implement monitor ASIs
   target/sparc: Enable VIS4 feature bit

  target/sparc/asi.h |   4 +
  target/sparc/helper.h  |  27 +-
  target/sparc/cpu-feature.h.inc |   4 +
  target/sparc/insns.decode  | 338 
  linux-user/elfload.c   |   3 +
  target/sparc/cpu.c |  12 +
  target/sparc/fop_helper.c  | 136 +
  target/sparc/ldst_helper.c |   4 +
  target/sparc/translate.c   | 938 ++---
  target/sparc/vis_helper.c  | 392 +++---
  fpu/softfloat-specialize.c.inc |  31 ++
  11 files changed, 1558 insertions(+), 331 deletions(-)


Thanks - I'll give this series a quick run against my test images over the next few days
just to make sure there are no regressions (unlikely as I don't have much in the way
of current VIS test cases) and report back.



ATB,

Mark.

Re: [PATCH V1 17/26] machine: memfd-alloc option

2024-05-28 Thread Peter Xu

On Mon, Apr 29, 2024 at 08:55:26AM -0700, Steve Sistare wrote:
> Allocate anonymous memory using memfd_create if the memfd-alloc machine
> option is set.
> 
> Signed-off-by: Steve Sistare 
> ---
>  hw/core/machine.c   | 22 ++
>  include/hw/boards.h |  1 +
>  qemu-options.hx |  6 ++
>  system/memory.c |  9 ++---
>  system/physmem.c| 18 +-
>  system/trace-events |  1 +
>  6 files changed, 53 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 582c2df..9567b97 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -443,6 +443,20 @@ static void machine_set_mem_merge(Object *obj, bool 
> value, Error **errp)
>  ms->mem_merge = value;
>  }
>  
> +static bool machine_get_memfd_alloc(Object *obj, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +return ms->memfd_alloc;
> +}
> +
> +static void machine_set_memfd_alloc(Object *obj, bool value, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +ms->memfd_alloc = value;
> +}
> +
>  static bool machine_get_usb(Object *obj, Error **errp)
>  {
>  MachineState *ms = MACHINE(obj);
> @@ -1044,6 +1058,11 @@ static void machine_class_init(ObjectClass *oc, void 
> *data)
>  object_class_property_set_description(oc, "mem-merge",
>  "Enable/disable memory merge support");
>  
> +object_class_property_add_bool(oc, "memfd-alloc",
> +machine_get_memfd_alloc, machine_set_memfd_alloc);
> +object_class_property_set_description(oc, "memfd-alloc",
> +"Enable/disable allocating anonymous memory using memfd_create");
> +
>  object_class_property_add_bool(oc, "usb",
>  machine_get_usb, machine_set_usb);
>  object_class_property_set_description(oc, "usb",
> @@ -1387,6 +1406,9 @@ static bool create_default_memdev(MachineState *ms, 
> const char *path, Error **er
>  if (!object_property_set_int(obj, "size", ms->ram_size, errp)) {
>  goto out;
>  }
> +if (!object_property_set_bool(obj, "share", ms->memfd_alloc, errp)) {
> +goto out;
> +}
>  object_property_add_child(object_get_objects_root(), mc->default_ram_id,
>obj);
>  /* Ensure backend's memory region name is equal to mc->default_ram_id */
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 69c1ba4..96259c3 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -372,6 +372,7 @@ struct MachineState {
>  bool dump_guest_core;
>  bool mem_merge;
>  bool require_guest_memfd;
> +bool memfd_alloc;
>  bool usb;
>  bool usb_disabled;
>  char *firmware;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index cf61f6b..f0dfda5 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -32,6 +32,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>  "vmport=on|off|auto controls emulation of vmport 
> (default: auto)\n"
>  "dump-guest-core=on|off include guest memory in a core 
> dump (default=on)\n"
>  "mem-merge=on|off controls memory merge support 
> (default: on)\n"
> +"memfd-alloc=on|off controls allocating anonymous guest 
> RAM using memfd_create (default: off)\n"
>  "aes-key-wrap=on|off controls support for AES key 
> wrapping (default=on)\n"
>  "dea-key-wrap=on|off controls support for DEA key 
> wrapping (default=on)\n"
>  "suppress-vmdesc=on|off disables self-describing 
> migration (default=off)\n"
> @@ -79,6 +80,11 @@ SRST
>  supported by the host, de-duplicates identical memory pages
>  among VMs instances (enabled by default).
>  
> +``memfd-alloc=on|off``
> +Enables or disables allocation of anonymous guest RAM using
> +memfd_create.  Any associated memory-backend objects are created with
> +share=on.  The memfd-alloc default is off.
> +
>  ``aes-key-wrap=on|off``
>  Enables or disables AES key wrapping support on s390-ccw hosts.
>  This feature controls whether AES wrapping keys will be created
> diff --git a/system/memory.c b/system/memory.c
> index 49f1cb2..ca04a0e 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -1552,8 +1552,9 @@ bool memory_region_init_ram_nomigrate(MemoryRegion *mr,
>uint64_t size,
>Error **errp)
>  {
> +uint32_t flags = current_machine->memfd_alloc ? RAM_SHARED : 0;

If there's a machine option to "use memfd for allocations", then it's
shared mem... Hmm..

It is a bit confusing to me on quite a few levels:

  - Why should the memory allocation method be defined by a machine property,
    when we have memory-backend-* which should cover everything?

  - Even if we have such a machine property, why should setting "memfd"
    always imply shared?  Why not private?  After all it's not called

Re: [PATCH RISU v2 00/13] ELF and Sparc64 support

2024-05-28 Thread Mark Cave-Ayland


On 26/05/2024 20:36, Richard Henderson wrote:


Let risu accept elf test files, adjusted from v1.
Adjust risugen to invoke the assembler and linker,
with a cross-compiler prefix if needed.
Add some sparc64 testing which utilizes this.

Changes for v2:
   - Implement VIS2 through VIS4.

There's something odd going on with the Sparc M8 Solaris host where
the values recorded via RISU for some floating-point operations are
incorrectly rounded, but performing the same operations with the
same inputs in a standalone test program produces correct results.

I wonder if there's some unfinished_FPop exception being generated
and the operating system emulation is producing incorrect results.
I'd be much happier if I could test this on Linux...


r~


Richard Henderson (13):
   risu: Allow use of ELF test files
   Build elf test cases instead of raw binaries
   Introduce host_context_t
   risu: Add initial sparc64 support
   risugen: Be explicit about print destinations
   risugen: Add sparc64 support
   contrib/generate_all: Do not rely on ag
   sparc64: Add a few logical insns
   sparc64: Add VIS1 instructions
   sparc64: Add VIS2 and FMAF insns
   sparc64: Add VIS3 instructions
   sparc64: Add IMA instructions
   sparc64: Add VIS4 instructions

  Makefile   |  22 ++-
  risu.h |  16 +-
  risu_reginfo_aarch64.h |   2 +
  risu_reginfo_arm.h |   2 +
  risu_reginfo_i386.h|   2 +
  risu_reginfo_loongarch64.h |   3 +
  risu_reginfo_m68k.h|   2 +
  risu_reginfo_ppc64.h   |   2 +
  risu_reginfo_s390x.h   |   2 +
  risu_reginfo_sparc64.h |  36 
  risu.c |  59 +-
  risu_aarch64.c |   6 +-
  risu_arm.c |   7 +-
  risu_i386.c|   7 +-
  risu_loongarch64.c |   6 +-
  risu_m68k.c|   6 +-
  risu_ppc64.c   |   6 +-
  risu_reginfo_loongarch64.c |   3 +-
  risu_reginfo_sparc64.c | 186 ++
  risu_s390x.c   |   5 +-
  risu_sparc64.c |  52 +
  configure  |   2 +
  contrib/generate_all.sh|   4 +-
  risugen|  10 +-
  risugen_common.pm  |  68 ++-
  risugen_sparc64.pm | 385 +
  sparc64.risu   | 298 
  test.ld|  12 ++
  test_aarch64.s |   4 +-
  test_arm.s |  16 +-
  test_i386.S|   4 +-
  test_sparc64.s | 137 +
  32 files changed, 1298 insertions(+), 74 deletions(-)
  create mode 100644 risu_reginfo_sparc64.h
  create mode 100644 risu_reginfo_sparc64.c
  create mode 100644 risu_sparc64.c
  create mode 100644 risugen_sparc64.pm
  create mode 100644 sparc64.risu
  create mode 100644 test.ld
  create mode 100644 test_sparc64.s


Nice! I don't have any experience with RISU so I don't feel too qualified to review 
the series, but obviously there are clear benefits to having SPARC support included :)



ATB,

Mark.

[PATCH v3 02/33] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB

2024-05-28 Thread Richard Henderson

No need for a full comparison; xor produces non-zero bits
for QC just fine.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/gengvec.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 22c9d17dce..bfe6885a01 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1217,21 +1217,21 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
 
-static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
   TCGv_vec a, TCGv_vec b)
 {
 TCGv_vec x = tcg_temp_new_vec_matching(t);
 tcg_gen_add_vec(vece, x, a, b);
 tcg_gen_usadd_vec(vece, t, a, b);
-tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
-tcg_gen_or_vec(vece, sat, sat, x);
+tcg_gen_xor_vec(vece, x, x, t);
+tcg_gen_or_vec(vece, qc, qc, x);
 }
 
 void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
 {
 static const TCGOpcode vecop_list[] = {
-INDEX_op_usadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+INDEX_op_usadd_vec, INDEX_op_add_vec, 0
 };
 static const GVecGen4 ops[4] = {
 { .fniv = gen_uqadd_vec,
@@ -1259,21 +1259,21 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
 }
 
-static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_sqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
   TCGv_vec a, TCGv_vec b)
 {
 TCGv_vec x = tcg_temp_new_vec_matching(t);
 tcg_gen_add_vec(vece, x, a, b);
 tcg_gen_ssadd_vec(vece, t, a, b);
-tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
-tcg_gen_or_vec(vece, sat, sat, x);
+tcg_gen_xor_vec(vece, x, x, t);
+tcg_gen_or_vec(vece, qc, qc, x);
 }
 
 void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
 {
 static const TCGOpcode vecop_list[] = {
-INDEX_op_ssadd_vec, INDEX_op_cmp_vec, INDEX_op_add_vec, 0
+INDEX_op_ssadd_vec, INDEX_op_add_vec, 0
 };
 static const GVecGen4 ops[4] = {
 { .fniv = gen_sqadd_vec,
@@ -1301,21 +1301,21 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
 
-static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_uqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
   TCGv_vec a, TCGv_vec b)
 {
 TCGv_vec x = tcg_temp_new_vec_matching(t);
 tcg_gen_sub_vec(vece, x, a, b);
 tcg_gen_ussub_vec(vece, t, a, b);
-tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
-tcg_gen_or_vec(vece, sat, sat, x);
+tcg_gen_xor_vec(vece, x, x, t);
+tcg_gen_or_vec(vece, qc, qc, x);
 }
 
 void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
 {
 static const TCGOpcode vecop_list[] = {
-INDEX_op_ussub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+INDEX_op_ussub_vec, INDEX_op_sub_vec, 0
 };
 static const GVecGen4 ops[4] = {
 { .fniv = gen_uqsub_vec,
@@ -1343,21 +1343,21 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
 }
 
-static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec sat,
+static void gen_sqsub_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
   TCGv_vec a, TCGv_vec b)
 {
 TCGv_vec x = tcg_temp_new_vec_matching(t);
 tcg_gen_sub_vec(vece, x, a, b);
 tcg_gen_sssub_vec(vece, t, a, b);
-tcg_gen_cmp_vec(TCG_COND_NE, vece, x, x, t);
-tcg_gen_or_vec(vece, sat, sat, x);
+tcg_gen_xor_vec(vece, x, x, t);
+tcg_gen_or_vec(vece, qc, qc, x);
 }
 
 void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
 {
 static const TCGOpcode vecop_list[] = {
-INDEX_op_sssub_vec, INDEX_op_cmp_vec, INDEX_op_sub_vec, 0
+INDEX_op_sssub_vec, INDEX_op_sub_vec, 0
 };
 static const GVecGen4 ops[4] = {
 { .fniv = gen_sqsub_vec,
-- 
2.34.1

[PATCH v3 10/33] target/arm: Convert SRSHL and URSHL (register) to gvec

2024-05-28 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h | 10 +
 target/arm/tcg/translate.h  |  4 
 target/arm/tcg/neon-dp.decode   | 10 ++---
 target/arm/tcg/gengvec.c| 22 +++
 target/arm/tcg/neon_helper.c| 38 -
 target/arm/tcg/translate-a64.c  | 17 ++-
 target/arm/tcg/translate-neon.c |  6 ++
 7 files changed, 84 insertions(+), 23 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index a14c040451..25eb7bf5df 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -327,6 +327,16 @@ DEF_HELPER_3(neon_qrshl_s32, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrshl_u64, i64, env, i64, i64)
 DEF_HELPER_3(neon_qrshl_s64, i64, env, i64, i64)
 
+DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_srshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_urshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_urshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_2(neon_add_u8, i32, i32, i32)
 DEF_HELPER_2(neon_add_u16, i32, i32, i32)
 DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 87439dcc61..ea63ffc47b 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -459,6 +459,10 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_ushl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index fd3a01bfa0..8525c65c0d 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -117,14 +117,8 @@ VSHL_U_3s 001 1 0 . ..   0100 . . . 0 
 @3same_rev
   VQSHL_U64_3s    001 1 0 . ..   0100 . . . 1  @3same_64_rev
   VQSHL_U_3s  001 1 0 . ..   0100 . . . 1  @3same_rev
 }
-{
-  VRSHL_S64_3s    001 0 0 . ..   0101 . . . 0  @3same_64_rev
-  VRSHL_S_3s  001 0 0 . ..   0101 . . . 0  @3same_rev
-}
-{
-  VRSHL_U64_3s    001 1 0 . ..   0101 . . . 0  @3same_64_rev
-  VRSHL_U_3s  001 1 0 . ..   0101 . . . 0  @3same_rev
-}
+VRSHL_S_3s    001 0 0 . ..   0101 . . . 0  @3same_rev
+VRSHL_U_3s    001 1 0 . ..   0101 . . . 0  @3same_rev
 {
   VQRSHL_S64_3s   001 0 0 . ..   0101 . . . 1  @3same_64_rev
   VQRSHL_S_3s 001 0 0 . ..   0101 . . . 1  @3same_rev
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 740f3f864e..216a9f81e3 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1218,6 +1218,28 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
 
+void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3 * const fns[] = {
+gen_helper_gvec_srshl_b, gen_helper_gvec_srshl_h,
+gen_helper_gvec_srshl_s, gen_helper_gvec_srshl_d,
+};
+tcg_debug_assert(vece <= MO_64);
+tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3 * const fns[] = {
+gen_helper_gvec_urshl_b, gen_helper_gvec_urshl_h,
+gen_helper_gvec_urshl_s, gen_helper_gvec_urshl_d,
+};
+tcg_debug_assert(vece <= MO_64);
+tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
 void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp 
esz)
 {
 uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index 0af15e9f6e..516ecc1dcb 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -6,10 +6,11 @@

[PATCH v3 01/33] target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI

2024-05-28 Thread Richard Henderson

For all, rm == 15 is UNPREDICTABLE.
Prior to v8, thumb with rm == 13 is UNPREDICTABLE.
For PLDW, rn == 15 is UNPREDICTABLE.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a32-uncond.decode |  8 +++--
 target/arm/tcg/t32.decode|  7 ++--
 target/arm/tcg/translate.c   | 58 
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/target/arm/tcg/a32-uncond.decode b/target/arm/tcg/a32-uncond.decode
index 2339de2e94..e1b1780d37 100644
--- a/target/arm/tcg/a32-uncond.decode
+++ b/target/arm/tcg/a32-uncond.decode
@@ -24,7 +24,9 @@
 
!extern
!extern imm
+   !extern rm
   E
+  rn rm
 
 # Branch with Link and Exchange
 
@@ -61,9 +63,9 @@ PLD   0101 -101     # 
(imm, lit) 5te
 PLDW  0101 -001     # (imm, lit) 7mp
 PLI   0100 -101     # (imm, lit) 7
 
-PLD   0111 -101   - -- 0    # (register) 5te
-PLDW  0111 -001   - -- 0    # (register) 7mp
-PLI   0110 -101   - -- 0    # (register) 7
+PLD_rr    0111 -101   - -- 0 rm:4   
+PLDW_rr   0111 -001 rn:4  - -- 0 rm:4   
+PLI_rr    0110 -101   - -- 0 rm:4   
 
 # Unallocated memory hints
 #
diff --git a/target/arm/tcg/t32.decode b/target/arm/tcg/t32.decode
index d327178829..1ec12442a4 100644
--- a/target/arm/tcg/t32.decode
+++ b/target/arm/tcg/t32.decode
@@ -28,6 +28,7 @@
 _rot !extern rd rn rm rot
  !extern rd rn rm
   !extern rd rm
+  !extern rn rm
   !extern rd imm
!extern rm
!extern imm
@@ -472,7 +473,7 @@ STR_ri    1000 1100     
  @ldst_ri_pos
   }
   LDRBT_ri    1000 0001   1110    @ldst_ri_unp
   {
-PLD   1000 0001   00 --   # (register)
+PLD_rr    1000 0001   00 -- rm:4  
 LDRB_rr   1000 0001   00 ..   @ldst_rr
   }
 }
@@ -492,7 +493,7 @@ STR_ri    1000 1100     
  @ldst_ri_pos
   }
   LDRHT_ri    1000 0011   1110    @ldst_ri_unp
   {
-PLDW  1000 0011   00 --   # (register)
+PLDW_rr   1000 0011 rn:4  00 -- rm:4  
 LDRH_rr   1000 0011   00 ..   @ldst_rr
   }
 }
@@ -520,7 +521,7 @@ STR_ri    1000 1100     
  @ldst_ri_pos
   }
   LDRSBT_ri   1001 0001   1110    @ldst_ri_unp
   {
-PLI   1001 0001   00 --   # (register)
+PLI_rr    1001 0001   00 -- rm:4  
 LDRSB_rr  1001 0001   00 ..   @ldst_rr
   }
 }
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index c5bc691d92..16b8609ec0 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -7187,6 +7187,64 @@ static bool trans_PLI(DisasContext *s, arg_PLI *a)
 return ENABLE_ARCH_7;
 }
 
+/* Check for UNPREDICTABLE rm for prefetch (register). */
+static bool prefetch_check_m(DisasContext *s, int rm)
+{
+switch (rm) {
+case 13:
+/* SP allowed in v8 or with A1 encoding; rejected with T1. */
+return ENABLE_ARCH_8 || !s->thumb;
+case 15:
+/* PC always rejected. */
+return false;
+default:
+return true;
+}
+}
+
+static bool trans_PLD_rr(DisasContext *s, arg_PLD_rr *a)
+{
+if (!ENABLE_ARCH_5TE) {
+return false;
+}
+/* Choose UNDEF for UNPREDICTABLE rm. */
+if (!prefetch_check_m(s, a->rm)) {
+unallocated_encoding(s);
+}
+return true;
+}
+
+static bool trans_PLDW_rr(DisasContext *s, arg_PLDW_rr *a)
+{
+if (!arm_dc_feature(s, ARM_FEATURE_V7MP)) {
+return false;
+}
+/*
+ * For A1, rn == 15 is UNPREDICTABLE.
+ * For T1, rn == 15 is PLD (literal), and already matched.
+ * Choose UNDEF for UNPREDICTABLE rn or rm.
+ */
+if (a->rn == 15) {
+assert(!s->thumb);
+} else if (prefetch_check_m(s, a->rm)) {
+return true;
+}
+unallocated_encoding(s);
+return true;
+}
+
+static bool trans_PLI_rr(DisasContext *s, arg_PLI_rr *a)
+{
+if (!ENABLE_ARCH_7) {
+return false;
+}
+/* Choose UNDEF for UNPREDICTABLE rm. */
+if (!prefetch_check_m(s, a->rm)) {
+unallocated_encoding(s);
+}
+return true;
+}
+
 /*
  * If-then
  */
-- 
2.34.1

[PATCH v3 30/33] target/arm: Tidy SQDMULH, SQRDMULH (vector)

2024-05-28 Thread Richard Henderson

We already have a gvec helper for the operations, but we aren't
using it on the aa32 neon side.  Create a unified expander for
use by both aa32 and aa64 translators.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate.h  |  4 
 target/arm/tcg/gengvec.c| 20 
 target/arm/tcg/translate-a64.c  | 23 ---
 target/arm/tcg/translate-neon.c | 23 +++
 4 files changed, 31 insertions(+), 39 deletions(-)

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 3b1e68b779..aba21f730f 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -539,6 +539,10 @@ void gen_gvec_sri(unsigned vece, uint32_t rd_ofs, uint32_t 
rm_ofs,
 void gen_gvec_sli(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
   int64_t shift, uint32_t opr_sz, uint32_t max_sz);
 
+void gen_gvec_sqdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_sqrdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+  uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_sqrdmlah_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
   uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 119826bf28..56a1dc1f75 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -35,6 +35,26 @@ static void gen_gvec_fn3_qc(uint32_t rd_ofs, uint32_t 
rn_ofs, uint32_t rm_ofs,
opr_sz, max_sz, 0, fn);
 }
 
+void gen_gvec_sqdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[2] = {
+gen_helper_neon_sqdmulh_h, gen_helper_neon_sqdmulh_s
+};
+tcg_debug_assert(vece >= 1 && vece <= 2);
+gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
+void gen_gvec_sqrdmulh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[2] = {
+gen_helper_neon_sqrdmulh_h, gen_helper_neon_sqrdmulh_s
+};
+tcg_debug_assert(vece >= 1 && vece <= 2);
+gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
 void gen_gvec_sqrdmlah_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
   uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
 {
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c4601cde2f..c673b95ec7 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -724,19 +724,6 @@ static void gen_gvec_op3_fpst(DisasContext *s, bool is_q, 
int rd, int rn,
is_q ? 16 : 8, vec_full_reg_size(s), data, fn);
 }
 
-/* Expand a 3-operand + qc + operation using an out-of-line helper.  */
-static void gen_gvec_op3_qc(DisasContext *s, bool is_q, int rd, int rn,
-int rm, gen_helper_gvec_3_ptr *fn)
-{
-TCGv_ptr qc_ptr = tcg_temp_new_ptr();
-
-tcg_gen_addi_ptr(qc_ptr, tcg_env, offsetof(CPUARMState, vfp.qc));
-tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, rd),
-   vec_full_reg_offset(s, rn),
-   vec_full_reg_offset(s, rm), qc_ptr,
-   is_q ? 16 : 8, vec_full_reg_size(s), 0, fn);
-}
-
 /* Expand a 4-operand operation using an out-of-line helper.  */
 static void gen_gvec_op4_ool(DisasContext *s, bool is_q, int rd, int rn,
  int rm, int ra, int data, gen_helper_gvec_4 *fn)
@@ -11007,12 +10994,10 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 
 switch (opcode) {
 case 0x16: /* SQDMULH, SQRDMULH */
-{
-static gen_helper_gvec_3_ptr * const fns[2][2] = {
-{ gen_helper_neon_sqdmulh_h, gen_helper_neon_sqrdmulh_h },
-{ gen_helper_neon_sqdmulh_s, gen_helper_neon_sqrdmulh_s },
-};
-gen_gvec_op3_qc(s, is_q, rd, rn, rm, fns[size - 1][u]);
+if (u) {
+gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqrdmulh_qc, size);
+} else {
+gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sqdmulh_qc, size);
 }
 return;
 }
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index f9a8753906..915c9e56db 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -937,28 +937,11 @@ DO_SHA2(SHA256SU1, gen_helper_crypto_sha256su1)
 }
 
 #define DO_3SAME_VQDMULH(INSN, FUNC)\
-WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16);\
-

[PATCH v3 22/33] target/arm: Convert SHSUB, UHSUB to gvec

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h |   6 --
 target/arm/tcg/translate.h  |   4 +
 target/arm/tcg/gengvec.c| 144 
 target/arm/tcg/neon_helper.c|  27 --
 target/arm/tcg/translate-a64.c  |  17 ++--
 target/arm/tcg/translate-neon.c |   4 +-
 6 files changed, 157 insertions(+), 45 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index b26bfcb079..b95f24ed0a 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -274,12 +274,6 @@ DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_rhadd_u16, i32, i32, i32)
 DEF_HELPER_2(neon_rhadd_s32, s32, s32, s32)
 DEF_HELPER_2(neon_rhadd_u32, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s8, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_u8, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s16, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_u16, i32, i32, i32)
-DEF_HELPER_2(neon_hsub_s32, s32, s32, s32)
-DEF_HELPER_2(neon_hsub_u32, i32, i32, i32)
 
 DEF_HELPER_2(neon_pmin_u8, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_s8, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index dd99d76bf2..315e0afd04 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -476,6 +476,10 @@ void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index c0627a787b..c46365c3a6 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2005,3 +2005,147 @@ void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 tcg_debug_assert(vece <= MO_32);
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
+
+static void gen_shsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_andc_i64(t, b, a);
+tcg_gen_vec_sar8i_i64(a, a, 1);
+tcg_gen_vec_sar8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_sub8_i64(d, a, b);
+tcg_gen_vec_sub8_i64(d, d, t);
+}
+
+static void gen_shsub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_andc_i64(t, b, a);
+tcg_gen_vec_sar16i_i64(a, a, 1);
+tcg_gen_vec_sar16i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+tcg_gen_vec_sub16_i64(d, a, b);
+tcg_gen_vec_sub16_i64(d, d, t);
+}
+
+static void gen_shsub_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_andc_i32(t, b, a);
+tcg_gen_sari_i32(a, a, 1);
+tcg_gen_sari_i32(b, b, 1);
+tcg_gen_andi_i32(t, t, 1);
+tcg_gen_sub_i32(d, a, b);
+tcg_gen_sub_i32(d, d, t);
+}
+
+static void gen_shsub_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+tcg_gen_andc_vec(vece, t, b, a);
+tcg_gen_sari_vec(vece, a, a, 1);
+tcg_gen_sari_vec(vece, b, b, 1);
+tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+tcg_gen_sub_vec(vece, d, a, b);
+tcg_gen_sub_vec(vece, d, d, t);
+}
+
+void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+};
+static const GVecGen3 g[4] = {
+{ .fni8 = gen_shsub8_i64,
+  .fniv = gen_shsub_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_8 },
+{ .fni8 = gen_shsub16_i64,
+  .fniv = gen_shsub_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_16 },
+{ .fni4 = gen_shsub_i32,
+  .fniv = gen_shsub_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_32 },
+};
+assert(vece <= MO_32);
+tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_uhsub8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_andc_i64(t, b, a);
+tcg_gen_vec_shr8i_i64(a, a, 1);
+tcg_gen_vec_shr8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_sub8_i64(d, a, b);
+tcg_gen_vec_sub8_i64(d, d, t);
+}
+
+static void gen_uhsub16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_andc_i64(t, b, a);
+

[PATCH v3 11/33] target/arm: Convert SRSHL, URSHL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  4 
 target/arm/tcg/translate-a64.c | 22 +++---
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index ea897d6732..9e02776036 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -758,6 +758,8 @@ USQADD_s0111 1110 ..1 0 00111 0 . . 
@r2r_e
 
 SSHL_s  0101 1110 111 . 01000 1 . . @rrr_d
 USHL_s  0111 1110 111 . 01000 1 . . @rrr_d
+SRSHL_s 0101 1110 111 . 01010 1 . . @rrr_d
+URSHL_s 0111 1110 111 . 01010 1 . . @rrr_d
 
 ### Advanced SIMD scalar pairwise
 
@@ -882,6 +884,8 @@ USQADD_v0.10 1110 ..1 0 00111 0 . . 
@qr2r_e
 
 SSHL_v  0.00 1110 ..1 . 01000 1 . . @qrrr_e
 USHL_v  0.10 1110 ..1 . 01000 1 . . @qrrr_e
+SRSHL_v 0.00 1110 ..1 . 01010 1 . . @qrrr_e
+URSHL_v 0.10 1110 ..1 . 01010 1 . . @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 7e981f8d01..c751da78ef 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5116,6 +5116,8 @@ static bool do_int3_scalar_d(DisasContext *s, arg_rrr_e 
*a,
 
 TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
 TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
+TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
+TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
 
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
@@ -5364,6 +5366,8 @@ TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
 
 TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
 TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
+TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
+TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
 
 
 /*
@@ -9384,13 +9388,6 @@ static void handle_3same_64(DisasContext *s, int opcode, 
bool u,
 gen_helper_neon_qshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
 }
 break;
-case 0xa: /* SRSHL, URSHL */
-if (u) {
-gen_helper_neon_rshl_u64(tcg_rd, tcg_rn, tcg_rm);
-} else {
-gen_helper_neon_rshl_s64(tcg_rd, tcg_rn, tcg_rm);
-}
-break;
 case 0xb: /* SQRSHL, UQRSHL */
 if (u) {
 gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9409,6 +9406,7 @@ static void handle_3same_64(DisasContext *s, int opcode, 
bool u,
 case 0x1: /* SQADD / UQADD */
 case 0x5: /* SQSUB / UQSUB */
 case 0x8: /* SSHL, USHL */
+case 0xa: /* SRSHL, URSHL */
 g_assert_not_reached();
 }
 }
@@ -9433,7 +9431,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext 
*s, uint32_t insn)
 case 0x9: /* SQSHL, UQSHL */
 case 0xb: /* SQRSHL, UQRSHL */
 break;
-case 0xa: /* SRSHL, URSHL */
 case 0x6: /* CMGT, CMHI */
 case 0x7: /* CMGE, CMHS */
 case 0x11: /* CMTST, CMEQ */
@@ -9453,6 +9450,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext 
*s, uint32_t insn)
 case 0x1: /* SQADD, UQADD */
 case 0x5: /* SQSUB, UQSUB */
 case 0x8: /* SSHL, USHL */
+case 0xa: /* SRSHL, URSHL */
 unallocated_encoding(s);
 return;
 }
@@ -10929,6 +10927,7 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 case 0x01: /* SQADD, UQADD */
 case 0x05: /* SQSUB, UQSUB */
 case 0x08: /* SSHL, USHL */
+case 0x0a: /* SRSHL, URSHL */
 unallocated_encoding(s);
 return;
 }
@@ -10938,13 +10937,6 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 }
 
 switch (opcode) {
-case 0x0a: /* SRSHL, URSHL */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urshl, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srshl, size);
-}
-return;
 case 0x0c: /* SMAX, UMAX */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
-- 
2.34.1

[PATCH v3 31/33] target/arm: Convert SQDMULH, SQRDMULH to decodetree

2024-05-28 Thread Richard Henderson

These are the last instructions within disas_simd_three_reg_same
and disas_simd_scalar_three_reg_same, so remove them.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  10 ++
 target/arm/tcg/a64.decode  |  18 +++
 target/arm/tcg/translate-a64.c | 276 ++---
 target/arm/tcg/vec_helper.c|  64 
 4 files changed, 172 insertions(+), 196 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 85f9302563..24feecee9b 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -968,6 +968,16 @@ DEF_HELPER_FLAGS_5(neon_sqrdmulh_h, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(neon_sqrdmulh_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(neon_sqdmulh_idx_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqdmulh_idx_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrdmulh_idx_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve2_sqdmulh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve2_sqdmulh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve2_sqdmulh_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2dea68a0a9..f7f897f9fc 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -774,6 +774,9 @@ CMHS_s  0111 1110 111 . 00111 1 . . 
@rrr_d
 CMTST_s 0101 1110 111 . 10001 1 . . @rrr_d
 CMEQ_s  0111 1110 111 . 10001 1 . . @rrr_d
 
+SQDMULH_s   0101 1110 ..1 . 10110 1 . . @rrr_e
+SQRDMULH_s  0111 1110 ..1 . 10110 1 . . @rrr_e
+
 ### Advanced SIMD scalar pairwise
 
 FADDP_s 0101 1110 0011  1101 10 . . @rr_h
@@ -931,6 +934,9 @@ PMUL_v  0.10 1110 001 . 10011 1 . . 
@qrrr_b
 MLA_v   0.00 1110 ..1 . 10010 1 . . @qrrr_e
 MLS_v   0.10 1110 ..1 . 10010 1 . . @qrrr_e
 
+SQDMULH_v   0.00 1110 ..1 . 10110 1 . . @qrrr_e
+SQRDMULH_v  0.10 1110 ..1 . 10110 1 . . @qrrr_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
@@ -949,6 +955,12 @@ FMULX_si0111  00 ..  1001 . 0 . .  
 @rrx_h
 FMULX_si0111  10 . . 1001 . 0 . .   @rrx_s
 FMULX_si0111  11 0 . 1001 . 0 . .   @rrx_d
 
+SQDMULH_si  0101  01 ..  1100 . 0 . .   @rrx_h
+SQDMULH_si  0101  10 ..  1100 . 0 . .   @rrx_s
+
+SQRDMULH_si 0101  01 ..  1101 . 0 . .   @rrx_h
+SQRDMULH_si 0101  10 . . 1101 . 0 . .   @rrx_s
+
 ### Advanced SIMD vector x indexed element
 
 FMUL_vi 0.00  00 ..  1001 . 0 . .   @qrrx_h
@@ -980,3 +992,9 @@ MLA_vi  0.10  10 . .  . 0 . .   
@qrrx_s
 
 MLS_vi  0.10  01 ..  0100 . 0 . .   @qrrx_h
 MLS_vi  0.10  10 . . 0100 . 0 . .   @qrrx_s
+
+SQDMULH_vi  0.00  01 ..  1100 . 0 . .   @qrrx_h
+SQDMULH_vi  0.00  10 . . 1100 . 0 . .   @qrrx_s
+
+SQRDMULH_vi 0.00  01 ..  1101 . 0 . .   @qrrx_h
+SQRDMULH_vi 0.00  10 . . 1101 . 0 . .   @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c673b95ec7..14226c56cf 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1350,6 +1350,14 @@ static bool do_gvec_fn3_no64(DisasContext *s, arg_qrrr_e 
*a, GVecGen3Fn *fn)
 return true;
 }
 
+static bool do_gvec_fn3_no8_no64(DisasContext *s, arg_qrrr_e *a, GVecGen3Fn 
*fn)
+{
+if (a->esz == MO_8) {
+return false;
+}
+return do_gvec_fn3_no64(s, a, fn);
+}
+
 static bool do_gvec_fn4(DisasContext *s, arg_q_e *a, GVecGen4Fn *fn)
 {
 if (!a->q && a->esz == MO_64) {
@@ -5167,6 +5175,25 @@ static const ENVScalar2 f_scalar_uqrshl = {
 };
 TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
 
+static bool do_env_scalar2_hs(DisasContext *s, arg_rrr_e *a,
+  const ENVScalar2 *f)
+{
+if (a->esz == MO_16 || a->esz == MO_32) {
+return do_env_scalar2(s, a, f);
+}
+return false;
+}
+
+static const ENVScalar2 f_scalar_sqdmulh = {
+{ NULL, gen_helper_neon_qdmulh_s16, gen_helper_neon_qdmulh_s32 }
+};
+TRANS(SQDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqdmulh)
+
+static const ENVScalar2 f_scalar_sqrdmulh = {
+{ NULL, gen_helper_neon_qrdmulh_s16, gen_helper_neon_qrdmulh_s32 }
+};
+TRANS(SQRDMULH_s, do_env_scalar2_hs, a, &f_scalar_sqrdmulh)
+
 static bool do_cmop_d(DisasContext *s,

[PATCH v3 19/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/gengvec.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index e64ca02e0c..2451d23823 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -944,9 +944,7 @@ void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 
 static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
 {
-tcg_gen_and_vec(vece, d, a, b);
-tcg_gen_dupi_vec(vece, a, 0);
-tcg_gen_cmp_vec(TCG_COND_NE, vece, d, d, a);
+tcg_gen_cmp_vec(TCG_COND_TSTNE, vece, d, a, b);
 }
 
 void gen_gvec_cmtst(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
-- 
2.34.1
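
A note for readers following the TCG_COND_TSTNE change in 19/33: the per-lane effect can be modelled in plain C. This is only a sketch under stated assumptions, not QEMU code. It assumes TCG_COND_TSTNE means "(a & b) != 0" and that a vector compare writes all-ones to a lane when the condition holds. The function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the removed sequence: AND, then compare the result with zero. */
static uint8_t cmtst_lane8_old(uint8_t a, uint8_t b)
{
    uint8_t d = a & b;
    uint8_t zero = 0;
    return d != zero ? 0xff : 0x00;
}

/* Model of the new single test-and-compare (assumed TSTNE semantics). */
static uint8_t cmtst_lane8_new(uint8_t a, uint8_t b)
{
    return (a & b) != 0 ? 0xff : 0x00;
}

/* Exhaustively check that both models agree for every 8-bit input pair. */
static void cmtst_selfcheck(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            assert(cmtst_lane8_old((uint8_t)a, (uint8_t)b) ==
                   cmtst_lane8_new((uint8_t)a, (uint8_t)b));
        }
    }
}
```

Calling cmtst_selfcheck() walks all 65536 input pairs. If the assumption about TSTNE holds, the three-op sequence and the single compare are interchangeable for each lane, which is what lets the patch drop the temporary zero vector.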

[PATCH v3 33/33] target/arm: Convert FCSEL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |   4 ++
 target/arm/tcg/translate-a64.c | 108 ++---
 2 files changed, 49 insertions(+), 63 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 6f6cd805b7..5dadbc74d7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -1000,6 +1000,10 @@ SQDMULH_vi  0.00  10 . . 1100 . 0 . 
.   @qrrx_s
 SQRDMULH_vi 0.00  01 ..  1101 . 0 . .   @qrrx_h
 SQRDMULH_vi 0.00  10 . . 1101 . 0 . .   @qrrx_s
 
+# Floating-point conditional select
+
+FCSEL   0001 1110 .. 1 rm:5 cond:4 11 rn:5 rd:5 esz=%esz_hsd
+
 # Floating-point data-processing (3 source)
 
 @_hsd     .. . rm:5  . ra:5  rn:5  rd:5 _e 
esz=%esz_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 78a2e6d692..f1dea5834c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5866,6 +5866,50 @@ static bool trans_ADDP_s(DisasContext *s, arg_rr_e *a)
 return true;
 }
 
+/*
+ * Floating-point conditional select
+ */
+
+static bool trans_FCSEL(DisasContext *s, arg_FCSEL *a)
+{
+TCGv_i64 t_true, t_false;
+DisasCompare64 c;
+
+switch (a->esz) {
+case MO_32:
+case MO_64:
+break;
+case MO_16:
+if (!dc_isar_feature(aa64_fp16, s)) {
+return false;
+}
+break;
+default:
+return false;
+}
+
+if (!fp_access_check(s)) {
+return true;
+}
+
+/* Zero extend sreg & hreg inputs to 64 bits now.  */
+t_true = tcg_temp_new_i64();
+t_false = tcg_temp_new_i64();
+read_vec_element(s, t_true, a->rn, 0, a->esz);
+read_vec_element(s, t_false, a->rm, 0, a->esz);
+
+a64_test_cc(&c, a->cond);
+tcg_gen_movcond_i64(c.cond, t_true, c.value, tcg_constant_i64(0),
+t_true, t_false);
+
+/*
+ * Note that sregs & hregs write back zeros to the high bits,
+ * and we've already done the zero-extension.
+ */
+write_fp_dreg(s, a->rd, t_true);
+return true;
+}
+
 /*
  * Floating-point data-processing (3 source)
  */
@@ -7332,68 +7376,6 @@ static void disas_fp_ccomp(DisasContext *s, uint32_t 
insn)
 }
 }
 
-/* Floating point conditional select
- *   31  30  29 28   24 23  22  21 20  16 15  12 11 10 95 40
- * +---+---+---+---+--+---+--+--+-+--+--+
- * | M | 0 | S | 1 1 1 1 0 | type | 1 |  Rm  | cond | 1 1 |  Rn  |  Rd  |
- * +---+---+---+---+--+---+--+--+-+--+--+
- */
-static void disas_fp_csel(DisasContext *s, uint32_t insn)
-{
-unsigned int mos, type, rm, cond, rn, rd;
-TCGv_i64 t_true, t_false;
-DisasCompare64 c;
-MemOp sz;
-
-mos = extract32(insn, 29, 3);
-type = extract32(insn, 22, 2);
-rm = extract32(insn, 16, 5);
-cond = extract32(insn, 12, 4);
-rn = extract32(insn, 5, 5);
-rd = extract32(insn, 0, 5);
-
-if (mos) {
-unallocated_encoding(s);
-return;
-}
-
-switch (type) {
-case 0:
-sz = MO_32;
-break;
-case 1:
-sz = MO_64;
-break;
-case 3:
-sz = MO_16;
-if (dc_isar_feature(aa64_fp16, s)) {
-break;
-}
-/* fallthru */
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-/* Zero extend sreg & hreg inputs to 64 bits now.  */
-t_true = tcg_temp_new_i64();
-t_false = tcg_temp_new_i64();
-read_vec_element(s, t_true, rn, 0, sz);
-read_vec_element(s, t_false, rm, 0, sz);
-
-a64_test_cc(&c, cond);
-tcg_gen_movcond_i64(c.cond, t_true, c.value, tcg_constant_i64(0),
-t_true, t_false);
-
-/* Note that sregs & hregs write back zeros to the high bits,
-   and we've already done the zero-extension.  */
-write_fp_dreg(s, rd, t_true);
-}
-
 /* Floating-point data-processing (1 source) - half precision */
 static void handle_fp_1src_half(DisasContext *s, int opcode, int rd, int rn)
 {
@@ -8207,7 +8189,7 @@ static void disas_data_proc_fp(DisasContext *s, uint32_t 
insn)
 break;
 case 3:
 /* Floating point conditional select */
-disas_fp_csel(s, insn);
+unallocated_encoding(s); /* in decodetree */
 break;
 case 0:
 switch (ctz32(extract32(insn, 12, 4))) {
-- 
2.34.1
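
For reference, the data path of trans_FCSEL in 33/33 can be modelled in a few lines of plain C. This is a rough sketch rather than the QEMU implementation: fcsel_model and its cond_holds parameter are hypothetical, and the element-size encoding (1 = 16-bit, 2 = 32-bit, 3 = 64-bit) is assumed to match MO_16, MO_32 and MO_64. The model masks after selecting, whereas the patch zero-extends both inputs before the movcond; the two orders give the same result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the FCSEL data path.
 * esz: 1 = 16-bit, 2 = 32-bit, 3 = 64-bit (assumed MO_16/MO_32/MO_64 values).
 * cond_holds stands in for the already-evaluated condition code.
 */
static uint64_t fcsel_model(bool cond_holds, uint64_t rn, uint64_t rm,
                            unsigned esz)
{
    assert(esz >= 1 && esz <= 3);

    /* Element width in bits is 8 << esz; keep only those low bits. */
    uint64_t mask = esz == 3 ? UINT64_MAX
                             : (UINT64_C(1) << (8u << esz)) - 1;

    /* Rn is selected when the condition holds, Rm otherwise. */
    return (cond_holds ? rn : rm) & mask;
}
```

The high bits of the result are always zero for the 16-bit and 32-bit cases, which corresponds to the comment in the patch that sregs and hregs write back zeros to the high bits.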

[PATCH v3 32/33] target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB to decodetree

2024-05-28 Thread Richard Henderson

These are the only instructions in the 3 source scalar class.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  10 ++
 target/arm/tcg/translate-a64.c | 231 -
 2 files changed, 93 insertions(+), 148 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f7f897f9fc..6f6cd805b7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -32,6 +32,7 @@
 _e   rd rn esz
 _e  rd rn rm esz
 _e  rd rn rm idx esz
+_e rd rn rm ra esz
 _e  q rd rn esz
 _e q rd rn rm esz
 _e q rd rn rm idx esz
@@ -998,3 +999,12 @@ SQDMULH_vi  0.00  10 . . 1100 . 0 . .  
 @qrrx_s
 
 SQRDMULH_vi 0.00  01 ..  1101 . 0 . .   @qrrx_h
 SQRDMULH_vi 0.00  10 . . 1101 . 0 . .   @qrrx_s
+
+# Floating-point data-processing (3 source)
+
+@_hsd     .. . rm:5  . ra:5  rn:5  rd:5 _e 
esz=%esz_hsd
+
+FMADD   0001  .. 0 . 0 . . .@_hsd
+FMSUB   0001  .. 0 . 1 . . .@_hsd
+FNMADD  0001  .. 1 . 0 . . .@_hsd
+FNMSUB  0001  .. 1 . 1 . . .@_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 14226c56cf..78a2e6d692 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5866,6 +5866,88 @@ static bool trans_ADDP_s(DisasContext *s, arg_rr_e *a)
 return true;
 }
 
+/*
+ * Floating-point data-processing (3 source)
+ */
+
+static bool do_fmadd(DisasContext *s, arg__e *a, bool neg_a, bool neg_n)
+{
+TCGv_ptr fpst;
+
+/*
+ * These are fused multiply-add.  Note that doing the negations here
+ * as separate steps is correct: an input NaN should come out with
+ * its sign bit flipped if it is a negated-input.
+ */
+switch (a->esz) {
+case MO_64:
+if (fp_access_check(s)) {
+TCGv_i64 tn = read_fp_dreg(s, a->rn);
+TCGv_i64 tm = read_fp_dreg(s, a->rm);
+TCGv_i64 ta = read_fp_dreg(s, a->ra);
+
+if (neg_a) {
+gen_vfp_negd(ta, ta);
+}
+if (neg_n) {
+gen_vfp_negd(tn, tn);
+}
+fpst = fpstatus_ptr(FPST_FPCR);
+gen_helper_vfp_muladdd(ta, tn, tm, ta, fpst);
+write_fp_dreg(s, a->rd, ta);
+}
+break;
+
+case MO_32:
+if (fp_access_check(s)) {
+TCGv_i32 tn = read_fp_sreg(s, a->rn);
+TCGv_i32 tm = read_fp_sreg(s, a->rm);
+TCGv_i32 ta = read_fp_sreg(s, a->ra);
+
+if (neg_a) {
+gen_vfp_negs(ta, ta);
+}
+if (neg_n) {
+gen_vfp_negs(tn, tn);
+}
+fpst = fpstatus_ptr(FPST_FPCR);
+gen_helper_vfp_muladds(ta, tn, tm, ta, fpst);
+write_fp_sreg(s, a->rd, ta);
+}
+break;
+
+case MO_16:
+if (!dc_isar_feature(aa64_fp16, s)) {
+return false;
+}
+if (fp_access_check(s)) {
+TCGv_i32 tn = read_fp_hreg(s, a->rn);
+TCGv_i32 tm = read_fp_hreg(s, a->rm);
+TCGv_i32 ta = read_fp_hreg(s, a->ra);
+
+if (neg_a) {
+gen_vfp_negh(ta, ta);
+}
+if (neg_n) {
+gen_vfp_negh(tn, tn);
+}
+fpst = fpstatus_ptr(FPST_FPCR_F16);
+gen_helper_advsimd_muladdh(ta, tn, tm, ta, fpst);
+write_fp_sreg(s, a->rd, ta);
+}
+break;
+
+default:
+return false;
+}
+return true;
+}
+
+TRANS(FMADD, do_fmadd, a, false, false)
+TRANS(FNMADD, do_fmadd, a, true, true)
+TRANS(FMSUB, do_fmadd, a, false, true)
+TRANS(FNMSUB, do_fmadd, a, true, false)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -7665,152 +7747,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t 
insn)
 }
 }
 
-/* Floating-point data-processing (3 source) - single precision */
-static void handle_fp_3src_single(DisasContext *s, bool o0, bool o1,
-  int rd, int rn, int rm, int ra)
-{
-TCGv_i32 tcg_op1, tcg_op2, tcg_op3;
-TCGv_i32 tcg_res = tcg_temp_new_i32();
-TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
-
-tcg_op1 = read_fp_sreg(s, rn);
-tcg_op2 = read_fp_sreg(s, rm);
-tcg_op3 = read_fp_sreg(s, ra);
-
-/* These are fused multiply-add, and must be done as one
- * floating point operation with no rounding between the
- * multiplication and addition steps.
- * NB that doing the negations here as separate steps is
- * correct : an input NaN should come out with its sign bit
- * flipped if it is a

[PATCH v3 26/33] target/arm: Convert SMAX, SMIN, UMAX, UMIN to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  4 
 target/arm/tcg/translate-a64.c | 22 ++
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 1c448b4f7c..bc98963bc5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -918,6 +918,10 @@ SHSUB_v 0.00 1110 ..1 . 00100 1 . . 
@qrrr_e
 UHSUB_v 0.10 1110 ..1 . 00100 1 . . @qrrr_e
 SRHADD_v0.00 1110 ..1 . 00010 1 . . @qrrr_e
 URHADD_v0.10 1110 ..1 . 00010 1 . . @qrrr_e
+SMAX_v  0.00 1110 ..1 . 01100 1 . . @qrrr_e
+UMAX_v  0.10 1110 ..1 . 01100 1 . . @qrrr_e
+SMIN_v  0.00 1110 ..1 . 01101 1 . . @qrrr_e
+UMIN_v  0.10 1110 ..1 . 01101 1 . . @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9ef5de6755..db6f59df17 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5460,6 +5460,10 @@ TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
 TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
 TRANS(SRHADD_v, do_gvec_fn3_no64, a, gen_gvec_srhadd)
 TRANS(URHADD_v, do_gvec_fn3_no64, a, gen_gvec_urhadd)
+TRANS(SMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smax)
+TRANS(UMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umax)
+TRANS(SMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smin)
+TRANS(UMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umin)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -10925,8 +10929,6 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 return;
 }
 /* fall through */
-case 0xc: /* SMAX, UMAX */
-case 0xd: /* SMIN, UMIN */
 case 0xe: /* SABD, UABD */
 case 0xf: /* SABA, UABA */
 case 0x12: /* MLA, MLS */
@@ -10959,6 +10961,8 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 case 0x09: /* SQSHL, UQSHL */
 case 0x0a: /* SRSHL, URSHL */
 case 0x0b: /* SQRSHL, UQRSHL */
+case 0x0c: /* SMAX, UMAX */
+case 0x0d: /* SMIN, UMIN */
 case 0x10: /* ADD, SUB */
 case 0x11: /* CMTST, CMEQ */
 unallocated_encoding(s);
@@ -10970,20 +10974,6 @@ static void disas_simd_3same_int(DisasContext *s, 
uint32_t insn)
 }
 
 switch (opcode) {
-case 0x0c: /* SMAX, UMAX */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smax, size);
-}
-return;
-case 0x0d: /* SMIN, UMIN */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umin, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_smin, size);
-}
-return;
 case 0xe: /* SABD, UABD */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uabd, size);
-- 
2.34.1

[PATCH v3 12/33] target/arm: Convert SQSHL and UQSHL (register) to gvec

2024-05-28 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h |  8 
 target/arm/tcg/translate.h  |  4 
 target/arm/tcg/neon-dp.decode   | 10 ++---
 target/arm/tcg/gengvec.c| 24 ++
 target/arm/tcg/neon_helper.c| 36 +
 target/arm/tcg/translate-a64.c  | 17 +++-
 target/arm/tcg/translate-neon.c |  6 ++
 7 files changed, 83 insertions(+), 22 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 25eb7bf5df..f345087ddb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -326,6 +326,14 @@ DEF_HELPER_3(neon_qrshl_u32, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrshl_s32, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrshl_u64, i64, env, i64, i64)
 DEF_HELPER_3(neon_qrshl_s64, i64, env, i64, i64)
+DEF_HELPER_FLAGS_5(neon_sqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_sqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(neon_uqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
 DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index ea63ffc47b..6c6d4d49e7 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -463,6 +463,10 @@ void gen_gvec_srshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index 8525c65c0d..6d4996b8d8 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -109,14 +109,8 @@ VSHL_U_3s 001 1 0 . ..   0100 . . . 0 
 @3same_rev
 @3same_64_rev ... . . . 11    . q:1 . .  \
  &3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
 
-{
-  VQSHL_S64_3s    001 0 0 . ..   0100 . . . 1  @3same_64_rev
-  VQSHL_S_3s  001 0 0 . ..   0100 . . . 1  @3same_rev
-}
-{
-  VQSHL_U64_3s    001 1 0 . ..   0100 . . . 1  @3same_64_rev
-  VQSHL_U_3s  001 1 0 . ..   0100 . . . 1  @3same_rev
-}
+VQSHL_S_3s    001 0 0 . ..   0100 . . . 1  @3same_rev
+VQSHL_U_3s    001 1 0 . ..   0100 . . . 1  @3same_rev
 VRSHL_S_3s    001 0 0 . ..   0101 . . . 0  @3same_rev
 VRSHL_U_3s    001 1 0 . ..   0101 . . . 0  @3same_rev
 {
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 216a9f81e3..63c3ec2e73 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1240,6 +1240,30 @@ void gen_gvec_urshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
 }
 
+void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[] = {
+gen_helper_neon_sqshl_b, gen_helper_neon_sqshl_h,
+gen_helper_neon_sqshl_s, gen_helper_neon_sqshl_d,
+};
+tcg_debug_assert(vece <= MO_64);
+tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+   opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[] = {
+gen_helper_neon_uqshl_b, gen_helper_neon_uqshl_h,
+gen_helper_neon_uqshl_s, gen_helper_neon_uqshl_d,
+};
+tcg_debug_assert(vece <= MO_64);
+tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, tcg_env,
+   opr_sz, max_sz, 0, fns[vece]);
+}
+
 void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp 
esz)
 {
 uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
diff --git

[PATCH v3 03/33] target/arm: Assert oprsz in range when using vfp.qc

2024-05-28 Thread Richard Henderson

Suggested-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/gengvec.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index bfe6885a01..3e2d3c21a1 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -29,6 +29,7 @@ static void gen_gvec_fn3_qc(uint32_t rd_ofs, uint32_t rn_ofs, 
uint32_t rm_ofs,
 {
 TCGv_ptr qc_ptr = tcg_temp_new_ptr();
 
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
 tcg_gen_addi_ptr(qc_ptr, tcg_env, offsetof(CPUARMState, vfp.qc));
 tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, qc_ptr,
opr_sz, max_sz, 0, fn);
@@ -1255,6 +1256,8 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
   .opt_opc = vecop_list,
   .vece = MO_64 },
 };
+
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
 tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
@@ -1297,6 +1300,8 @@ void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
   .write_aofs = true,
   .vece = MO_64 },
 };
+
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
 tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
@@ -1339,6 +1344,8 @@ void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
   .write_aofs = true,
   .vece = MO_64 },
 };
+
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
 tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
@@ -1381,6 +1388,8 @@ void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
   .write_aofs = true,
   .vece = MO_64 },
 };
+
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
 tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
-- 
2.34.1

[PATCH v3 06/33] target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB

2024-05-28 Thread Richard Henderson

This eliminates the last uses of these neon helpers.
Incorporate the MO_64 expanders as an option to the vector expander.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  17 
 target/arm/tcg/translate.h |  15 +++
 target/arm/tcg/gengvec.c   | 116 +++
 target/arm/tcg/neon_helper.c   | 162 -
 target/arm/tcg/translate-a64.c |  67 --
 5 files changed, 169 insertions(+), 208 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index c76158d6d3..a14c040451 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,23 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
 DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
 
 /* neon_helper.c */
-DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_u32, i32, env, i32, i32)
-DEF_HELPER_3(neon_qsub_s32, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_u64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qadd_s64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qsub_u64, i64, env, i64, i64)
-DEF_HELPER_3(neon_qsub_s64, i64, env, i64, i64)
-
 DEF_HELPER_2(neon_hadd_s8, i32, i32, i32)
 DEF_HELPER_2(neon_hadd_u8, i32, i32, i32)
 DEF_HELPER_2(neon_hadd_s16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 3abdbedfe5..87439dcc61 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -466,12 +466,27 @@ void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
+void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+   TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_uqadd_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_sqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+   TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_sqadd_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_sqadd_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_uqsub_bhs(TCGv_i64 res, TCGv_i64 qc,
+   TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_uqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_uqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
+void gen_sqsub_bhs(TCGv_i64 res, TCGv_i64 qc,
+   TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_sqsub_d(TCGv_i64 d, TCGv_i64 q, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_sqsub_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 3e2d3c21a1..740f3f864e 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1218,6 +1218,28 @@ void gen_gvec_sshl(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, [vece]);
 }
 
+void gen_uqadd_bhs(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b, MemOp 
esz)
+{
+uint64_t max = MAKE_64BIT_MASK(0, 8 << esz);
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+tcg_gen_add_i64(tmp, a, b);
+tcg_gen_umin_i64(res, tmp, tcg_constant_i64(max));
+tcg_gen_xor_i64(tmp, tmp, res);
+tcg_gen_or_i64(qc, qc, tmp);
+}
+
+void gen_uqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_add_i64(t, a, b);
+tcg_gen_movcond_i64(TCG_COND_LTU, res, t, a,
+tcg_constant_i64(UINT64_MAX), t);
+tcg_gen_xor_i64(t, t, res);
+tcg_gen_or_i64(qc, qc, t);
+}
+
 static void gen_uqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
   TCGv_vec a, TCGv_vec b)
 {
@@ -1251,6 +1273,7 @@ void gen_gvec_uqadd_qc(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
   .opt_opc = vecop_list,
   .vece = MO_32 },
 { .fniv = gen_uqadd_vec,
+  .fni8 = gen_uqadd_d,
   .fno = gen_helper_gvec_uqadd_d,
   .write_aofs = true,
   .opt_opc = vecop_list,
@@ -1262,6
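
The two scalar expanders added to gengvec.c in 06/33, gen_uqadd_bhs and gen_uqadd_d, can be read as the following plain C. This is a sketch for illustration only: the *_model names are hypothetical, qc is modelled as a pointer to a sticky flag word, and the inputs to the 8/16/32-bit variant are assumed to be already zero-extended to 64 bits so that the 64-bit add cannot wrap.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 8/16/32-bit unsigned saturating add (esz: 0, 1, 2).
 * Mirrors the add / umin / xor / or sequence in the patch.
 */
static uint64_t uqadd_bhs_model(uint64_t *qc, uint64_t a, uint64_t b,
                                unsigned esz)
{
    assert(esz <= 2);

    uint64_t max = (UINT64_C(1) << (8u << esz)) - 1;  /* low 8 << esz bits set */
    uint64_t tmp = a + b;
    uint64_t res = tmp < max ? tmp : max;             /* unsigned minimum */

    *qc |= tmp ^ res;   /* non-zero only when the result was clamped */
    return res;
}

/*
 * 64-bit unsigned saturating add.
 * Overflow is detected by the wrapped sum being smaller than an input.
 */
static uint64_t uqadd_d_model(uint64_t *qc, uint64_t a, uint64_t b)
{
    uint64_t t = a + b;
    uint64_t res = t < a ? UINT64_MAX : t;

    *qc |= t ^ res;     /* non-zero only when saturation occurred */
    return res;
}
```

In both functions the XOR of the raw sum with the clamped result is zero unless saturation happened, so OR-ing it into qc leaves the flag untouched on the common path and sets it otherwise.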

[PATCH v3 09/33] target/arm: Convert SSHL, USHL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  7 ++
 target/arm/tcg/translate-a64.c | 40 +-
 2 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7c350ba833..ea897d6732 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -42,6 +42,7 @@
 @rr_sd   ... . .. rn:5 rd:5 _e esz=%esz_sd
 
 @rrr_h   ... rm:5 .. rn:5 rd:5  _e esz=1
+@rrr_d   ... rm:5 .. rn:5 rd:5  _e esz=3
 @rrr_sd  ... rm:5 .. rn:5 rd:5  _e esz=%esz_sd
 @rrr_hsd ... rm:5 .. rn:5 rd:5  _e esz=%esz_hsd
 @rrr_e   esz:2 . rm:5 .. rn:5 rd:5  _e
@@ -755,6 +756,9 @@ UQSUB_s 0111 1110 ..1 . 00101 1 . . 
@rrr_e
 SUQADD_s0101 1110 ..1 0 00111 0 . . @r2r_e
 USQADD_s0111 1110 ..1 0 00111 0 . . @r2r_e
 
+SSHL_s  0101 1110 111 . 01000 1 . . @rrr_d
+USHL_s  0111 1110 111 . 01000 1 . . @rrr_d
+
 ### Advanced SIMD scalar pairwise
 
 FADDP_s 0101 1110 0011  1101 10 . . @rr_h
@@ -876,6 +880,9 @@ UQSUB_v 0.10 1110 ..1 . 00101 1 . . 
@qrrr_e
 SUQADD_v0.00 1110 ..1 0 00111 0 . . @qr2r_e
 USQADD_v0.10 1110 ..1 0 00111 0 . . @qr2r_e
 
+SSHL_v  0.00 1110 ..1 . 01000 1 . . @qrrr_e
+USHL_v  0.10 1110 ..1 . 01000 1 . . @qrrr_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c0637bda0f..7c7a22985b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5099,6 +5099,24 @@ TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, 
gen_uqsub_d)
 TRANS(SUQADD_s, do_satacc_s, a, MO_SIGN, 0, gen_suqadd_bhs, gen_suqadd_d)
 TRANS(USQADD_s, do_satacc_s, a, 0, MO_SIGN, gen_usqadd_bhs, gen_usqadd_d)
 
+static bool do_int3_scalar_d(DisasContext *s, arg_rrr_e *a,
+ void (*fn)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+if (fp_access_check(s)) {
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+
+read_vec_element(s, t0, a->rn, 0, MO_64);
+read_vec_element(s, t1, a->rm, 0, MO_64);
+fn(t0, t0, t1);
+write_fp_dreg(s, a->rd, t0);
+}
+return true;
+}
+
+TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
+TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5344,6 +5362,10 @@ TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
 TRANS(SUQADD_v, do_gvec_fn3, a, gen_gvec_suqadd_qc)
 TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
 
+TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
+TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
+
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -9355,13 +9377,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
 break;
-case 0x8: /* SSHL, USHL */
-if (u) {
-gen_ushl_i64(tcg_rd, tcg_rn, tcg_rm);
-} else {
-gen_sshl_i64(tcg_rd, tcg_rn, tcg_rm);
-}
-break;
 case 0x9: /* SQSHL, UQSHL */
 if (u) {
 gen_helper_neon_qshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9393,6 +9408,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 default:
 case 0x1: /* SQADD / UQADD */
 case 0x5: /* SQSUB / UQSUB */
+case 0x8: /* SSHL, USHL */
 g_assert_not_reached();
 }
 }
@@ -9417,7 +9433,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x9: /* SQSHL, UQSHL */
 case 0xb: /* SQRSHL, UQRSHL */
 break;
-case 0x8: /* SSHL, USHL */
 case 0xa: /* SRSHL, URSHL */
 case 0x6: /* CMGT, CMHI */
 case 0x7: /* CMGE, CMHS */
@@ -9437,6 +9452,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 default:
 case 0x1: /* SQADD, UQADD */
 case 0x5: /* SQSUB, UQSUB */
+case 0x8: /* SSHL, USHL */
 unallocated_encoding(s);
 return;
 }
@@ -10912,6 +10928,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 
 case 0x01: /* SQADD, UQADD */
 case 0x05: /* SQSUB, UQSUB */
+case 0x08: /* SSHL, USHL */
 unallocated_encoding(s);
 return;
 }
@@ -10921,13 +10938,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x08: /* SSHL, USHL */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_ushl, size);
-

[PATCH v3 13/33] target/arm: Convert SQSHL, UQSHL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  4 ++
 target/arm/tcg/translate-a64.c | 74 ++
 2 files changed, 53 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 9e02776036..85caf37948 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -760,6 +760,8 @@ SSHL_s          0101 1110 111 ..... 01000 1 ..... ..... @rrr_d
 USHL_s          0111 1110 111 ..... 01000 1 ..... ..... @rrr_d
 SRSHL_s         0101 1110 111 ..... 01010 1 ..... ..... @rrr_d
 URSHL_s         0111 1110 111 ..... 01010 1 ..... ..... @rrr_d
+SQSHL_s         0101 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
+UQSHL_s         0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
 
 ### Advanced SIMD scalar pairwise
 
@@ -886,6 +888,8 @@ SSHL_v          0.00 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
 USHL_v          0.10 1110 ..1 ..... 01000 1 ..... ..... @qrrr_e
 SRSHL_v         0.00 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
 URSHL_v         0.10 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
+SQSHL_v         0.00 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
+UQSHL_v         0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index c88702dad6..97bd69eb3f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5119,6 +5119,49 @@ TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
 TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
 TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
 
+typedef struct ENVScalar2 {
+    NeonGenTwoOpEnvFn *gen_bhs[3];
+    NeonGenTwo64OpEnvFn *gen_d;
+} ENVScalar2;
+
+static bool do_env_scalar2(DisasContext *s, arg_rrr_e *a, const ENVScalar2 *f)
+{
+    if (!fp_access_check(s)) {
+        return true;
+    }
+    if (a->esz == MO_64) {
+        TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+        TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+        f->gen_d(t0, tcg_env, t0, t1);
+        write_fp_dreg(s, a->rd, t0);
+    } else {
+        TCGv_i32 t0 = tcg_temp_new_i32();
+        TCGv_i32 t1 = tcg_temp_new_i32();
+
+        read_vec_element_i32(s, t0, a->rn, 0, a->esz);
+        read_vec_element_i32(s, t1, a->rm, 0, a->esz);
+        f->gen_bhs[a->esz](t0, tcg_env, t0, t1);
+        write_fp_sreg(s, a->rd, t0);
+    }
+    return true;
+}
+
+static const ENVScalar2 f_scalar_sqshl = {
+    { gen_helper_neon_qshl_s8,
+      gen_helper_neon_qshl_s16,
+      gen_helper_neon_qshl_s32 },
+    gen_helper_neon_qshl_s64,
+};
+TRANS(SQSHL_s, do_env_scalar2, a, &f_scalar_sqshl)
+
+static const ENVScalar2 f_scalar_uqshl = {
+    { gen_helper_neon_qshl_u8,
+      gen_helper_neon_qshl_u16,
+      gen_helper_neon_qshl_u32 },
+    gen_helper_neon_qshl_u64,
+};
+TRANS(UQSHL_s, do_env_scalar2, a, &f_scalar_uqshl)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5368,6 +5411,8 @@ TRANS(SSHL_v, do_gvec_fn3, a, gen_gvec_sshl)
 TRANS(USHL_v, do_gvec_fn3, a, gen_gvec_ushl)
 TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
 TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
+TRANS(SQSHL_v, do_gvec_fn3, a, gen_neon_sqshl)
+TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
 
 
 /*
@@ -9381,13 +9426,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
 break;
-case 0x9: /* SQSHL, UQSHL */
-if (u) {
-gen_helper_neon_qshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
-} else {
-gen_helper_neon_qshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
-}
-break;
 case 0xb: /* SQRSHL, UQRSHL */
 if (u) {
 gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
@@ -9406,6 +9444,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 case 0x1: /* SQADD / UQADD */
 case 0x5: /* SQSUB / UQSUB */
 case 0x8: /* SSHL, USHL */
+case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
 g_assert_not_reached();
 }
@@ -9428,7 +9467,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 TCGv_i64 tcg_rd;
 
 switch (opcode) {
-case 0x9: /* SQSHL, UQSHL */
 case 0xb: /* SQRSHL, UQRSHL */
 break;
 case 0x6: /* CMGT, CMHI */
@@ -9450,6 +9488,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x1: /* SQADD, UQADD */
 case 0x5: /* SQSUB, UQSUB */
 case 0x8: /* SSHL, USHL */
+case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
 unallocated_encoding(s);
 return;
@@ -9477,16 +9516,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
 
 switch (opcode)

[PATCH v3 27/33] target/arm: Convert SABA, SABD, UABA, UABD to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  4 
 target/arm/tcg/translate-a64.c | 22 ++
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index bc98963bc5..07b604ec30 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -922,6 +922,10 @@ SMAX_v          0.00 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
 UMAX_v          0.10 1110 ..1 ..... 01100 1 ..... ..... @qrrr_e
 SMIN_v          0.00 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
 UMIN_v          0.10 1110 ..1 ..... 01101 1 ..... ..... @qrrr_e
+SABD_v          0.00 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
+UABD_v          0.10 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
+SABA_v          0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
+UABA_v          0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index db6f59df17..61afbc434f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5464,6 +5464,10 @@ TRANS(SMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smax)
 TRANS(UMAX_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umax)
 TRANS(SMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_smin)
 TRANS(UMIN_v, do_gvec_fn3_no64, a, tcg_gen_gvec_umin)
+TRANS(SABA_v, do_gvec_fn3_no64, a, gen_gvec_saba)
+TRANS(UABA_v, do_gvec_fn3_no64, a, gen_gvec_uaba)
+TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
+TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -10929,8 +10933,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 return;
 }
 /* fall through */
-case 0xe: /* SABD, UABD */
-case 0xf: /* SABA, UABA */
 case 0x12: /* MLA, MLS */
 if (size == 3) {
 unallocated_encoding(s);
@@ -10963,6 +10965,8 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 case 0x0b: /* SQRSHL, UQRSHL */
 case 0x0c: /* SMAX, UMAX */
 case 0x0d: /* SMIN, UMIN */
+case 0x0e: /* SABD, UABD */
+case 0x0f: /* SABA, UABA */
 case 0x10: /* ADD, SUB */
 case 0x11: /* CMTST, CMEQ */
 unallocated_encoding(s);
@@ -10974,20 +10978,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0xe: /* SABD, UABD */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uabd, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_sabd, size);
-}
-return;
-case 0xf: /* SABA, UABA */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uaba, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_saba, size);
-}
-return;
 case 0x13: /* MUL, PMUL */
 if (!u) { /* MUL */
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
-- 
2.34.1

[PATCH v3 21/33] target/arm: Convert SHADD, UHADD to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  2 ++
 target/arm/tcg/translate-a64.c | 11 +++
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 3061e26242..e33d91fd0a 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -912,6 +912,8 @@ CMGE_v          0.00 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
 CMHS_v          0.10 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
 CMTST_v         0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
 CMEQ_v          0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
+SHADD_v         0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
+UHADD_v         0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5f3423513d..00c04425c1 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5454,6 +5454,8 @@ TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
 
 TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
 TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
+TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
+TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -10920,7 +10922,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 return;
 }
 /* fall through */
-case 0x0: /* SHADD, UHADD */
 case 0x2: /* SRHADD, URHADD */
 case 0x4: /* SHSUB, UHSUB */
 case 0xc: /* SMAX, UMAX */
@@ -10946,6 +10947,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 break;
 
+case 0x0: /* SHADD, UHADD */
 case 0x01: /* SQADD, UQADD */
 case 0x05: /* SQSUB, UQSUB */
 case 0x06: /* CMGT, CMHI */
@@ -10965,13 +10967,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x00: /* SHADD, UHADD */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhadd, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shadd, size);
-}
-return;
 case 0x0c: /* SMAX, UMAX */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
-- 
2.34.1

[PATCH v3 16/33] target/arm: Convert ADD, SUB (vector) to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  6 ++
 target/arm/tcg/translate-a64.c | 22 +++---
 2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 96ce35ad40..44383b4fc7 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -765,6 +765,9 @@ UQSHL_s         0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
 SQRSHL_s        0101 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
 UQRSHL_s        0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
 
+ADD_s           0101 1110 111 ..... 10000 1 ..... ..... @rrr_d
+SUB_s           0111 1110 111 ..... 10000 1 ..... ..... @rrr_d
+
 ### Advanced SIMD scalar pairwise
 
 FADDP_s         0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -895,6 +898,9 @@ UQSHL_v         0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
 SQRSHL_v        0.00 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
 UQRSHL_v        0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
 
+ADD_v           0.00 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+SUB_v           0.10 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si         0101 1111 00 .. .... 1001 . 0 ..... .....   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2424c6d314..77a64923e7 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5118,6 +5118,8 @@ TRANS(SSHL_s, do_int3_scalar_d, a, gen_sshl_i64)
 TRANS(USHL_s, do_int3_scalar_d, a, gen_ushl_i64)
 TRANS(SRSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_s64)
 TRANS(URSHL_s, do_int3_scalar_d, a, gen_helper_neon_rshl_u64)
+TRANS(ADD_s, do_int3_scalar_d, a, tcg_gen_add_i64)
+TRANS(SUB_s, do_int3_scalar_d, a, tcg_gen_sub_i64)
 
 typedef struct ENVScalar2 {
 NeonGenTwoOpEnvFn *gen_bhs[3];
@@ -5432,6 +5434,8 @@ TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
 TRANS(SQRSHL_v, do_gvec_fn3, a, gen_neon_sqrshl)
 TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
 
+TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
+TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
 
 /*
  * Advanced SIMD scalar/vector x indexed element
@@ -9444,13 +9448,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
 break;
-case 0x10: /* ADD, SUB */
-if (u) {
-tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
-} else {
-tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
-}
-break;
 default:
 case 0x1: /* SQADD / UQADD */
 case 0x5: /* SQSUB / UQSUB */
@@ -9458,6 +9455,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
 case 0xb: /* SQRSHL, UQRSHL */
+case 0x10: /* ADD, SUB */
 g_assert_not_reached();
 }
 }
@@ -9482,7 +9480,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x6: /* CMGT, CMHI */
 case 0x7: /* CMGE, CMHS */
 case 0x11: /* CMTST, CMEQ */
-case 0x10: /* ADD, SUB (vector) */
 if (size != 3) {
 unallocated_encoding(s);
 return;
@@ -9501,6 +9498,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
 case 0xb: /* SQRSHL, UQRSHL */
+case 0x10: /* ADD, SUB (vector) */
 unallocated_encoding(s);
 return;
 }
@@ -10962,6 +10960,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 case 0x09: /* SQSHL, UQSHL */
 case 0x0a: /* SRSHL, URSHL */
 case 0x0b: /* SQRSHL, UQRSHL */
+case 0x10: /* ADD, SUB */
 unallocated_encoding(s);
 return;
 }
@@ -10999,13 +10998,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_saba, size);
 }
 return;
-case 0x10: /* ADD, SUB */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_sub, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_add, size);
-}
-return;
 case 0x13: /* MUL, PMUL */
 if (!u) { /* MUL */
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
-- 
2.34.1

[PATCH v3 28/33] target/arm: Convert MUL, PMUL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  5 
 target/arm/tcg/translate-a64.c | 51 +-
 2 files changed, 25 insertions(+), 31 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 07b604ec30..3ea0643370 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -926,6 +926,8 @@ SABD_v          0.00 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
 UABD_v          0.10 1110 ..1 ..... 01110 1 ..... ..... @qrrr_e
 SABA_v          0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
 UABA_v          0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
+MUL_v           0.00 1110 ..1 ..... 10011 1 ..... ..... @qrrr_e
+PMUL_v          0.10 1110 001 ..... 10011 1 ..... ..... @qrrr_b
 
 ### Advanced SIMD scalar x indexed element
 
@@ -967,3 +969,6 @@ FMLAL_vi        0.00 1111 10 .. .... 0000 . 0 ..... .....   @qrrx_h
 FMLSL_vi        0.00 1111 10 .. .... 0100 . 0 ..... .....   @qrrx_h
 FMLAL2_vi       0.10 1111 10 .. .... 1000 . 0 ..... .....   @qrrx_h
 FMLSL2_vi       0.10 1111 10 .. .... 1100 . 0 ..... .....   @qrrx_h
+
+MUL_vi          0.00 1111 01 .. .... 1000 . 0 ..... .....   @qrrx_h
+MUL_vi          0.00 1111 10 . ..... 1000 . 0 ..... .....   @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 61afbc434f..1909d1426c 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5468,6 +5468,8 @@ TRANS(SABA_v, do_gvec_fn3_no64, a, gen_gvec_saba)
 TRANS(UABA_v, do_gvec_fn3_no64, a, gen_gvec_uaba)
 TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
 TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
+TRANS(MUL_v, do_gvec_fn3_no64, a, tcg_gen_gvec_mul)
+TRANS(PMUL_v, do_gvec_op3_ool, a, 0, gen_helper_gvec_pmul_b)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -5694,6 +5696,22 @@ TRANS_FEAT(FMLSL_vi, aa64_fhm, do_fmlal_idx, a, true, false)
 TRANS_FEAT(FMLAL2_vi, aa64_fhm, do_fmlal_idx, a, false, true)
 TRANS_FEAT(FMLSL2_vi, aa64_fhm, do_fmlal_idx, a, true, true)
 
+static bool do_int3_vector_idx(DisasContext *s, arg_qrrx_e *a,
+                               gen_helper_gvec_3 * const fns[2])
+{
+    assert(a->esz == MO_16 || a->esz == MO_32);
+    if (fp_access_check(s)) {
+        gen_gvec_op3_ool(s, a->q, a->rd, a->rn, a->rm, a->idx, fns[a->esz - 1]);
+    }
+    return true;
+}
+
+static gen_helper_gvec_3 * const f_vector_idx_mul[2] = {
+    gen_helper_gvec_mul_idx_h,
+    gen_helper_gvec_mul_idx_s,
+};
+TRANS(MUL_vi, do_int3_vector_idx, a, f_vector_idx_mul)
+
 /*
  * Advanced SIMD scalar pairwise
  */
@@ -10927,12 +10945,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 int rd = extract32(insn, 0, 5);
 
 switch (opcode) {
-case 0x13: /* MUL, PMUL */
-if (u && size != 0) {
-unallocated_encoding(s);
-return;
-}
-/* fall through */
 case 0x12: /* MLA, MLS */
 if (size == 3) {
 unallocated_encoding(s);
@@ -10969,6 +10981,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 case 0x0f: /* SABA, UABA */
 case 0x10: /* ADD, SUB */
 case 0x11: /* CMTST, CMEQ */
+case 0x13: /* MUL, PMUL */
 unallocated_encoding(s);
 return;
 }
@@ -10978,13 +10991,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x13: /* MUL, PMUL */
-if (!u) { /* MUL */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_mul, size);
-} else {  /* PMUL */
-gen_gvec_op3_ool(s, is_q, rd, rn, rm, 0, gen_helper_gvec_pmul_b);
-}
-return;
 case 0x12: /* MLA, MLS */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mls, size);
@@ -12198,7 +12204,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 TCGv_ptr fpst;
 
 switch (16 * u + opcode) {
-case 0x08: /* MUL */
 case 0x10: /* MLA */
 case 0x14: /* MLS */
 if (is_scalar) {
@@ -12285,6 +12290,7 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 case 0x01: /* FMLA */
 case 0x04: /* FMLSL */
 case 0x05: /* FMLS */
+case 0x08: /* MUL */
 case 0x09: /* FMUL */
 case 0x18: /* FMLAL2 */
 case 0x19: /* FMULX */
@@ -12407,22 +12413,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 }
 return;
 
-case 0x08: /* MUL */
-if (!is_long && !is_scalar) {
-static gen_helper_gvec_3 * const fns[3] = {
-gen_helper_gvec_mul_idx_h,
-gen_helper_gvec_mul_idx_s,
-gen_helper_gvec_mul_idx_d,
-};
-tcg_gen_gvec_3_ool(vec_full_reg_offset(s, rd),
-   vec_full_reg_offset(s, rn),
-   vec_full_reg_offset(s, rm),
-   is_q ? 16 : 8,

[PATCH v3 23/33] target/arm: Convert SHSUB, UHSUB to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  2 ++
 target/arm/tcg/translate-a64.c | 11 +++
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index e33d91fd0a..b1bbcb144e 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -914,6 +914,8 @@ CMTST_v         0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
 CMEQ_v          0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
 SHADD_v         0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
 UHADD_v         0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
+SHSUB_v         0.00 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
+UHSUB_v         0.10 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 63f7a59f94..6571b999f4 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5456,6 +5456,8 @@ TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
 TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
 TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
 TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
+TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
+TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -10923,7 +10925,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 /* fall through */
 case 0x2: /* SRHADD, URHADD */
-case 0x4: /* SHSUB, UHSUB */
 case 0xc: /* SMAX, UMAX */
 case 0xd: /* SMIN, UMIN */
 case 0xe: /* SABD, UABD */
@@ -10949,6 +10950,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 
 case 0x0: /* SHADD, UHADD */
 case 0x01: /* SQADD, UQADD */
+case 0x04: /* SHSUB, UHSUB */
 case 0x05: /* SQSUB, UQSUB */
 case 0x06: /* CMGT, CMHI */
 case 0x07: /* CMGE, CMHS */
@@ -10967,13 +10969,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x04: /* SHSUB, UHSUB */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_uhsub, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_shsub, size);
-}
-return;
 case 0x0c: /* SMAX, UMAX */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
-- 
2.34.1

[PATCH v3 18/33] target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32, i64}

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/gengvec.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 6dc96269d5..e64ca02e0c 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -934,14 +934,12 @@ void gen_gvec_mls(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 /* CMTST : test is "if (X & Y != 0)". */
 static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
-    tcg_gen_and_i32(d, a, b);
-    tcg_gen_negsetcond_i32(TCG_COND_NE, d, d, tcg_constant_i32(0));
+    tcg_gen_negsetcond_i32(TCG_COND_TSTNE, d, a, b);
 }
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
-    tcg_gen_and_i64(d, a, b);
-    tcg_gen_negsetcond_i64(TCG_COND_NE, d, d, tcg_constant_i64(0));
+    tcg_gen_negsetcond_i64(TCG_COND_TSTNE, d, a, b);
 }
 }
 
 static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
-- 
2.34.1
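
The equivalence this patch relies on can be checked with a plain C model of one 64-bit element. This is an illustrative sketch, not QEMU code: the names cmtst64_ref and cmtst64_two_step are hypothetical, and it assumes that a negsetcond with TCG_COND_TSTNE yields all ones when (a & b) is non-zero and zero otherwise. Note that the intended test is (X & Y) != 0; the quoted comment omits the parentheses that C precedence would require.

```c
#include <stdint.h>

/* CMTST on one element: all ones if the operands share any set bit,
 * otherwise zero. Models the single TSTNE negsetcond. */
static uint64_t cmtst64_ref(uint64_t a, uint64_t b)
{
    return (a & b) != 0 ? UINT64_MAX : 0;
}

/* The removed two-step form: AND into d, then negate the result of
 * comparing d against zero. Unsigned negation of 1 gives all ones. */
static uint64_t cmtst64_two_step(uint64_t a, uint64_t b)
{
    uint64_t d = a & b;
    return -(uint64_t)(d != 0);
}
```

Both functions return the same value for every input, which is why the separate AND can be folded into the condition.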

[PATCH v3 29/33] target/arm: Convert MLA, MLS to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  8 
 target/arm/tcg/translate-a64.c | 77 ++
 2 files changed, 31 insertions(+), 54 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 3ea0643370..2dea68a0a9 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -928,6 +928,8 @@ SABA_v          0.00 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
 UABA_v          0.10 1110 ..1 ..... 01111 1 ..... ..... @qrrr_e
 MUL_v           0.00 1110 ..1 ..... 10011 1 ..... ..... @qrrr_e
 PMUL_v          0.10 1110 001 ..... 10011 1 ..... ..... @qrrr_b
+MLA_v           0.00 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
+MLS_v           0.10 1110 ..1 ..... 10010 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
@@ -972,3 +974,9 @@ FMLSL2_vi       0.10 1111 10 .. .... 1100 . 0 ..... .....   @qrrx_h
 
 MUL_vi          0.00 1111 01 .. .... 1000 . 0 ..... .....   @qrrx_h
 MUL_vi          0.00 1111 10 . ..... 1000 . 0 ..... .....   @qrrx_s
+
+MLA_vi          0.10 1111 01 .. .... 0000 . 0 ..... .....   @qrrx_h
+MLA_vi          0.10 1111 10 . ..... 0000 . 0 ..... .....   @qrrx_s
+
+MLS_vi          0.10 1111 01 .. .... 0100 . 0 ..... .....   @qrrx_h
+MLS_vi          0.10 1111 10 . ..... 0100 . 0 ..... .....   @qrrx_s
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1909d1426c..c4601cde2f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5470,6 +5470,8 @@ TRANS(SABD_v, do_gvec_fn3_no64, a, gen_gvec_sabd)
 TRANS(UABD_v, do_gvec_fn3_no64, a, gen_gvec_uabd)
 TRANS(MUL_v, do_gvec_fn3_no64, a, tcg_gen_gvec_mul)
 TRANS(PMUL_v, do_gvec_op3_ool, a, 0, gen_helper_gvec_pmul_b)
+TRANS(MLA_v, do_gvec_fn3_no64, a, gen_gvec_mla)
+TRANS(MLS_v, do_gvec_fn3_no64, a, gen_gvec_mls)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -5712,6 +5714,24 @@ static gen_helper_gvec_3 * const f_vector_idx_mul[2] = {
 };
 TRANS(MUL_vi, do_int3_vector_idx, a, f_vector_idx_mul)
 
+static bool do_mla_vector_idx(DisasContext *s, arg_qrrx_e *a, bool sub)
+{
+    static gen_helper_gvec_4 * const fns[2][2] = {
+        { gen_helper_gvec_mla_idx_h, gen_helper_gvec_mls_idx_h },
+        { gen_helper_gvec_mla_idx_s, gen_helper_gvec_mls_idx_s },
+    };
+
+    assert(a->esz == MO_16 || a->esz == MO_32);
+    if (fp_access_check(s)) {
+        gen_gvec_op4_ool(s, a->q, a->rd, a->rn, a->rm, a->rd,
+                         a->idx, fns[a->esz - 1][sub]);
+    }
+    return true;
+}
+
+TRANS(MLA_vi, do_mla_vector_idx, a, false)
+TRANS(MLS_vi, do_mla_vector_idx, a, true)
+
 /*
  * Advanced SIMD scalar pairwise
  */
@@ -10945,12 +10965,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 int rd = extract32(insn, 0, 5);
 
 switch (opcode) {
-case 0x12: /* MLA, MLS */
-if (size == 3) {
-unallocated_encoding(s);
-return;
-}
-break;
 case 0x16: /* SQDMULH, SQRDMULH */
 if (size == 0 || size == 3) {
 unallocated_encoding(s);
@@ -10981,6 +10995,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 case 0x0f: /* SABA, UABA */
 case 0x10: /* ADD, SUB */
 case 0x11: /* CMTST, CMEQ */
+case 0x12: /* MLA, MLS */
 case 0x13: /* MUL, PMUL */
 unallocated_encoding(s);
 return;
@@ -10991,13 +11006,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x12: /* MLA, MLS */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mls, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_mla, size);
-}
-return;
 case 0x16: /* SQDMULH, SQRDMULH */
 {
 static gen_helper_gvec_3_ptr * const fns[2][2] = {
@@ -12204,13 +12212,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 TCGv_ptr fpst;
 
 switch (16 * u + opcode) {
-case 0x10: /* MLA */
-case 0x14: /* MLS */
-if (is_scalar) {
-unallocated_encoding(s);
-return;
-}
-break;
 case 0x02: /* SMLAL, SMLAL2 */
 case 0x12: /* UMLAL, UMLAL2 */
 case 0x06: /* SMLSL, SMLSL2 */
@@ -12292,6 +12293,8 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
 case 0x05: /* FMLS */
 case 0x08: /* MUL */
 case 0x09: /* FMUL */
+case 0x10: /* MLA */
+case 0x14: /* MLS */
 case 0x18: /* FMLAL2 */
 case 0x19: /* FMULX */
 case 0x1c: /* FMLSL2 */
@@ -12412,40 +12415,6 @@ static void disas_simd_indexed(DisasContext *s, uint32_t insn)
: gen_helper_gvec_fcmlah_idx);
 }
 return;
-
-case 0x10: /* MLA */
-if (!is_long && !is_scalar) {
-static gen_helper_gvec_4 * const fns[3] = {
-gen_helper_gvec_mla_idx_h,
-

[PATCH v3 15/33] target/arm: Convert SQRSHL, UQRSHL to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  4 +++
 target/arm/tcg/translate-a64.c | 48 --
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 85caf37948..96ce35ad40 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -762,6 +762,8 @@ SRSHL_s         0101 1110 111 ..... 01010 1 ..... ..... @rrr_d
 URSHL_s         0111 1110 111 ..... 01010 1 ..... ..... @rrr_d
 SQSHL_s         0101 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
 UQSHL_s         0111 1110 ..1 ..... 01001 1 ..... ..... @rrr_e
+SQRSHL_s        0101 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
+UQRSHL_s        0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
 
 ### Advanced SIMD scalar pairwise
 
@@ -890,6 +892,8 @@ SRSHL_v         0.00 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
 URSHL_v         0.10 1110 ..1 ..... 01010 1 ..... ..... @qrrr_e
 SQSHL_v         0.00 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
 UQSHL_v         0.10 1110 ..1 ..... 01001 1 ..... ..... @qrrr_e
+SQRSHL_v        0.00 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
+UQRSHL_v        0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index b9d577f620..2424c6d314 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5162,6 +5162,22 @@ static const ENVScalar2 f_scalar_uqshl = {
 };
 TRANS(UQSHL_s, do_env_scalar2, a, &f_scalar_uqshl)
 
+static const ENVScalar2 f_scalar_sqrshl = {
+    { gen_helper_neon_qrshl_s8,
+      gen_helper_neon_qrshl_s16,
+      gen_helper_neon_qrshl_s32 },
+    gen_helper_neon_qrshl_s64,
+};
+TRANS(SQRSHL_s, do_env_scalar2, a, &f_scalar_sqrshl)
+
+static const ENVScalar2 f_scalar_uqrshl = {
+    { gen_helper_neon_qrshl_u8,
+      gen_helper_neon_qrshl_u16,
+      gen_helper_neon_qrshl_u32 },
+    gen_helper_neon_qrshl_u64,
+};
+TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5413,6 +5429,8 @@ TRANS(SRSHL_v, do_gvec_fn3, a, gen_gvec_srshl)
 TRANS(URSHL_v, do_gvec_fn3, a, gen_gvec_urshl)
 TRANS(SQSHL_v, do_gvec_fn3, a, gen_neon_sqshl)
 TRANS(UQSHL_v, do_gvec_fn3, a, gen_neon_uqshl)
+TRANS(SQRSHL_v, do_gvec_fn3, a, gen_neon_sqrshl)
+TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
 
 
 /*
@@ -9426,13 +9444,6 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
 break;
-case 0xb: /* SQRSHL, UQRSHL */
-if (u) {
-gen_helper_neon_qrshl_u64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
-} else {
-gen_helper_neon_qrshl_s64(tcg_rd, tcg_env, tcg_rn, tcg_rm);
-}
-break;
 case 0x10: /* ADD, SUB */
 if (u) {
 tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
@@ -9446,6 +9457,7 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 case 0x8: /* SSHL, USHL */
 case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
+case 0xb: /* SQRSHL, UQRSHL */
 g_assert_not_reached();
 }
 }
@@ -9467,8 +9479,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 TCGv_i64 tcg_rd;
 
 switch (opcode) {
-case 0xb: /* SQRSHL, UQRSHL */
-break;
 case 0x6: /* CMGT, CMHI */
 case 0x7: /* CMGE, CMHS */
 case 0x11: /* CMTST, CMEQ */
@@ -9490,6 +9500,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x8: /* SSHL, USHL */
 case 0x9: /* SQSHL, UQSHL */
 case 0xa: /* SRSHL, URSHL */
+case 0xb: /* SQRSHL, UQRSHL */
 unallocated_encoding(s);
 return;
 }
@@ -9516,16 +9527,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 void (*genfn)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp) = NULL;
 
 switch (opcode) {
-case 0xb: /* SQRSHL, UQRSHL */
-{
-static NeonGenTwoOpEnvFn * const fns[3][2] = {
-{ gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
-{ gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
-{ gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
-};
-genenvfn = fns[size][u];
-break;
-}
 case 0x16: /* SQDMULH, SQRDMULH */
 {
 static NeonGenTwoOpEnvFn * const fns[2][2] = {
@@ -9540,6 +9541,7 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 case 0x1: /* SQADD, UQADD */
 case 0x5: /* SQSUB, UQSUB */
 case 0x9: /* SQSHL, UQSHL */
+case 0xb: /* SQRSHL, UQRSHL */
 g_assert_not_reached();
 }
 
@@ -10959,6 +10961,7 @@ static void

[PATCH v3 17/33] target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  12 +++
 target/arm/tcg/translate-a64.c | 132 -
 2 files changed, 60 insertions(+), 84 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 44383b4fc7..3061e26242 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -767,6 +767,12 @@ UQRSHL_s        0111 1110 ..1 ..... 01011 1 ..... ..... @rrr_e
 
 ADD_s           0101 1110 111 ..... 10000 1 ..... ..... @rrr_d
 SUB_s           0111 1110 111 ..... 10000 1 ..... ..... @rrr_d
+CMGT_s          0101 1110 111 ..... 00110 1 ..... ..... @rrr_d
+CMHI_s          0111 1110 111 ..... 00110 1 ..... ..... @rrr_d
+CMGE_s          0101 1110 111 ..... 00111 1 ..... ..... @rrr_d
+CMHS_s          0111 1110 111 ..... 00111 1 ..... ..... @rrr_d
+CMTST_s         0101 1110 111 ..... 10001 1 ..... ..... @rrr_d
+CMEQ_s          0111 1110 111 ..... 10001 1 ..... ..... @rrr_d
 
 ### Advanced SIMD scalar pairwise
 
@@ -900,6 +906,12 @@ UQRSHL_v        0.10 1110 ..1 ..... 01011 1 ..... ..... @qrrr_e
 
 ADD_v           0.00 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
 SUB_v           0.10 1110 ..1 ..... 10000 1 ..... ..... @qrrr_e
+CMGT_v          0.00 1110 ..1 ..... 00110 1 ..... ..... @qrrr_e
+CMHI_v          0.10 1110 ..1 ..... 00110 1 ..... ..... @qrrr_e
+CMGE_v          0.00 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
+CMHS_v          0.10 1110 ..1 ..... 00111 1 ..... ..... @qrrr_e
+CMTST_v         0.00 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
+CMEQ_v          0.10 1110 ..1 ..... 10001 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 77a64923e7..3c6cfc2952 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5180,6 +5180,24 @@ static const ENVScalar2 f_scalar_uqrshl = {
 };
 TRANS(UQRSHL_s, do_env_scalar2, a, &f_scalar_uqrshl)
 
+static bool do_cmop_d(DisasContext *s, arg_rrr_e *a, TCGCond cond)
+{
+if (fp_access_check(s)) {
+TCGv_i64 t0 = read_fp_dreg(s, a->rn);
+TCGv_i64 t1 = read_fp_dreg(s, a->rm);
+tcg_gen_negsetcond_i64(cond, t0, t0, t1);
+write_fp_dreg(s, a->rd, t0);
+}
+return true;
+}
+
+TRANS(CMGT_s, do_cmop_d, a, TCG_COND_GT)
+TRANS(CMHI_s, do_cmop_d, a, TCG_COND_GTU)
+TRANS(CMGE_s, do_cmop_d, a, TCG_COND_GE)
+TRANS(CMHS_s, do_cmop_d, a, TCG_COND_GEU)
+TRANS(CMEQ_s, do_cmop_d, a, TCG_COND_EQ)
+TRANS(CMTST_s, do_cmop_d, a, TCG_COND_TSTNE)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5437,6 +5455,28 @@ TRANS(UQRSHL_v, do_gvec_fn3, a, gen_neon_uqrshl)
 TRANS(ADD_v, do_gvec_fn3, a, tcg_gen_gvec_add)
 TRANS(SUB_v, do_gvec_fn3, a, tcg_gen_gvec_sub)
 
+static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
+{
+if (a->esz == MO_64 && !a->q) {
+return false;
+}
+if (fp_access_check(s)) {
+tcg_gen_gvec_cmp(cond, a->esz,
+ vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm),
+ a->q ? 16 : 8, vec_full_reg_size(s));
+}
+return true;
+}
+
+TRANS(CMGT_v, do_cmop_v, a, TCG_COND_GT)
+TRANS(CMHI_v, do_cmop_v, a, TCG_COND_GTU)
+TRANS(CMGE_v, do_cmop_v, a, TCG_COND_GE)
+TRANS(CMHS_v, do_cmop_v, a, TCG_COND_GEU)
+TRANS(CMEQ_v, do_cmop_v, a, TCG_COND_EQ)
+TRANS(CMTST_v, do_gvec_fn3, a, gen_gvec_cmtst)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -9421,45 +9461,6 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
 }
 }
 
-static void handle_3same_64(DisasContext *s, int opcode, bool u,
-TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm)
-{
-/* Handle 64x64->64 opcodes which are shared between the scalar
- * and vector 3-same groups. We cover every opcode where size == 3
- * is valid in either the three-reg-same (integer, not pairwise)
- * or scalar-three-reg-same groups.
- */
-TCGCond cond;
-
-switch (opcode) {
-case 0x6: /* CMGT, CMHI */
-cond = u ? TCG_COND_GTU : TCG_COND_GT;
-do_cmop:
-/* 64 bit integer comparison, result = test ? -1 : 0. */
-tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
-break;
-case 0x7: /* CMGE, CMHS */
-cond = u ? TCG_COND_GEU : TCG_COND_GE;
-goto do_cmop;
-case 0x11: /* CMTST, CMEQ */
-if (u) {
-cond = TCG_COND_EQ;
-goto do_cmop;
-}
-gen_cmtst_i64(tcg_rd, tcg_rn, tcg_rm);
-break;
-default:
-case 0x1: /* SQADD / UQADD */
-case 0x5: /* SQSUB / UQSUB */
-case 0x8: /* SSHL, USHL */
-case 0x9: /* SQSHL, UQSHL */
-case 0xa: /* SRSHL, URSHL */
-case 0xb: /* SQRSHL, UQRSHL */
-

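Editorial note on the patch above: the scalar compares all follow the
"result = test ? -1 : 0" convention that the removed handle_3same_64()
comment describes. The sketch below restates that convention in plain C
for a single 64-bit element. It is an illustration only, not QEMU code:
the function names are hypothetical, and it assumes the usual
two's-complement conversion from uint64_t to int64_t.

```c
#include <stdint.h>

/* Each helper returns all-ones when the test holds, zero otherwise. */

/* CMGT: signed greater-than. */
static uint64_t cmgt_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)((int64_t)n > (int64_t)m);
}

/* CMHI: unsigned higher. */
static uint64_t cmhi_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)(n > m);
}

/* CMGE: signed greater-than-or-equal. */
static uint64_t cmge_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)((int64_t)n >= (int64_t)m);
}

/* CMHS: unsigned higher-or-same. */
static uint64_t cmhs_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)(n >= m);
}

/* CMEQ: bitwise equality. */
static uint64_t cmeq_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)(n == m);
}

/* CMTST: true when the operands share at least one set bit. */
static uint64_t cmtst_d(uint64_t n, uint64_t m)
{
    return -(uint64_t)((n & m) != 0);
}
```

In the patch itself the same six results come from one helper
parameterised by a TCGCond, with CMTST mapped to TCG_COND_TSTNE.
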
[PATCH v3 08/33] target/arm: Convert SUQADD, USQADD to decodetree

2024-05-28 Thread Richard Henderson

These are faux 2-operand instructions, reading from rd.
Sort them next to the other three-operand same insns for clarity.
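
Editorial note: "faux 2-operand" here means the destination register is
also the accumulator input, which is why the new formats set rn=%rd. A
minimal sketch of the arithmetic for one 8-bit lane follows, in plain C
rather than TCG. The function names and the int* stand-in for the QC
flag are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/*
 * SUQADD, one 8-bit lane: signed accumulator plus unsigned addend,
 * saturated to the signed range. *qc is set on saturation.
 */
static int8_t suqadd8(int8_t acc, uint8_t add, int *qc)
{
    int sum = (int)acc + (int)add;      /* range -128 .. 382 */

    if (sum > INT8_MAX) {
        *qc = 1;
        return INT8_MAX;
    }
    return (int8_t)sum;                 /* cannot underflow: add >= 0 */
}

/*
 * USQADD, one 8-bit lane: unsigned accumulator plus signed addend,
 * saturated to the unsigned range. *qc is set on saturation.
 */
static uint8_t usqadd8(uint8_t acc, int8_t add, int *qc)
{
    int sum = (int)acc + (int)add;      /* range -128 .. 382 */

    if (sum < 0) {
        *qc = 1;
        return 0;
    }
    if (sum > UINT8_MAX) {
        *qc = 1;
        return UINT8_MAX;
    }
    return (uint8_t)sum;
}
```

This mirrors the TRANS lines in the diff: SUQADD reads the accumulator
with MO_SIGN and the other operand unsigned, and USQADD does the
reverse.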

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  8 +
 target/arm/tcg/translate-a64.c | 64 --
 2 files changed, 14 insertions(+), 58 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 19010af03b..7c350ba833 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -45,6 +45,7 @@
 @rrr_sd         ........ ... rm:5 ...... rn:5 rd:5      &rrr_e esz=%esz_sd
 @rrr_hsd        ........ ... rm:5 ...... rn:5 rd:5      &rrr_e esz=%esz_hsd
 @rrr_e          ........ esz:2 . rm:5 ...... rn:5 rd:5  &rrr_e
+@r2r_e          ........ esz:2 . ..... ...... rm:5 rd:5 &rrr_e rn=%rd
 
 @rrx_h          ........ .. .. rm:4 .... . . rn:5 rd:5  &rrx_e esz=1 idx=%hlm
 @rrx_s          ........ .. . rm:5  .... . . rn:5 rd:5  &rrx_e esz=2 idx=%hl
@@ -60,6 +61,7 @@
 @qrrr_h         . q:1 ...... ... rm:5 ...... rn:5 rd:5  &qrrr_e esz=1
 @qrrr_sd        . q:1 ...... ... rm:5 ...... rn:5 rd:5  &qrrr_e esz=%esz_sd
 @qrrr_e         . q:1 ...... esz:2 . rm:5 ...... rn:5 rd:5  &qrrr_e
+@qr2r_e         . q:1 ...... esz:2 . ..... ...... rm:5 rd:5 &qrrr_e rn=%rd
 
 @qrrx_h         . q:1 .. .... .. .. rm:4 .... . . rn:5 rd:5 \
                 &qrrx_e esz=1 idx=%hlm
@@ -750,6 +752,9 @@ UQADD_s         0111 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
 SQSUB_s         0101 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
 UQSUB_s         0111 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
 
+SUQADD_s        0101 1110 ..1 00000 00111 0 ..... ..... @r2r_e
+USQADD_s        0111 1110 ..1 00000 00111 0 ..... ..... @r2r_e
+
 ### Advanced SIMD scalar pairwise
 
 FADDP_s         0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -868,6 +873,9 @@ UQADD_v         0.10 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
 SQSUB_v         0.00 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
 UQSUB_v         0.10 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
 
+SUQADD_v        0.00 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
+USQADD_v        0.10 1110 ..1 00000 00111 0 ..... ..... @qr2r_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si         0101 1111 00 .. .... 1001 . 0 ..... .....   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3956c41543..c0637bda0f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5096,6 +5096,8 @@ TRANS(SQADD_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqadd_bhs, gen_sqadd_d)
 TRANS(SQSUB_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqsub_bhs, gen_sqsub_d)
 TRANS(UQADD_s, do_satacc_s, a, 0, 0, gen_uqadd_bhs, gen_uqadd_d)
 TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, gen_uqsub_d)
+TRANS(SUQADD_s, do_satacc_s, a, MO_SIGN, 0, gen_suqadd_bhs, gen_suqadd_d)
+TRANS(USQADD_s, do_satacc_s, a, 0, MO_SIGN, gen_usqadd_bhs, gen_usqadd_d)
 
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
@@ -5339,6 +5341,8 @@ TRANS(SQADD_v, do_gvec_fn3, a, gen_gvec_sqadd_qc)
 TRANS(UQADD_v, do_gvec_fn3, a, gen_gvec_uqadd_qc)
 TRANS(SQSUB_v, do_gvec_fn3, a, gen_gvec_sqsub_qc)
 TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
+TRANS(SUQADD_v, do_gvec_fn3, a, gen_gvec_suqadd_qc)
+TRANS(USQADD_v, do_gvec_fn3, a, gen_gvec_usqadd_qc)
 
 /*
  * Advanced SIMD scalar/vector x indexed element
@@ -10009,48 +10013,6 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
 clear_vec_high(s, is_q, rd);
 }
 
-/* Remaining saturating accumulating ops */
-static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
-bool is_q, unsigned size, int rn, int rd)
-{
-TCGv_i64 res, qc, a, b;
-
-if (!is_scalar) {
-gen_gvec_fn3(s, is_q, rd, rd, rn,
- is_u ? gen_gvec_usqadd_qc : gen_gvec_suqadd_qc, size);
-return;
-}
-
-res = tcg_temp_new_i64();
-qc = tcg_temp_new_i64();
-a = tcg_temp_new_i64();
-b = tcg_temp_new_i64();
-
-/* Read and extend scalar inputs to 64-bits. */
-read_vec_element(s, a, rd, 0, size | (is_u ? 0 : MO_SIGN));
-read_vec_element(s, b, rn, 0, size | (is_u ? MO_SIGN : 0));
-tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-
-if (size == MO_64) {
-if (is_u) {
-gen_usqadd_d(res, qc, a, b);
-} else {
-gen_suqadd_d(res, qc, a, b);
-}
-} else {
-if (is_u) {
-gen_usqadd_bhs(res, qc, a, b, size);
-} else {
-gen_suqadd_bhs(res, qc, a, b, size);
-/* Truncate signed 64-bit result for writeback. */
-tcg_gen_ext_i64(res, res, size);
-}
-}
-
-write_fp_dreg(s, rd, res);
-tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-}
-
 /* AdvSIMD scalar two reg misc
  *  31 30  29 28       24 23  22 21       17 16    12 11 10 9    5 4    0
  *

[PATCH v3 07/33] target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  | 11 
 target/arm/tcg/translate-a64.c | 96 +++---
 2 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index f48adef5bb..19010af03b 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -44,6 +44,7 @@
 @rrr_h          ........ ... rm:5 ...... rn:5 rd:5      &rrr_e esz=1
 @rrr_sd         ........ ... rm:5 ...... rn:5 rd:5      &rrr_e esz=%esz_sd
 @rrr_hsd        ........ ... rm:5 ...... rn:5 rd:5      &rrr_e esz=%esz_hsd
+@rrr_e          ........ esz:2 . rm:5 ...... rn:5 rd:5  &rrr_e
 
 @rrx_h          ........ .. .. rm:4 .... . . rn:5 rd:5  &rrx_e esz=1 idx=%hlm
 @rrx_s          ........ .. . rm:5  .... . . rn:5 rd:5  &rrx_e esz=2 idx=%hl
@@ -744,6 +745,11 @@ FRECPS_s        0101 1110 0.1 ..... 11111 1 ..... ..... @rrr_sd
 FRSQRTS_s       0101 1110 110 ..... 00111 1 ..... ..... @rrr_h
 FRSQRTS_s       0101 1110 1.1 ..... 11111 1 ..... ..... @rrr_sd
 
+SQADD_s         0101 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
+UQADD_s         0111 1110 ..1 ..... 00001 1 ..... ..... @rrr_e
+SQSUB_s         0101 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
+UQSUB_s         0111 1110 ..1 ..... 00101 1 ..... ..... @rrr_e
+
 ### Advanced SIMD scalar pairwise
 
 FADDP_s         0101 1110 0011 0000 1101 10 ..... ..... @rr_h
@@ -857,6 +863,11 @@ BSL_v           0.10 1110 011 ..... 00011 1 ..... ..... @qrrr_b
 BIT_v           0.10 1110 101 ..... 00011 1 ..... ..... @qrrr_b
 BIF_v           0.10 1110 111 ..... 00011 1 ..... ..... @qrrr_b
 
+SQADD_v         0.00 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
+UQADD_v         0.10 1110 ..1 ..... 00001 1 ..... ..... @qrrr_e
+SQSUB_v         0.00 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
+UQSUB_v         0.10 1110 ..1 ..... 00101 1 ..... ..... @qrrr_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si         0101 1111 00 .. .... 1001 . 0 ..... .....   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index ca7ba6b1e8..3956c41543 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5060,6 +5060,43 @@ static const FPScalar f_scalar_frsqrts = {
 };
 TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
 
+static bool do_satacc_s(DisasContext *s, arg_rrr_e *a,
+MemOp sgn_n, MemOp sgn_m,
+void (*gen_bhs)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64, MemOp),
+void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+TCGv_i64 t0, t1, t2, qc;
+MemOp esz = a->esz;
+
+if (!fp_access_check(s)) {
+return true;
+}
+
+t0 = tcg_temp_new_i64();
+t1 = tcg_temp_new_i64();
+t2 = tcg_temp_new_i64();
+qc = tcg_temp_new_i64();
+read_vec_element(s, t1, a->rn, 0, esz | sgn_n);
+read_vec_element(s, t2, a->rm, 0, esz | sgn_m);
+tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+
+if (esz == MO_64) {
+gen_d(t0, qc, t1, t2);
+} else {
+gen_bhs(t0, qc, t1, t2, esz);
+tcg_gen_ext_i64(t0, t0, esz);
+}
+
+write_fp_dreg(s, a->rd, t0);
+tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
+return true;
+}
+
+TRANS(SQADD_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqadd_bhs, gen_sqadd_d)
+TRANS(SQSUB_s, do_satacc_s, a, MO_SIGN, MO_SIGN, gen_sqsub_bhs, gen_sqsub_d)
+TRANS(UQADD_s, do_satacc_s, a, 0, 0, gen_uqadd_bhs, gen_uqadd_d)
+TRANS(UQSUB_s, do_satacc_s, a, 0, 0, gen_uqsub_bhs, gen_uqsub_d)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5298,6 +5335,11 @@ TRANS(BSL_v, do_bitsel, a->q, a->rd, a->rd, a->rn, a->rm)
 TRANS(BIT_v, do_bitsel, a->q, a->rd, a->rm, a->rn, a->rd)
 TRANS(BIF_v, do_bitsel, a->q, a->rd, a->rm, a->rd, a->rn)
 
+TRANS(SQADD_v, do_gvec_fn3, a, gen_gvec_sqadd_qc)
+TRANS(UQADD_v, do_gvec_fn3, a, gen_gvec_uqadd_qc)
+TRANS(SQSUB_v, do_gvec_fn3, a, gen_gvec_sqsub_qc)
+TRANS(UQSUB_v, do_gvec_fn3, a, gen_gvec_uqsub_qc)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -9291,29 +9333,8 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
  * or scalar-three-reg-same groups.
  */
 TCGCond cond;
-TCGv_i64 qc;
 
 switch (opcode) {
-case 0x1: /* SQADD */
-qc = tcg_temp_new_i64();
-tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-if (u) {
-gen_uqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
-} else {
-gen_sqadd_d(tcg_rd, qc, tcg_rn, tcg_rm);
-}
-tcg_gen_st_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-break;
-case 0x5: /* SQSUB */
-qc = tcg_temp_new_i64();
-tcg_gen_ld_i64(qc, tcg_env, offsetof(CPUARMState, vfp.qc));
-if (u) {
-gen_uqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
-} else {
-gen_sqsub_d(tcg_rd, qc, tcg_rn, tcg_rm);
-

[PATCH v3 25/33] target/arm: Convert SRHADD, URHADD to decodetree

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/a64.decode  |  2 ++
 target/arm/tcg/translate-a64.c | 11 +++
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index b1bbcb144e..1c448b4f7c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -916,6 +916,8 @@ SHADD_v         0.00 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
 UHADD_v         0.10 1110 ..1 ..... 00000 1 ..... ..... @qrrr_e
 SHSUB_v         0.00 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
 UHSUB_v         0.10 1110 ..1 ..... 00100 1 ..... ..... @qrrr_e
+SRHADD_v        0.00 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
+URHADD_v        0.10 1110 ..1 ..... 00010 1 ..... ..... @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 40aa7a9d57..9ef5de6755 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5458,6 +5458,8 @@ TRANS(SHADD_v, do_gvec_fn3_no64, a, gen_gvec_shadd)
 TRANS(UHADD_v, do_gvec_fn3_no64, a, gen_gvec_uhadd)
 TRANS(SHSUB_v, do_gvec_fn3_no64, a, gen_gvec_shsub)
 TRANS(UHSUB_v, do_gvec_fn3_no64, a, gen_gvec_uhsub)
+TRANS(SRHADD_v, do_gvec_fn3_no64, a, gen_gvec_srhadd)
+TRANS(URHADD_v, do_gvec_fn3_no64, a, gen_gvec_urhadd)
 
 static bool do_cmop_v(DisasContext *s, arg_qrrr_e *a, TCGCond cond)
 {
@@ -10923,7 +10925,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 return;
 }
 /* fall through */
-case 0x2: /* SRHADD, URHADD */
 case 0xc: /* SMAX, UMAX */
 case 0xd: /* SMIN, UMIN */
 case 0xe: /* SABD, UABD */
@@ -10949,6 +10950,7 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 
 case 0x0: /* SHADD, UHADD */
 case 0x01: /* SQADD, UQADD */
+case 0x02: /* SRHADD, URHADD */
 case 0x04: /* SHSUB, UHSUB */
 case 0x05: /* SQSUB, UQSUB */
 case 0x06: /* CMGT, CMHI */
@@ -10968,13 +10970,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 }
 
 switch (opcode) {
-case 0x02: /* SRHADD, URHADD */
-if (u) {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_urhadd, size);
-} else {
-gen_gvec_fn3(s, is_q, rd, rn, rm, gen_gvec_srhadd, size);
-}
-return;
 case 0x0c: /* SMAX, UMAX */
 if (u) {
 gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_umax, size);
-- 
2.34.1

[PATCH v3 20/33] target/arm: Convert SHADD, UHADD to gvec

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h |   6 --
 target/arm/tcg/translate.h  |   5 ++
 target/arm/tcg/gengvec.c| 144 
 target/arm/tcg/neon_helper.c|  27 --
 target/arm/tcg/translate-a64.c  |  17 ++--
 target/arm/tcg/translate-neon.c |   4 +-
 6 files changed, 158 insertions(+), 45 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 9a89c9cea7..b26bfcb079 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,12 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
 DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
 
 /* neon_helper.c */
-DEF_HELPER_2(neon_hadd_s8, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_u8, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_s16, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_u16, i32, i32, i32)
-DEF_HELPER_2(neon_hadd_s32, s32, s32, s32)
-DEF_HELPER_2(neon_hadd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_rhadd_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rhadd_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 048cb45ebe..dd99d76bf2 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -472,6 +472,11 @@ void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 void gen_neon_uqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
  uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
+void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_uhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
 void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 2451d23823..c0627a787b 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1861,3 +1861,147 @@ void gen_gvec_uminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 tcg_debug_assert(vece <= MO_32);
 tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
 }
+
+static void gen_shadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_and_i64(t, a, b);
+tcg_gen_vec_sar8i_i64(a, a, 1);
+tcg_gen_vec_sar8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_add8_i64(d, a, b);
+tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_shadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_and_i64(t, a, b);
+tcg_gen_vec_sar16i_i64(a, a, 1);
+tcg_gen_vec_sar16i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+tcg_gen_vec_add16_i64(d, a, b);
+tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_shadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_and_i32(t, a, b);
+tcg_gen_sari_i32(a, a, 1);
+tcg_gen_sari_i32(b, b, 1);
+tcg_gen_andi_i32(t, t, 1);
+tcg_gen_add_i32(d, a, b);
+tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_shadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+tcg_gen_and_vec(vece, t, a, b);
+tcg_gen_sari_vec(vece, a, a, 1);
+tcg_gen_sari_vec(vece, b, b, 1);
+tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+tcg_gen_add_vec(vece, d, a, b);
+tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_shadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_sari_vec, INDEX_op_add_vec, 0
+};
+static const GVecGen3 g[] = {
+{ .fni8 = gen_shadd8_i64,
+  .fniv = gen_shadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_8 },
+{ .fni8 = gen_shadd16_i64,
+  .fniv = gen_shadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_16 },
+{ .fni4 = gen_shadd_i32,
+  .fniv = gen_shadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_32 },
+};
+tcg_debug_assert(vece <= MO_32);
+tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_uhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_and_i64(t, a, b);
+tcg_gen_vec_shr8i_i64(a, a, 1);
+tcg_gen_vec_shr8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_add8_i64(d, a, b);
+tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_uhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+

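Editorial note on the patch above: each gen_shadd* expander uses the
same decomposition. It halves both inputs with an arithmetic shift, then
adds back the carry that is lost only when both low bits are set. The
sketch below checks that identity exhaustively for one signed 8-bit
lane. It is plain C, not TCG. The function names are hypothetical, and
it assumes that >> on a negative int is an arithmetic shift, which holds
on the usual compilers but is implementation-defined in ISO C.

```c
#include <stdint.h>

/* Reference: widen, add, then floor-divide by two. */
static int8_t shadd8_ref(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;          /* cannot overflow an int */
    return (int8_t)(sum >> 1);          /* arithmetic shift assumed */
}

/* Decomposed form, mirroring and / sari / sari / andi / add / add. */
static int8_t shadd8_split(int8_t a, int8_t b)
{
    int t = a & b & 1;                  /* carry from the two low bits */
    return (int8_t)((a >> 1) + (b >> 1) + t);
}

/* Count the input pairs for which the two forms disagree. */
static int shadd8_mismatches(void)
{
    int bad = 0;

    for (int a = INT8_MIN; a <= INT8_MAX; a++) {
        for (int b = INT8_MIN; b <= INT8_MAX; b++) {
            if (shadd8_ref((int8_t)a, (int8_t)b) !=
                shadd8_split((int8_t)a, (int8_t)b)) {
                bad++;
            }
        }
    }
    return bad;
}
```

The decomposed form never needs a bit wider than the element, which is
what lets the same sequence run on packed lanes and host vectors.
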
[PATCH v3 14/33] target/arm: Convert SQRSHL and UQRSHL (register) to gvec

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h |  8 ++
 target/arm/tcg/translate.h  |  4 +++
 target/arm/tcg/neon-dp.decode   | 17 ++--
 target/arm/tcg/gengvec.c| 24 
 target/arm/tcg/neon_helper.c| 24 
 target/arm/tcg/translate-a64.c  | 17 +---
 target/arm/tcg/translate-neon.c | 49 ++---
 7 files changed, 71 insertions(+), 72 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index f345087ddb..9a89c9cea7 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -334,6 +334,14 @@ DEF_HELPER_FLAGS_5(neon_uqshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(neon_uqshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(neon_uqshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(neon_uqshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_sqrshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(neon_uqrshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(gvec_srshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_srshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 6c6d4d49e7..048cb45ebe 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -467,6 +467,10 @@ void gen_neon_sqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_neon_uqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/neon-dp.decode b/target/arm/tcg/neon-dp.decode
index 6d4996b8d8..788578c8fa 100644
--- a/target/arm/tcg/neon-dp.decode
+++ b/target/arm/tcg/neon-dp.decode
@@ -102,25 +102,12 @@ VCGE_U_3s        1111 001 1 0 . .. .... .... 0011 . . . 1 .... @3same
 
 VSHL_S_3s        1111 001 0 0 . .. .... .... 0100 . . . 0 .... @3same_rev
 VSHL_U_3s        1111 001 1 0 . .. .... .... 0100 . . . 0 .... @3same_rev
-
-# Insns operating on 64-bit elements (size!=0b11 handled elsewhere)
-# The _rev suffix indicates that Vn and Vm are reversed (as explained
-# by the comment for the @3same_rev format).
-@3same_64_rev    .... ... . . . 11 .... .... .... . q:1 . . .... \
-                 &3same vm=%vn_dp vn=%vm_dp vd=%vd_dp size=3
-
 VQSHL_S_3s       1111 001 0 0 . .. .... .... 0100 . . . 1 .... @3same_rev
 VQSHL_U_3s       1111 001 1 0 . .. .... .... 0100 . . . 1 .... @3same_rev
 VRSHL_S_3s       1111 001 0 0 . .. .... .... 0101 . . . 0 .... @3same_rev
 VRSHL_U_3s       1111 001 1 0 . .. .... .... 0101 . . . 0 .... @3same_rev
-{
-  VQRSHL_S64_3s  1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
-  VQRSHL_S_3s    1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
-}
-{
-  VQRSHL_U64_3s  1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_64_rev
-  VQRSHL_U_3s    1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_rev
-}
+VQRSHL_S_3s      1111 001 0 0 . .. .... .... 0101 . . . 1 .... @3same_rev
+VQRSHL_U_3s      1111 001 1 0 . .. .... .... 0101 . . . 1 .... @3same_rev
 
 VMAX_S_3s        1111 001 0 0 . .. .... .... 0110 . . . 0 .... @3same
 VMAX_U_3s        1111 001 1 0 . .. .... .... 0110 . . . 0 .... @3same
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 63c3ec2e73..6dc96269d5 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1264,6 +1264,30 @@ void gen_neon_uqshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
opr_sz, max_sz, 0, fns[vece]);
 }
 
+void gen_neon_sqrshl(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[] = {
+gen_helper_neon_sqrshl_b, gen_helper_neon_sqrshl_h,
+gen_helper_neon_sqrshl_s, gen_helper_neon_sqrshl_d,
+

[PATCH v3 24/33] target/arm: Convert SRHADD, URHADD to gvec

2024-05-28 Thread Richard Henderson

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h |   7 --
 target/arm/tcg/translate.h  |   4 +
 target/arm/tcg/gengvec.c| 144 
 target/arm/tcg/neon_helper.c|  27 --
 target/arm/tcg/translate-a64.c  |  48 ++-
 target/arm/tcg/translate-neon.c |  26 +-
 6 files changed, 158 insertions(+), 98 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index b95f24ed0a..85f9302563 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -268,13 +268,6 @@ DEF_HELPER_FLAGS_2(fjcvtzs, TCG_CALL_NO_RWG, i64, f64, ptr)
 DEF_HELPER_FLAGS_3(check_hcr_el2_trap, TCG_CALL_NO_WG, void, env, i32, i32)
 
 /* neon_helper.c */
-DEF_HELPER_2(neon_rhadd_s8, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_u8, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_s16, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_u16, i32, i32, i32)
-DEF_HELPER_2(neon_rhadd_s32, s32, s32, s32)
-DEF_HELPER_2(neon_rhadd_u32, i32, i32, i32)
-
 DEF_HELPER_2(neon_pmin_u8, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_s8, i32, i32, i32)
 DEF_HELPER_2(neon_pmin_u16, i32, i32, i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 315e0afd04..3b1e68b779 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -480,6 +480,10 @@ void gen_gvec_shsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_srhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_urhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index c46365c3a6..119826bf28 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2149,3 +2149,147 @@ void gen_gvec_uhsub(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 assert(vece <= MO_32);
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
 }
+
+static void gen_srhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_or_i64(t, a, b);
+tcg_gen_vec_sar8i_i64(a, a, 1);
+tcg_gen_vec_sar8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_add8_i64(d, a, b);
+tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_srhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_or_i64(t, a, b);
+tcg_gen_vec_sar16i_i64(a, a, 1);
+tcg_gen_vec_sar16i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_16, 1));
+tcg_gen_vec_add16_i64(d, a, b);
+tcg_gen_vec_add16_i64(d, d, t);
+}
+
+static void gen_srhadd_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
+{
+TCGv_i32 t = tcg_temp_new_i32();
+
+tcg_gen_or_i32(t, a, b);
+tcg_gen_sari_i32(a, a, 1);
+tcg_gen_sari_i32(b, b, 1);
+tcg_gen_andi_i32(t, t, 1);
+tcg_gen_add_i32(d, a, b);
+tcg_gen_add_i32(d, d, t);
+}
+
+static void gen_srhadd_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec t = tcg_temp_new_vec_matching(d);
+
+tcg_gen_or_vec(vece, t, a, b);
+tcg_gen_sari_vec(vece, a, a, 1);
+tcg_gen_sari_vec(vece, b, b, 1);
+tcg_gen_and_vec(vece, t, t, tcg_constant_vec_matching(d, vece, 1));
+tcg_gen_add_vec(vece, d, a, b);
+tcg_gen_add_vec(vece, d, d, t);
+}
+
+void gen_gvec_srhadd(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+ uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_sari_vec, INDEX_op_add_vec, 0
+};
+static const GVecGen3 g[] = {
+{ .fni8 = gen_srhadd8_i64,
+  .fniv = gen_srhadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_8 },
+{ .fni8 = gen_srhadd16_i64,
+  .fniv = gen_srhadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_16 },
+{ .fni4 = gen_srhadd_i32,
+  .fniv = gen_srhadd_vec,
+  .opt_opc = vecop_list,
+  .vece = MO_32 },
+};
+assert(vece <= MO_32);
+tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &g[vece]);
+}
+
+static void gen_urhadd8_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_or_i64(t, a, b);
+tcg_gen_vec_shr8i_i64(a, a, 1);
+tcg_gen_vec_shr8i_i64(b, b, 1);
+tcg_gen_andi_i64(t, t, dup_const(MO_8, 1));
+tcg_gen_vec_add8_i64(d, a, b);
+tcg_gen_vec_add8_i64(d, d, t);
+}
+
+static void gen_urhadd16_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 t = tcg_temp_new_i64();

[PATCH v3 00/33] target/arm: Convert a64 advsimd to decodetree (part 1b)

2024-05-28 Thread Richard Henderson

Changes for v3:
  * Reword prefetch unpredictable patch.
  * Validate vector length when qc is an implied operand.
  * Adjust some legacy decode based on review.
  * Apply r-b.

Patches needing review:
  01-target-arm-Diagnose-UNPREDICTABLE-operands-to-PLD.patch
  03-target-arm-Assert-oprsz-in-range-when-using-vfp.q.patch
  04-target-arm-Convert-SUQADD-and-USQADD-to-gvec.patch
  10-target-arm-Convert-SRSHL-and-URSHL-register-to-gv.patch
  12-target-arm-Convert-SQSHL-and-UQSHL-register-to-gv.patch
  31-target-arm-Convert-SQDMULH-SQRDMULH-to-decodetree.patch
  32-target-arm-Convert-FMADD-FMSUB-FNMADD-FNMSUB-to-d.patch


r~


Richard Henderson (33):
  target/arm: Diagnose UNPREDICTABLE operands to PLD, PLDW, PLI
  target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB
  target/arm: Assert oprsz in range when using vfp.qc
  target/arm: Convert SUQADD and USQADD to gvec
  target/arm: Inline scalar SUQADD and USQADD
  target/arm: Inline scalar SQADD, UQADD, SQSUB, UQSUB
  target/arm: Convert SQADD, SQSUB, UQADD, UQSUB to decodetree
  target/arm: Convert SUQADD, USQADD to decodetree
  target/arm: Convert SSHL, USHL to decodetree
  target/arm: Convert SRSHL and URSHL (register) to gvec
  target/arm: Convert SRSHL, URSHL to decodetree
  target/arm: Convert SQSHL and UQSHL (register) to gvec
  target/arm: Convert SQSHL, UQSHL to decodetree
  target/arm: Convert SQRSHL and UQRSHL (register) to gvec
  target/arm: Convert SQRSHL, UQRSHL to decodetree
  target/arm: Convert ADD, SUB (vector) to decodetree
  target/arm: Convert CMGT, CMHI, CMGE, CMHS, CMTST, CMEQ to decodetree
  target/arm: Use TCG_COND_TSTNE in gen_cmtst_{i32,i64}
  target/arm: Use TCG_COND_TSTNE in gen_cmtst_vec
  target/arm: Convert SHADD, UHADD to gvec
  target/arm: Convert SHADD, UHADD to decodetree
  target/arm: Convert SHSUB, UHSUB to gvec
  target/arm: Convert SHSUB, UHSUB to decodetree
  target/arm: Convert SRHADD, URHADD to gvec
  target/arm: Convert SRHADD, URHADD to decodetree
  target/arm: Convert SMAX, SMIN, UMAX, UMIN to decodetree
  target/arm: Convert SABA, SABD, UABA, UABD to decodetree
  target/arm: Convert MUL, PMUL to decodetree
  target/arm: Convert MLA, MLS to decodetree
  target/arm: Tidy SQDMULH, SQRDMULH (vector)
  target/arm: Convert SQDMULH, SQRDMULH to decodetree
  target/arm: Convert FMADD, FMSUB, FNMADD, FNMSUB to decodetree
  target/arm: Convert FCSEL to decodetree

 target/arm/helper.h  |   96 ++-
 target/arm/tcg/translate-a64.h   |   14 +
 target/arm/tcg/translate.h   |   44 +
 target/arm/tcg/a32-uncond.decode |8 +-
 target/arm/tcg/a64.decode|  115 +++
 target/arm/tcg/neon-dp.decode|   37 +-
 target/arm/tcg/t32.decode|7 +-
 target/arm/tcg/gengvec.c |  689 +++-
 target/arm/tcg/gengvec64.c   |  181 
 target/arm/tcg/neon_helper.c |  506 +++-
 target/arm/tcg/translate-a64.c   | 1321 ++
 target/arm/tcg/translate-neon.c  |  118 +--
 target/arm/tcg/translate.c   |   58 ++
 target/arm/tcg/vec_helper.c  |  128 +++
 14 files changed, 1829 insertions(+), 1493 deletions(-)

-- 
2.34.1

[PATCH v3 05/33] target/arm: Inline scalar SUQADD and USQADD

2024-05-28 Thread Richard Henderson

This eliminates the last uses of these neon helpers.
Incorporate the MO_64 expanders as an option to the vector expander.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|   8 --
 target/arm/tcg/translate-a64.h |   8 ++
 target/arm/tcg/gengvec64.c |  71 ++
 target/arm/tcg/neon_helper.c   | 165 -
 target/arm/tcg/translate-a64.c |  73 +--
 5 files changed, 103 insertions(+), 222 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index de2c5c9aef..c76158d6d3 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -274,14 +274,6 @@ DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_uqadd_s64, TCG_CALL_NO_RWG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_3(neon_sqadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
-DEF_HELPER_FLAGS_3(neon_sqadd_u64, TCG_CALL_NO_RWG, i64, env, i64, i64)
 DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32)
 DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32)
 DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32)
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index b5cb26f8a2..0fcf7cb63a 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -197,9 +197,17 @@ void gen_gvec_eor3(unsigned vece, uint32_t d, uint32_t n, 
uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
 void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
+
+void gen_suqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_suqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
 uint32_t rn_ofs, uint32_t rm_ofs,
 uint32_t opr_sz, uint32_t max_sz);
+
+void gen_usqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+TCGv_i64 a, TCGv_i64 b, MemOp esz);
+void gen_usqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b);
 void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
 uint32_t rn_ofs, uint32_t rm_ofs,
 uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c
index b3afabd38b..2617cde0a5 100644
--- a/target/arm/tcg/gengvec64.c
+++ b/target/arm/tcg/gengvec64.c
@@ -188,6 +188,38 @@ void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, 
uint32_t m,
 tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, );
 }
 
+/*
+ * Set @res to the correctly saturated result.
+ * Set @qc non-zero if saturation occurred.
+ */
+void gen_suqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+TCGv_i64 max = tcg_constant_i64((1ull << ((8 << esz) - 1)) - 1);
+TCGv_i64 t = tcg_temp_new_i64();
+
+tcg_gen_add_i64(t, a, b);
+tcg_gen_smin_i64(res, t, max);
+tcg_gen_xor_i64(t, t, res);
+tcg_gen_or_i64(qc, qc, t);
+}
+
+void gen_suqadd_d(TCGv_i64 res, TCGv_i64 qc, TCGv_i64 a, TCGv_i64 b)
+{
+TCGv_i64 max = tcg_constant_i64(INT64_MAX);
+TCGv_i64 t = tcg_temp_new_i64();
+
+/* Maximum value that can be added to @a without overflow. */
+tcg_gen_sub_i64(t, max, a);
+
+/* Constrain addend so that the next addition never overflows. */
+tcg_gen_umin_i64(t, t, b);
+tcg_gen_add_i64(res, a, t);
+
+tcg_gen_xor_i64(t, t, b);
+tcg_gen_or_i64(qc, qc, t);
+}
+
 static void gen_suqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
TCGv_vec a, TCGv_vec b)
 {
@@ -231,6 +263,7 @@ void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
   .write_aofs = true,
   .vece = MO_32 },
 { .fniv = gen_suqadd_vec,
+  .fni8 = gen_suqadd_d,
   .fno = gen_helper_gvec_suqadd_d,
   .opt_opc = vecop_list,
   .write_aofs = true,
@@ -242,6 +275,43 @@ void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
 }
 
+void gen_usqadd_bhs(TCGv_i64 res, TCGv_i64 qc,
+TCGv_i64 a, TCGv_i64 b, MemOp esz)
+{
+TCGv_i64 max = tcg_constant_i64(MAKE_64BIT_MASK(0, 8 << esz));
+TCGv_i64 zero = tcg_constant_i64(0);
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+tcg_gen_add_i64(tmp, a, b);
+
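
A scalar C model may make the two expanders above (gen_suqadd_bhs and
gen_suqadd_d) easier to check by hand. This is only a sketch and not code
from the series: the names suqadd_narrow and suqadd64 are made up here, and
the narrow variant assumes that @a arrives sign-extended and @b zero-extended
to 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the 8/16/32-bit case. The operands are assumed to be already
 * extended to 64 bits (a signed, b unsigned), so the sum cannot wrap and
 * a signed minimum against the element maximum is enough.
 * bits must be 8, 16 or 32.
 */
static int64_t suqadd_narrow(int64_t a, uint64_t b, unsigned bits, bool *sat)
{
    int64_t max = (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
    int64_t sum = a + (int64_t)b;
    int64_t res = sum < max ? sum : max;

    /* Saturation happened iff the clamp changed the sum. */
    *sat |= (sum != res);
    return res;
}

/*
 * Model of the 64-bit case. There is no wider type to add in, so the
 * addend is clamped first: INT64_MAX - a, computed modulo 2^64, is the
 * largest unsigned value that can be added to a without passing INT64_MAX.
 */
static int64_t suqadd64(int64_t a, uint64_t b, bool *sat)
{
    uint64_t room = (uint64_t)INT64_MAX - (uint64_t)a;
    uint64_t add = b < room ? b : room;   /* unsigned minimum */

    /* Saturation happened iff the addend had to be reduced. */
    *sat |= (add != b);
    /* The conversion back to signed relies on two's complement wrapping. */
    return (int64_t)((uint64_t)a + add);
}
```

In both models the flag is only ever OR-ed in, mirroring how @qc accumulates
across elements in the patch.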

[PATCH v3 04/33] target/arm: Convert SUQADD and USQADD to gvec

2024-05-28 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h            |  16 +
 target/arm/tcg/translate-a64.h |   6 ++
 target/arm/tcg/gengvec64.c     | 110
 target/arm/tcg/translate-a64.c | 113 ++---
 target/arm/tcg/vec_helper.c    |  64 +++
 5 files changed, 245 insertions(+), 64 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index f830531dd3..de2c5c9aef 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -836,6 +836,22 @@ DEF_HELPER_FLAGS_5(gvec_sqsub_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_sqsub_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_b, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_usqadd_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_b, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_suqadd_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fmlal_a32, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index 91750f0ca9..b5cb26f8a2 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -197,6 +197,12 @@ void gen_gvec_eor3(unsigned vece, uint32_t d, uint32_t n, 
uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
 void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
uint32_t a, uint32_t oprsz, uint32_t maxsz);
+void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
+uint32_t rn_ofs, uint32_t rm_ofs,
+uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_usqadd_qc(unsigned vece, uint32_t rd_ofs,
+uint32_t rn_ofs, uint32_t rm_ofs,
+uint32_t opr_sz, uint32_t max_sz);
 
 void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
 void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c
index 093b498b13..b3afabd38b 100644
--- a/target/arm/tcg/gengvec64.c
+++ b/target/arm/tcg/gengvec64.c
@@ -188,3 +188,113 @@ void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, 
uint32_t m,
 tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, );
 }
 
+static void gen_suqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
+   TCGv_vec a, TCGv_vec b)
+{
+TCGv_vec max =
+tcg_constant_vec_matching(t, vece, (1ull << ((8 << vece) - 1)) - 1);
+TCGv_vec u = tcg_temp_new_vec_matching(t);
+
+/* Maximum value that can be added to @a without overflow. */
+tcg_gen_sub_vec(vece, u, max, a);
+
+/* Constrain addend so that the next addition never overflows. */
+tcg_gen_umin_vec(vece, u, u, b);
+tcg_gen_add_vec(vece, t, u, a);
+
+/* Compute QC by comparing the adjusted @b. */
+tcg_gen_xor_vec(vece, u, u, b);
+tcg_gen_or_vec(vece, qc, qc, u);
+}
+
+void gen_gvec_suqadd_qc(unsigned vece, uint32_t rd_ofs,
+uint32_t rn_ofs, uint32_t rm_ofs,
+uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop_list[] = {
+INDEX_op_add_vec, INDEX_op_sub_vec, INDEX_op_umin_vec, 0
+};
+static const GVecGen4 ops[4] = {
+{ .fniv = gen_suqadd_vec,
+  .fno = gen_helper_gvec_suqadd_b,
+  .opt_opc = vecop_list,
+  .write_aofs = true,
+  .vece = MO_8 },
+{ .fniv = gen_suqadd_vec,
+  .fno = gen_helper_gvec_suqadd_h,
+  .opt_opc = vecop_list,
+  .write_aofs = true,
+  .vece = MO_16 },
+{ .fniv = gen_suqadd_vec,
+  .fno = gen_helper_gvec_suqadd_s,
+  .opt_opc = vecop_list,
+  .write_aofs = true,
+  .vece = MO_32 },
+{ .fniv = gen_suqadd_vec,
+  .fno = gen_helper_gvec_suqadd_d,
+  .opt_opc = vecop_list,
+  .write_aofs = true,
+  .vece = MO_64 },
+};
+
+tcg_debug_assert(opr_sz <= sizeof_field(CPUARMState, vfp.qc));
+tcg_gen_gvec_4(rd_ofs, offsetof(CPUARMState, vfp.qc),
+   rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
+}
+
+static void gen_usqadd_vec(unsigned vece, TCGv_vec t, TCGv_vec qc,
+   TCGv_vec a, TCGv_vec b)
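
The sub/umin/add sequence in gen_suqadd_vec above can be checked per lane
with a small C model. The sketch below is an illustration only, assuming
8-bit lanes; none of these function names exist in QEMU. It compares the
clamp-the-addend form against a straightforward widen-then-saturate
reference over every input pair.

```c
#include <stdint.h>

/* One 8-bit lane of the clamp-the-addend sequence: sub, umin, add. */
static int8_t suqadd8_lane(int8_t a, uint8_t b, uint8_t *qc)
{
    uint8_t max = 0x7f;
    uint8_t u = (uint8_t)(max - (uint8_t)a);   /* room left above a */

    u = u < b ? u : b;                         /* unsigned minimum */
    *qc |= (uint8_t)(u ^ b);                   /* non-zero iff b was clamped */
    /* Conversion back to signed relies on two's complement wrapping. */
    return (int8_t)(uint8_t)((uint8_t)a + u);
}

/* Reference: add in a wider type, then saturate. */
static int8_t suqadd8_ref(int8_t a, uint8_t b, int *sat)
{
    int sum = a + b;

    *sat = sum > INT8_MAX;
    return (int8_t)(*sat ? INT8_MAX : sum);
}

/* Returns the number of (a, b) pairs on which the two models disagree. */
static unsigned suqadd8_mismatches(void)
{
    unsigned bad = 0;

    for (int a = INT8_MIN; a <= INT8_MAX; a++) {
        for (int b = 0; b <= UINT8_MAX; b++) {
            uint8_t qc = 0;
            int sat;
            int8_t got = suqadd8_lane((int8_t)a, (uint8_t)b, &qc);
            int8_t want = suqadd8_ref((int8_t)a, (uint8_t)b, &sat);

            bad += (got != want) || ((qc != 0) != (sat != 0));
        }
    }
    return bad;
}
```

The subtraction max - a never overflows at lane width because the result
always lies in [0, 255], which is why an unsigned minimum is the right
comparison even though @a is signed.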

Re: [PATCH 5/5] contrib/plugins: add ips plugin example for cost modeling

2024-05-28 Thread Pierrick Bouvier


On 5/28/24 12:57, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 5/28/24 12:14, Alex Bennée wrote:

Pierrick Bouvier  writes:


This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies
per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin 
/bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin 
./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
"console=ttyS0" -plugin 
./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Signed-off-by: Pierrick Bouvier 
---
   contrib/plugins/ips.c    | 239 +++
   contrib/plugins/Makefile |   1 +
   2 files changed, 240 insertions(+)
   create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 000..cf3159df391
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,239 @@
+/*
+ * ips rate limiting plugin.
+ *
+ * This plugin can be used to restrict the execution of a system to a
+ * particular number of Instructions Per Second (ips). This controls
+ * time as seen by the guest, so while wall-clock time may be longer,
+ * from the guest's point of view time will pass at the normal rate.
+ *
+ * This uses the new plugin API which allows the plugin to control
+ * system time.
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include 
+#include 
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+/* how many times do we update time per sec */
+#define NUM_TIME_UPDATE_PER_SEC 10
+#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
+
+static GMutex global_state_lock;
+
+static uint64_t insn_per_second = 1000 * 1000; /* ips per core, per second */
+static uint64_t insn_quantum; /* trap every N instructions */
+static bool precise_execution; /* count every instruction */
+static int64_t start_time_ns; /* time (ns since epoch) first vCPU started */
+static int64_t virtual_time_ns; /* last set virtual time */
+
+static const void *time_handle;
+
+typedef enum {
+UNKNOWN = 0,
+EXECUTING,
+IDLE,
+FINISHED
+} vCPUState;
+
+typedef struct {
+uint64_t counter;
+uint64_t track_insn;
+vCPUState state;
+/* timestamp when vCPU entered state */
+int64_t last_state_time;
+} vCPUTime;
+
+struct qemu_plugin_scoreboard *vcpus;
+
+/* return epoch time in ns */
+static int64_t now_ns(void)
+{
+return g_get_real_time() * 1000;
+}
+
+static uint64_t num_insn_during(int64_t elapsed_ns)
+{
+double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
+return num_secs * (double) insn_per_second;
+}
+
+static int64_t time_for_insn(uint64_t num_insn)
+{
+double num_secs = (double) num_insn / (double) insn_per_second;
+return num_secs * (double) NSEC_IN_ONE_SEC;
+}
+
+static int64_t uptime_ns(void)
+{
+int64_t now = now_ns();
+g_assert(now >= start_time_ns);
+return now - start_time_ns;
+}
+
+static void vcpu_set_state(vCPUTime *vcpu, vCPUState new_state)
+{
+vcpu->last_state_time = now_ns();
+vcpu->state = new_state;
+}
+
+static void update_system_time(vCPUTime *vcpu)
+{
+/* flush remaining instructions */
+vcpu->counter += vcpu->track_insn;
+vcpu->track_insn = 0;
+
+int64_t uptime = uptime_ns();
+uint64_t expected_insn = num_insn_during(uptime);
+
+if (vcpu->counter >= expected_insn) {
+/* this vcpu ran faster than expected, so it has to sleep */
+uint64_t insn_advance = vcpu->counter - expected_insn;
+uint64_t time_advance_ns = time_for_insn(insn_advance);
+int64_t sleep_us = time_advance_ns / 1000;
+g_usleep(sleep_us);
+}
+
+/* based on number of instructions, what should be the new time? */
+int64_t new_virtual_time = time_for_insn(vcpu->counter);
+
+g_mutex_lock(&global_state_lock);
+
+/* Time only moves forward. Another vcpu might have updated it already. */
+if (new_virtual_time > virtual_time_ns) {
+qemu_plugin_update_ns(time_handle, new_virtual_time);
+virtual_time_ns = new_virtual_time;
+}
+
+g_mutex_unlock(&global_state_lock);
+}
+
+static void set_start_time()
+{
+g_mutex_lock(&global_state_lock);
+if (!start_time_ns) {
+start_time_ns = now_ns();
+}
+g_mutex_unlock(&global_state_lock);
+}
+
+static void vcpu_init(qemu_plugin_id_t id,
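
As a worked example of the rate-limiting arithmetic in update_system_time()
above: with ips=1000000, a vCPU that has executed 750000 instructions after
0.5 s of wall-clock time is 250000 instructions ahead of schedule and needs
to sleep for 250 ms. The sketch below restates that calculation with integer
arithmetic instead of doubles. ips_sleep_us is a hypothetical helper written
for illustration and is not part of the patch; it assumes ips is non-zero
and that the intermediate products fit in 64 bits.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/*
 * How long (in microseconds) a vCPU should sleep so that wall-clock time
 * catches up with the instructions it has executed at the requested rate.
 * Returns 0 when the vCPU is on schedule or behind.
 */
static uint64_t ips_sleep_us(uint64_t ips, uint64_t executed,
                             uint64_t uptime_ns)
{
    /* Instructions the vCPU was entitled to run during uptime_ns. */
    uint64_t expected = uptime_ns / NSEC_PER_SEC * ips
                      + uptime_ns % NSEC_PER_SEC * ips / NSEC_PER_SEC;

    if (executed <= expected) {
        return 0;
    }

    /* Time the surplus instructions are worth, expressed in microseconds. */
    uint64_t ahead = executed - expected;
    return ahead * (NSEC_PER_SEC / 1000) / ips;
}
```

Splitting uptime_ns into whole seconds and a remainder keeps the
multiplication by ips from overflowing for long uptimes.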

Re: [PULL v2 0/7] Block jobs patches for 2024-04-29

2024-05-28 Thread Richard Henderson


On 5/28/24 06:57, Vladimir Sementsov-Ogievskiy wrote:

The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946:

   Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into 
staging (2024-05-27 06:40:42 -0700)

are available in the Git repository at:

   https://gitlab.com/vsementsov/qemu.git  tags/pull-block-jobs-2024-04-29-v2

for you to fetch changes up to a149401048481247bcbaf6035a7a1308974fb464:

   iotests/pylintrc: allow up to 10 similar lines (2024-05-28 15:52:15 +0300)


Block jobs patches for 2024-04-29

v2: add "iotests/pylintrc: allow up to 10 similar lines" to fix
 check-python-minreqs

- backup: discard-source parameter
- blockcommit: Reopen base image as RO after abort


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH 5/5] contrib/plugins: add ips plugin example for cost modeling

2024-05-28 Thread Alex Bennée

Pierrick Bouvier  writes:

> On 5/28/24 12:14, Alex Bennée wrote:
>> Pierrick Bouvier  writes:
>> 
>>> This plugin uses the new time control interface to make decisions
>>> about the state of time during the emulation. The algorithm is
>>> currently very simple. The user specifies an ips rate which applies
>>> per core. If the core runs ahead of its allocated execution time the
>>> plugin sleeps for a bit to let real time catch up. Either way time is
>>> updated for the emulation as a function of total executed instructions
>>> with some adjustments for cores that idle.
>>>
>>> Examples
>>> 
>>>
>>> Slow down execution of /bin/true:
>>> $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d 
>>> plugin /bin/true |& grep total | sed -e 's/.*: //')
>>> $ time ./build/qemu-x86_64 -plugin 
>>> ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
>>> real 4.000s
>>>
>>> Boot a Linux kernel simulating a 250MHz cpu:
>>> $ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
>>> "console=ttyS0" -plugin 
>>> ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
>>> check time until kernel panic on serial0
>>>
>>> Signed-off-by: Pierrick Bouvier 
>>> ---
>>>   contrib/plugins/ips.c    | 239 +++
>>>   contrib/plugins/Makefile |   1 +
>>>   2 files changed, 240 insertions(+)
>>>   create mode 100644 contrib/plugins/ips.c
>>>
>>> diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
>>> new file mode 100644
>>> index 000..cf3159df391
>>> --- /dev/null
>>> +++ b/contrib/plugins/ips.c
>>> @@ -0,0 +1,239 @@
>>> +/*
>>> + * ips rate limiting plugin.
>>> + *
>>> + * This plugin can be used to restrict the execution of a system to a
>>> + * particular number of Instructions Per Second (ips). This controls
>>> + * time as seen by the guest, so while wall-clock time may be longer,
>>> + * from the guest's point of view time will pass at the normal rate.
>>> + *
>>> + * This uses the new plugin API which allows the plugin to control
>>> + * system time.
>>> + *
>>> + * Copyright (c) 2023 Linaro Ltd
>>> + *
>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>> +
>>> +/* how many times do we update time per sec */
>>> +#define NUM_TIME_UPDATE_PER_SEC 10
>>> +#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
>>> +
>>> +static GMutex global_state_lock;
>>> +
>>> +static uint64_t insn_per_second = 1000 * 1000; /* ips per core, per second 
>>> */
>>> +static uint64_t insn_quantum; /* trap every N instructions */
>>> +static bool precise_execution; /* count every instruction */
>>> +static int64_t start_time_ns; /* time (ns since epoch) first vCPU started 
>>> */
>>> +static int64_t virtual_time_ns; /* last set virtual time */
>>> +
>>> +static const void *time_handle;
>>> +
>>> +typedef enum {
>>> +UNKNOWN = 0,
>>> +EXECUTING,
>>> +IDLE,
>>> +FINISHED
>>> +} vCPUState;
>>> +
>>> +typedef struct {
>>> +uint64_t counter;
>>> +uint64_t track_insn;
>>> +vCPUState state;
>>> +/* timestamp when vCPU entered state */
>>> +int64_t last_state_time;
>>> +} vCPUTime;
>>> +
>>> +struct qemu_plugin_scoreboard *vcpus;
>>> +
>>> +/* return epoch time in ns */
>>> +static int64_t now_ns(void)
>>> +{
>>> +return g_get_real_time() * 1000;
>>> +}
>>> +
>>> +static uint64_t num_insn_during(int64_t elapsed_ns)
>>> +{
>>> +double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
>>> +return num_secs * (double) insn_per_second;
>>> +}
>>> +
>>> +static int64_t time_for_insn(uint64_t num_insn)
>>> +{
>>> +double num_secs = (double) num_insn / (double) insn_per_second;
>>> +return num_secs * (double) NSEC_IN_ONE_SEC;
>>> +}
>>> +
>>> +static int64_t uptime_ns(void)
>>> +{
>>> +int64_t now = now_ns();
>>> +g_assert(now >= start_time_ns);
>>> +return now - start_time_ns;
>>> +}
>>> +
>>> +static void vcpu_set_state(vCPUTime *vcpu, vCPUState new_state)
>>> +{
>>> +vcpu->last_state_time = now_ns();
>>> +vcpu->state = new_state;
>>> +}
>>> +
>>> +static void update_system_time(vCPUTime *vcpu)
>>> +{
>>> +/* flush remaining instructions */
>>> +vcpu->counter += vcpu->track_insn;
>>> +vcpu->track_insn = 0;
>>> +
>>> +int64_t uptime = uptime_ns();
>>> +uint64_t expected_insn = num_insn_during(uptime);
>>> +
>>> +if (vcpu->counter >= expected_insn) {
>>> +/* this vcpu ran faster than expected, so it has to sleep */
>>> +uint64_t insn_advance = vcpu->counter - expected_insn;
>>> +uint64_t time_advance_ns = time_for_insn(insn_advance);
>>> +int64_t sleep_us = time_advance_ns / 1000;
>>> +g_usleep(sleep_us);
>>> +}
>>> +
>>> +/* based on number of instructions, what should be the new time? */
>>> +int64_t new_virtual_time = time_for_insn(vcpu->counter);
>>> +
>>> +

Re: [PATCH 5/5] contrib/plugins: add ips plugin example for cost modeling

2024-05-28 Thread Pierrick Bouvier


On 5/28/24 12:14, Alex Bennée wrote:

Pierrick Bouvier  writes:


This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies
per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin 
/bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin 
./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
"console=ttyS0" -plugin 
./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Signed-off-by: Pierrick Bouvier 
---
  contrib/plugins/ips.c    | 239 +++
  contrib/plugins/Makefile |   1 +
  2 files changed, 240 insertions(+)
  create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 000..cf3159df391
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,239 @@
+/*
+ * ips rate limiting plugin.
+ *
+ * This plugin can be used to restrict the execution of a system to a
+ * particular number of Instructions Per Second (ips). This controls
+ * time as seen by the guest, so while wall-clock time may be longer,
+ * from the guest's point of view time will pass at the normal rate.
+ *
+ * This uses the new plugin API which allows the plugin to control
+ * system time.
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include 
+#include 
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+/* how many times do we update time per sec */
+#define NUM_TIME_UPDATE_PER_SEC 10
+#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
+
+static GMutex global_state_lock;
+
+static uint64_t insn_per_second = 1000 * 1000; /* ips per core, per second */
+static uint64_t insn_quantum; /* trap every N instructions */
+static bool precise_execution; /* count every instruction */
+static int64_t start_time_ns; /* time (ns since epoch) first vCPU started */
+static int64_t virtual_time_ns; /* last set virtual time */
+
+static const void *time_handle;
+
+typedef enum {
+UNKNOWN = 0,
+EXECUTING,
+IDLE,
+FINISHED
+} vCPUState;
+
+typedef struct {
+uint64_t counter;
+uint64_t track_insn;
+vCPUState state;
+/* timestamp when vCPU entered state */
+int64_t last_state_time;
+} vCPUTime;
+
+struct qemu_plugin_scoreboard *vcpus;
+
+/* return epoch time in ns */
+static int64_t now_ns(void)
+{
+return g_get_real_time() * 1000;
+}
+
+static uint64_t num_insn_during(int64_t elapsed_ns)
+{
+double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
+return num_secs * (double) insn_per_second;
+}
+
+static int64_t time_for_insn(uint64_t num_insn)
+{
+double num_secs = (double) num_insn / (double) insn_per_second;
+return num_secs * (double) NSEC_IN_ONE_SEC;
+}
+
+static int64_t uptime_ns(void)
+{
+int64_t now = now_ns();
+g_assert(now >= start_time_ns);
+return now - start_time_ns;
+}
+
+static void vcpu_set_state(vCPUTime *vcpu, vCPUState new_state)
+{
+vcpu->last_state_time = now_ns();
+vcpu->state = new_state;
+}
+
+static void update_system_time(vCPUTime *vcpu)
+{
+/* flush remaining instructions */
+vcpu->counter += vcpu->track_insn;
+vcpu->track_insn = 0;
+
+int64_t uptime = uptime_ns();
+uint64_t expected_insn = num_insn_during(uptime);
+
+if (vcpu->counter >= expected_insn) {
+/* this vcpu ran faster than expected, so it has to sleep */
+uint64_t insn_advance = vcpu->counter - expected_insn;
+uint64_t time_advance_ns = time_for_insn(insn_advance);
+int64_t sleep_us = time_advance_ns / 1000;
+g_usleep(sleep_us);
+}
+
+/* based on number of instructions, what should be the new time? */
+int64_t new_virtual_time = time_for_insn(vcpu->counter);
+
+g_mutex_lock(&global_state_lock);
+
+/* Time only moves forward. Another vcpu might have updated it already. */
+if (new_virtual_time > virtual_time_ns) {
+qemu_plugin_update_ns(time_handle, new_virtual_time);
+virtual_time_ns = new_virtual_time;
+}
+
+g_mutex_unlock(&global_state_lock);
+}
+
+static void set_start_time()
+{
+g_mutex_lock(&global_state_lock);
+if (!start_time_ns) {
+start_time_ns = now_ns();
+}
+g_mutex_unlock(&global_state_lock);
+}
+
+static void vcpu_init(qemu_plugin_id_t id, unsigned int cpu_index)
+{
+vCPUTime *vcpu =

Re: [PATCH v2 2/6] tests/qtest/migration-test: Fix and enable test_ignore_shared

2024-05-28 Thread Peter Xu

On Tue, May 28, 2024 at 10:42:06AM +1000, Nicholas Piggin wrote:
> This test is already starting to bitrot, so first remove it from ifdef
> and fix compile issues. ppc64 transfers about 2MB, so bump the size
> threshold too.
> 
> It was said to be broken on aarch64 but it may have been the limited shm
> size under gitlab CI. The test is now excluded from running on CI so it
> shouldn't cause too much annoyance.
> 
> So let's try enabling it.
> 
> Cc: Yury Kotov 
> Cc: Dr. David Alan Gilbert 

Dave's new email is:

d...@treblig.org

Please feel free to use it in a repost.

Thanks,

> Signed-off-by: Nicholas Piggin 
> ---
>  tests/qtest/migration-test.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 04bf1c0092..8247ed98f2 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1893,14 +1893,15 @@ static void 
> test_precopy_unix_tls_x509_override_host(void)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -#if 0
> -/* Currently upset on aarch64 TCG */
>  static void test_ignore_shared(void)
>  {
>  g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
>  QTestState *from, *to;
> +MigrateStart args = {
> +.use_shmem = true,
> +};
>  
> -if (test_migrate_start(&from, &to, uri, false, true, NULL, NULL)) {
> +if (test_migrate_start(&from, &to, uri, &args)) {
>  return;
>  }
>  
> @@ -1925,11 +1926,11 @@ static void test_ignore_shared(void)
>  wait_for_migration_complete(from);
>  
>  /* Check whether shared RAM has been really skipped */
> -g_assert_cmpint(read_ram_property_int(from, "transferred"), <, 1024 * 
> 1024);
> +g_assert_cmpint(read_ram_property_int(from, "transferred"), <,
> +   4 * 1024 * 1024);
>  
>  test_migrate_end(from, to, true);
>  }
> -#endif
>  
>  static void *
>  test_migrate_xbzrle_start(QTestState *from,
> @@ -3580,7 +3581,8 @@ int main(int argc, char **argv)
>  #endif /* CONFIG_TASN1 */
>  #endif /* CONFIG_GNUTLS */
>  
> -/* migration_test_add("/migration/ignore_shared", test_ignore_shared); */
> +migration_test_add("/migration/ignore_shared", test_ignore_shared);
> +
>  #ifndef _WIN32
>  migration_test_add("/migration/precopy/fd/tcp",
> test_migrate_precopy_fd_socket);
> -- 
> 2.43.0
> 

-- 
Peter Xu

Re: [PATCH 5/5] contrib/plugins: add ips plugin example for cost modeling

2024-05-28 Thread Alex Bennée

Pierrick Bouvier  writes:

> This plugin uses the new time control interface to make decisions
> about the state of time during the emulation. The algorithm is
> currently very simple. The user specifies an ips rate which applies
> per core. If the core runs ahead of its allocated execution time the
> plugin sleeps for a bit to let real time catch up. Either way time is
> updated for the emulation as a function of total executed instructions
> with some adjustments for cores that idle.
>
> Examples
> 
>
> Slow down execution of /bin/true:
> $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d 
> plugin /bin/true |& grep total | sed -e 's/.*: //')
> $ time ./build/qemu-x86_64 -plugin 
> ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
> real 4.000s
>
> Boot a Linux kernel simulating a 250MHz cpu:
> $ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
> "console=ttyS0" -plugin 
> ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
> check time until kernel panic on serial0
>
> Signed-off-by: Pierrick Bouvier 
> ---
>  contrib/plugins/ips.c    | 239 +++
>  contrib/plugins/Makefile |   1 +
>  2 files changed, 240 insertions(+)
>  create mode 100644 contrib/plugins/ips.c
>
> diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
> new file mode 100644
> index 000..cf3159df391
> --- /dev/null
> +++ b/contrib/plugins/ips.c
> @@ -0,0 +1,239 @@
> +/*
> + * ips rate limiting plugin.
> + *
> + * This plugin can be used to restrict the execution of a system to a
> + * particular number of Instructions Per Second (ips). This controls
> + * time as seen by the guest, so while wall-clock time may be longer,
> + * from the guest's point of view time will pass at the normal rate.
> + *
> + * This uses the new plugin API which allows the plugin to control
> + * system time.
> + *
> + * Copyright (c) 2023 Linaro Ltd
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +/* how many times do we update time per sec */
> +#define NUM_TIME_UPDATE_PER_SEC 10
> +#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
> +
> +static GMutex global_state_lock;
> +
> +static uint64_t insn_per_second = 1000 * 1000; /* ips per core, per second */
> +static uint64_t insn_quantum; /* trap every N instructions */
> +static bool precise_execution; /* count every instruction */
> +static int64_t start_time_ns; /* time (ns since epoch) first vCPU started */
> +static int64_t virtual_time_ns; /* last set virtual time */
> +
> +static const void *time_handle;
> +
> +typedef enum {
> +UNKNOWN = 0,
> +EXECUTING,
> +IDLE,
> +FINISHED
> +} vCPUState;
> +
> +typedef struct {
> +uint64_t counter;
> +uint64_t track_insn;
> +vCPUState state;
> +/* timestamp when vCPU entered state */
> +int64_t last_state_time;
> +} vCPUTime;
> +
> +struct qemu_plugin_scoreboard *vcpus;
> +
> +/* return epoch time in ns */
> +static int64_t now_ns(void)
> +{
> +return g_get_real_time() * 1000;
> +}
> +
> +static uint64_t num_insn_during(int64_t elapsed_ns)
> +{
> +double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
> +return num_secs * (double) insn_per_second;
> +}
> +
> +static int64_t time_for_insn(uint64_t num_insn)
> +{
> +double num_secs = (double) num_insn / (double) insn_per_second;
> +return num_secs * (double) NSEC_IN_ONE_SEC;
> +}
> +
> +static int64_t uptime_ns(void)
> +{
> +int64_t now = now_ns();
> +g_assert(now >= start_time_ns);
> +return now - start_time_ns;
> +}
> +
> +static void vcpu_set_state(vCPUTime *vcpu, vCPUState new_state)
> +{
> +vcpu->last_state_time = now_ns();
> +vcpu->state = new_state;
> +}
> +
> +static void update_system_time(vCPUTime *vcpu)
> +{
> +/* flush remaining instructions */
> +vcpu->counter += vcpu->track_insn;
> +vcpu->track_insn = 0;
> +
> +int64_t uptime = uptime_ns();
> +uint64_t expected_insn = num_insn_during(uptime);
> +
> +if (vcpu->counter >= expected_insn) {
> +/* this vcpu ran faster than expected, so it has to sleep */
> +uint64_t insn_advance = vcpu->counter - expected_insn;
> +uint64_t time_advance_ns = time_for_insn(insn_advance);
> +int64_t sleep_us = time_advance_ns / 1000;
> +g_usleep(sleep_us);
> +}
> +
> +/* based on number of instructions, what should be the new time? */
> +int64_t new_virtual_time = time_for_insn(vcpu->counter);
> +
> +g_mutex_lock(&global_state_lock);
> +
> +/* Time only moves forward. Another vcpu might have updated it already. 
> */
> +if (new_virtual_time > virtual_time_ns) {
> +qemu_plugin_update_ns(time_handle, new_virtual_time);
> +virtual_time_ns = new_virtual_time;
> +}
> +
> +g_mutex_unlock(&global_state_lock);
> +}
> +
> +static void set_start_time()
> +{

Re: [PATCH 1/5] sysemu: add set_virtual_time to accel ops

2024-05-28 Thread Pierrick Bouvier


On 5/28/24 10:11, Alex Bennée wrote:

Pierrick Bouvier  writes:


From: Alex Bennée 

We are about to remove direct calls to individual accelerators for
this information and will need a central point for plugins to hook
into time changes.

From: Alex Bennée 
Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 


Just a note, when patches written by other people come via your tree you
should add your s-o-b tag to indicate:

   "I'm legally okay to contribute this and happy for it to go into QEMU"



Thanks for clarifying, I didn't know it was needed as a committer (vs 
author), and checkpatch.pl does not seem to check for this.


I'll add this when reposting the series, if we have some comments.

Re: [PATCH 0/6] accel: Restrict TCG plugin (un)registration to TCG accel

2024-05-28 Thread Pierrick Bouvier

On 5/28/24 07:59, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (6):

   system/runstate: Remove unused 'qemu/plugin.h' header
   accel/tcg: Move common declarations to 'internal-common.h'
   accel: Clarify accel_cpu_common_[un]realize() use unassigned vCPU
   accel: Introduce accel_cpu_common_[un]realize_assigned() handlers
   accel: Restrict TCG plugin (un)registration to TCG accel
   accel/tcg: Move qemu_plugin_vcpu_init__async() to plugins/



Reviewed-by: Pierrick Bouvier

Re: [RFC PATCH 4/4] ci: Add the new migration device tests

2024-05-28 Thread Peter Xu

On Tue, May 28, 2024 at 03:10:48PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Mon, May 27, 2024 at 08:59:00PM -0300, Fabiano Rosas wrote:
> >> Peter Xu  writes:
> >> 
> >> > On Thu, May 23, 2024 at 05:19:22PM -0300, Fabiano Rosas wrote:
> >> >> We have two new migration tests that check cross version
> >> >> compatibility. One uses the vmstate-static-checker.py script to
> >> >> compare the vmstate structures from two different QEMU versions. The
> >> >> other runs a simple migration with a few devices present in the VM, to
> >> >> catch obvious breakages.
> >> >> 
> >> >> Add both tests to the migration-compat-common job.
> >> >> 
> >> >> Signed-off-by: Fabiano Rosas 
> >> >> ---
> >> >>  .gitlab-ci.d/buildtest.yml | 43 +++---
> >> >>  1 file changed, 36 insertions(+), 7 deletions(-)
> >> >> 
> >> >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >> >> index 91c57efded..bc7ac35983 100644
> >> >> --- a/.gitlab-ci.d/buildtest.yml
> >> >> +++ b/.gitlab-ci.d/buildtest.yml
> >> >> @@ -202,18 +202,47 @@ build-previous-qemu:
> >> >>needs:
> >> >>  - job: build-previous-qemu
> >> >>  - job: build-system-opensuse
> >> >> -  # The old QEMU could have bugs unrelated to migration that are
> >> >> -  # already fixed in the current development branch, so this test
> >> >> -  # might fail.
> >> >> +  # This test is allowed to fail because:
> >> >> +  #
> >> >> +  # - The old QEMU could have bugs unrelated to migration that are
> >> >> +  #   already fixed in the current development branch.
> >> >
> >> > Did you ever hit a real failure with this?  I'm wondering whether we can
> >> > remove this allow_failure thing.
> >> >
> >> 
> >> I haven't. But when it fails we'll go through an entire release cycle
> >> with this thing showing red for every person that runs the CI. Remember,
> >> this is a CI failure to which there's no fix aside from waiting for the
> >> release to happen. Even if we're quick to react and disable the job, I
> >> feel it might create some confusion already.
> >
> > My imagination was if needed we'll get complains and we add that until
> > then for that broken release only, and remove in the next release again.
> >
> >> 
> >> >> +  #
> >> >> +  # - The vmstate-static-checker script trips on renames and other
> >> >> +  #   backward-compatible changes to the vmstate structs.
> >> >
> >> > I think I keep my preference per last time we talked on this. :)
> >> 
> >> Sorry, I'm not trying to force this in any way, I just wrote these to
> >> use in the pull-request and thought I'd put it out there. At the very
> >> least we can have your concerns documented. =)
> >
> > Yep that's fine.  I think we should keep such discussion on the list,
> > especially we have different opinions, while none of us got convinced yet
> > so far. :)
> >
> >> 
> >> > I still think it's too early to involve a test that can report false
> >> > negative.
> >> 
> >> (1)
> >> Well, we haven't seen any false negatives, we've seen fields being
> >> renamed. If that happens, then we'll ask the person to update the
> >> script. Is that not acceptable to you? Or are you thinking about other
> >> sorts of issues?
> >
> > Then question is how to update the script. So far it's treated as failure
> > on rename, even if it's benign. Right now we have this:
> >
> > print("Section \"" + sec + "\",", end=' ')
> > print("Description \"" + desc + "\":", end=' ')
> > print("expected field \"" + s_item["field"] + "\",", end=' ')
> > print("got \"" + d_item["field"] + "\"; skipping rest")
> > bump_taint()
> > break
> >
> > Do you want to introduce a list of renamed vmsd fields in this script and
> > maintain that?  IMHO it's an overkill and unnecessary burden to other
> > developers.
> >
> 
> That's not _my_ idea, we already have that (see below). There's not much
> reason to rename fields like that, the vmstate is obviously something
> that should be kept stable, so having to do a rename in a script is way
> better than having to figure out the fix for the compatibility break.
> 
> def check_fields_match(name, s_field, d_field):
> if s_field == d_field:
> return True
> 
> # Some fields changed names between qemu versions.  This list
> # is used to allow such changes in each section / description.
> changed_names = {
> 'apic': ['timer', 'timer_expiry'],
> 'e1000': ['dev', 'parent_obj'],
> 'ehci': ['dev', 'pcidev'],
> 'I440FX': ['dev', 'parent_obj'],
> 'ich9_ahci': ['card', 'parent_obj'],
> 'ich9-ahci': ['ahci', 'ich9_ahci'],
> 'ioh3420': ['PCIDevice', 'PCIEDevice'],
> 'ioh-3240-express-root-port': ['port.br.dev',
>'parent_obj.parent_obj.parent_obj',
>'port.br.dev.exp.aer_log',
> 
>
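
The quoted excerpt is cut off, but the allow-list idea it describes can be sketched as follows. This is a simplified illustration, not the script's actual code: the table is trimmed to two entries taken from the quote, and `fields_match` and `CHANGED_NAMES` are hypothetical names.

```python
# Trimmed-down table: section name -> field names treated as equivalent.
CHANGED_NAMES = {
    "apic": ["timer", "timer_expiry"],
    "e1000": ["dev", "parent_obj"],
}


def fields_match(section, s_field, d_field, changed_names=CHANGED_NAMES):
    """Return True if the two field names are equal or a known rename."""
    if s_field == d_field:
        return True
    allowed = changed_names.get(section, [])
    # A rename is accepted only if both names are listed for this section.
    return s_field in allowed and d_field in allowed


print(fields_match("apic", "timer", "timer_expiry"))
print(fields_match("apic", "timer", "count"))
```

With this shape, accepting a benign rename is a one-line addition to the table, which is the maintenance cost being debated in the thread.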

[PATCH 1/1] tests/avocado: update sbsa-ref firmware

2024-05-28 Thread Marcin Juszkiewicz

Partial support for NUMA setup:
- cpu nodes
- memory nodes

Used versions:

- Trusted Firmware v2.11.0
- Tianocore EDK2 stable202405
- Tianocore EDK2 Platforms code commit 4bbd0ed

Firmware is built using Debian 'bookworm' cross toolchain (gcc 12.2.0).
---
 tests/avocado/machine_aarch64_sbsaref.py | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/avocado/machine_aarch64_sbsaref.py b/tests/avocado/machine_aarch64_sbsaref.py
index 98c76c1ff7..6bb82f2a03 100644
--- a/tests/avocado/machine_aarch64_sbsaref.py
+++ b/tests/avocado/machine_aarch64_sbsaref.py
@@ -37,18 +37,18 @@ def fetch_firmware(self):
 
 Used components:
 
-- Trusted Firmware 2.10.2
-- Tianocore EDK2 stable202402
-- Tianocore EDK2-platforms commit 085c2fb
+- Trusted Firmware 2.11.0
+- Tianocore EDK2 stable202405
+- Tianocore EDK2-platforms commit 4bbd0ed
 
 """
 
 # Secure BootRom (TF-A code)
 fs0_xz_url = (
 "https://artifacts.codelinaro.org/artifactory/linaro-419-sbsa-ref/"
-"20240313-116475/edk2/SBSA_FLASH0.fd.xz"
+"20240528-140808/edk2/SBSA_FLASH0.fd.xz"
 )
-fs0_xz_hash = "637593749cc307dea7dc13265c32e5d020267552f22b18a31850b8429fc5e159"
+fs0_xz_hash = "fa6004900b67172914c908b78557fec4d36a5f784f4c3dd08f49adb75e1892a9"
 tar_xz_path = self.fetch_asset(fs0_xz_url, asset_hash=fs0_xz_hash,
   algorithm='sha256')
 archive.extract(tar_xz_path, self.workdir)
@@ -57,9 +57,9 @@ def fetch_firmware(self):
 # Non-secure rom (UEFI and EFI variables)
 fs1_xz_url = (
 "https://artifacts.codelinaro.org/artifactory/linaro-419-sbsa-ref/"
-    "20240313-116475/edk2/SBSA_FLASH1.fd.xz"
+"20240528-140808/edk2/SBSA_FLASH1.fd.xz"
 )
-fs1_xz_hash = "cb0a5e8cf5e303c5d3dc106cfd5943ffe9714b86afddee7164c69ee1dd41991c"
+fs1_xz_hash = "5f3747d4000bc416d9641e33ff4ac60c3cc8cb74ca51b6e932e58531c62eb6f7"
 tar_xz_path = self.fetch_asset(fs1_xz_url, asset_hash=fs1_xz_hash,
   algorithm='sha256')
 archive.extract(tar_xz_path, self.workdir)
@@ -98,15 +98,15 @@ def test_sbsaref_edk2_firmware(self):
 
 # AP Trusted ROM
 wait_for_console_pattern(self, "Booting Trusted Firmware")
-wait_for_console_pattern(self, "BL1: v2.10.2(release):")
+wait_for_console_pattern(self, "BL1: v2.11.0(release):")
 wait_for_console_pattern(self, "BL1: Booting BL2")
 
 # Trusted Boot Firmware
-wait_for_console_pattern(self, "BL2: v2.10.2(release)")
+wait_for_console_pattern(self, "BL2: v2.11.0(release)")
 wait_for_console_pattern(self, "Booting BL31")
 
 # EL3 Runtime Software
-wait_for_console_pattern(self, "BL31: v2.10.2(release)")
+wait_for_console_pattern(self, "BL31: v2.11.0(release)")
 
 # Non-trusted Firmware
 wait_for_console_pattern(self, "UEFI firmware (version 1.0")
-- 
2.45.1
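
The patch pins each firmware image to a SHA-256 digest that is passed to `fetch_asset`. As a rough illustration of what such a check amounts to, here is a standalone sketch using only `hashlib`; `sha256_matches` is a hypothetical helper, not part of Avocado, and the demo file is a throwaway temp file rather than a real firmware image.

```python
import hashlib
import os
import tempfile


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Hash the file in chunks and compare with the expected hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Demo on a temporary file containing b"abc" (standard SHA-256 test vector).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name

ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
ok = sha256_matches(demo_path, ABC_SHA256)
bad = sha256_matches(demo_path, "0" * 64)
os.unlink(demo_path)
print(ok, bad)
```

Because the digest is tied to the file contents, updating the firmware URL without updating the hash makes the check fail, which is why both change together in the diff.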

Re: [PULL v2 00/42] target-arm queue

2024-05-28 Thread Richard Henderson


On 5/28/24 07:07, Peter Maydell wrote:

Hi; most of this is the first half of the A64 simd decodetree
conversion; the rest is a mix of fixes from the last couple of weeks.

v2 uses patches from the v2 decodetree series to avoid a few
regressions in some A32 insns.

(Richard: I'm still planning to review the second half of the
v2 decodetree series; I just wanted to get the respin of this
pullreq out today...)

thanks
-- PMM

The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946:

   Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into staging (2024-05-27 06:40:42 -0700)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20240528

for you to fetch changes up to f240df3c31b40e4cf1af1f156a88efc1a1df406c:

   target/arm: Convert disas_simd_3same_logic to decodetree (2024-05-28 14:29:01 +0100)


target-arm queue:
  * xlnx_dpdma: fix descriptor endianness bug
  * hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers
  * hw/arm/npcm7xx: remove setting of mp-affinity
  * hw/char: Correct STM32L4x5 usart register CR2 field ADD_0 size
  * hw/intc/arm_gic: Fix handling of NS view of GICC_APR
  * hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()
  * hw: arm: Remove use of tabs in some source files
  * docs/system: Remove ADC from raspi documentation
  * target/arm: Start of the conversion of A64 SIMD to decodetree


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH V1 08/26] migration: vmstate_info_void_ptr

2024-05-28 Thread Peter Xu

On Tue, May 28, 2024 at 11:10:16AM -0400, Steven Sistare wrote:
> On 5/27/2024 2:31 PM, Peter Xu wrote:
> > On Mon, Apr 29, 2024 at 08:55:17AM -0700, Steve Sistare wrote:
> > > Define VMSTATE_VOID_PTR so the value of a pointer (but not its target)
> > > can be saved in the migration stream.  This will be needed for CPR.
> > > 
> > > Signed-off-by: Steve Sistare 
> > 
> > This is really tricky.
> > 
> >  From a first glance, I don't think migrating a VA is valid at all for
> > migration even if with exec.. and looks insane to me for a cross-process
> > migration, which seems to be allowed to use as a generic VMSD helper.. as
> > VA is the address space barrier for different processes and I think it
> > normally even apply to generic execve(), and we're trying to jailbreak for
> > some reason..
> > 
> > It definitely won't work for any generic migration as sizeof(void*) can be
> > different afaict between hosts, e.g. 32bit -> 64bit migrations.
> > 
> > Some description would be really helpful in this commit message,
> > e.g. explain the users and why.  Do we need to poison that for generic VMSD
> > use (perhaps with prefixed underscores)?  I think I'll need to read on the
> > rest to tell..
> 
> Short answer: we never dereference the void* in the new process.  And must 
> not.
> 
> Longer answer:
> 
> During CPR for vfio, each mapped DMA region is re-registered in the new
> process using the new VA.  The ioctl to re-register identifies the mapping
> by IOVA and length.
> 
> The same requirement holds for CPR of iommufd devices.  However, in the
> iommufd framework, IOVA does not uniquely identify a dma mapping, and we
> need to use the old VA as the unique identifier.  The new process
> re-registers each mapping, passing the old VA and new VA to the kernel.
> The old VA is never dereferenced in the new process, we just need its value.
> 
> I suspected that the void* which must not be dereferenced might make people
> uncomfortable.  I have an older version of my code which adds a uint64_t
> field to RAMBlock for recording and migrating the old VA.  The saving and
> loading code is slightly less elegant, but no big deal.  Would you prefer
> that?

I see, thanks for explaining.  Yes that sounds better to me.  Re the
ugliness: is that about a pre_save() plus one extra uint64_t field?  In
that case it looks better comparing to migrating "void*".
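
To make the size concern concrete: a native pointer has a host-dependent width, while a `uint64_t` field has the same 8-byte encoding everywhere. The sketch below is illustrative only; the big-endian byte order and the helper names `encode_vaddr` and `decode_vaddr` are assumptions of this example, not taken from the series.

```python
import struct


def encode_vaddr(addr):
    """Encode an address as a fixed 8-byte big-endian value."""
    return struct.pack(">Q", addr)


def decode_vaddr(data):
    """Decode the 8-byte value back into an integer; it is never dereferenced."""
    return struct.unpack(">Q", data)[0]


old_va = 0x7F1234567000  # example value only
wire = encode_vaddr(old_va)
print(len(wire), hex(decode_vaddr(wire)))

# The native pointer size depends on the host (4 on 32-bit, 8 on 64-bit).
print(struct.calcsize("P"))
```

The decoded value is used purely as an identifier, which is the property Steve describes for the old VA.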

I'm trying to read some context on the vaddr remap thing from you, and I
found this:

https://lore.kernel.org/all/y90bvbnrvraceq%2f...@nvidia.com/

So it will work with iommufd now?  Meanwhile, what's the status for mdev?
Looks like it isn't supported yet for both.

Thanks,

-- 
Peter Xu

Re: block snapshot issue with RBD

2024-05-28 Thread Jin Cao


Hi Ilya

On 5/28/24 11:13 AM, Ilya Dryomov wrote:

On Mon, May 27, 2024 at 9:06 PM Jin Cao  wrote:


Supplementary info: the VM is paused after the "migrate" command. After being
resumed with "cont", snapshot_delete_blkdev_internal works again, which
is confusing, since disk snapshots generally recommend that I/O be paused,
and a frozen VM satisfies this requirement.


Hi Jin,

This doesn't seem to be related to RBD.  Given that the same error is
observed when using the RBD driver with the raw format, I would dig in
the direction of migration somehow "installing" the raw format (which
is on-disk compatible with the rbd format).



Thanks for the hint.


Also, did you mean to say "snapshot_blkdev_internal" instead of
"snapshot_delete_blkdev_internal" in both instances?


Sorry for my copy-and-paste mistake. Yes, it's snapshot_blkdev_internal.

--
Sincerely,
Jin Cao



Thanks,

 Ilya



--
Sincerely
Jin Cao

On 5/27/24 10:56 AM, Jin Cao wrote:

CC block and migration related address.

On 5/27/24 12:03 AM, Jin Cao wrote:

Hi,

I encountered an RBD block snapshot issue after doing a migration.

Steps
-

1. Start QEMU with:
./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
host,migratable=on -m 2G -boot menu=on,strict=on
rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
-cdrom /home/my/path/of/cloud-init.iso -monitor stdio

2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal.
It works as expected: the snapshot is visible with the command `rbd snap ls
pool_name/image_name`.

3. Do pseudo migration with monitor cmd: migrate -d exec:cat>/tmp/vm.out

4. Do block snapshot again with snapshot_delete_blkdev_internal, then
I get:
 Error: Block format 'raw' used by device 'ide0-hd0' does not
support internal snapshots

I was hoping to do the second block snapshot successfully, and it
feels abnormal the RBD block snapshot function is disrupted after
migration.

BTW, I get the same block snapshot error when I start QEMU with:
  "-drive format=raw,file=rbd:pool_name/image_name"

My question is: how can I proceed with an RBD block snapshot after the
pseudo migration?

Re: [PATCH V2 0/3] improve -overcommit cpu-pm=on|off

2024-05-28 Thread Chen, Zide

On 5/28/2024 2:23 AM, Igor Mammedov wrote:
> On Fri, 24 May 2024 13:00:14 -0700
> Zide Chen  wrote:
> 
>> Currently, if running "-overcommit cpu-pm=on" on hosts that don't
>> have MWAIT support, the MWAIT/MONITOR feature is advertised to the
>> guest and executing MWAIT/MONITOR on the guest triggers #UD.
> 
> this is missing proper description how do you trigger issue
> with reproducer and detailed description why guest sees MWAIT
> when it's not supported by host.

If "-overcommit cpu-pm=on" and "-cpu host" are present, as shown in the
following, CPUID_EXT_MONITOR is set after x86_cpu_filter_features(), so
that it doesn't have a chance to check MWAIT against host features and
will be advertised to the guest regardless of whether it's supported by
the host or not.

x86_cpu_realizefn()
  x86_cpu_filter_features()
  cpu_exec_realizefn()
    kvm_cpu_realizefn
      host_cpu_realizefn
        host_cpu_enable_cpu_pm
          env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;

If it's not supported by the host, executing MONITOR or MWAIT
instructions from the guest triggers #UD, regardless of whether the
MWAIT_EXITING control is set.
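
The ordering problem described above can be modelled with plain bitmasks. This is a toy model, not QEMU code: `filter_features`, `realize_set_after_filter` and `realize_set_before_filter` are hypothetical names, and the second variant only shows one ordering that would let the host check apply, without claiming it is what the series does. The only fact taken as given is that MONITOR/MWAIT is CPUID.01H:ECX bit 3.

```python
CPUID_EXT_MONITOR = 1 << 3  # CPUID.01H:ECX bit 3 (MONITOR/MWAIT)
CPUID_EXT_SSE3 = 1 << 0     # CPUID.01H:ECX bit 0, used here as a second feature


def filter_features(requested, host_supported):
    """Drop every requested feature bit the host does not support."""
    return requested & host_supported


def realize_set_after_filter(requested, host_supported, cpu_pm):
    """Models the reported flow: the bit is set after filtering has run."""
    features = filter_features(requested, host_supported)
    if cpu_pm:
        features |= CPUID_EXT_MONITOR  # never checked against the host
    return features


def realize_set_before_filter(requested, host_supported, cpu_pm):
    """Alternative ordering: the bit goes through the host filter."""
    if cpu_pm:
        requested |= CPUID_EXT_MONITOR
    return filter_features(requested, host_supported)


host_without_mwait = CPUID_EXT_SSE3
after = realize_set_after_filter(CPUID_EXT_SSE3, host_without_mwait, True)
before = realize_set_before_filter(CPUID_EXT_SSE3, host_without_mwait, True)
print(bool(after & CPUID_EXT_MONITOR), bool(before & CPUID_EXT_MONITOR))
```

In the first variant the guest sees MONITOR even though the host lacks it, which is the situation that leads to #UD.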

Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-05-28 Thread Gregory Price

On Thu, May 16, 2024 at 10:05:33AM -0700, fan wrote:
> On Fri, Apr 19, 2024 at 02:24:36PM -0400, Gregory Price wrote:
> > On Thu, Apr 18, 2024 at 04:10:51PM -0700, nifan@gmail.com wrote:
> > > A git tree of this series can be found here (with one extra commit on top
> > > for printing out accepted/pending extent list): 
> > > https://github.com/moking/qemu/tree/dcd-v7
> > > 
> > > v6->v7:
> > > 
> > > 1. Fixed the dvsec range register issue mentioned in the cover letter 
> > > in v6.
> > >Only relevant bits are set to mark the device ready (Patch 6). 
> > > (Jonathan)
> > > 2. Moved the if statement in cxl_setup_memory from Patch 6 to Patch 4. 
> > > (Jonathan)
> > > 3. Used MIN instead of if statement to get record_count in Patch 7. 
> > > (Jonathan)
> > > 4. Added "Reviewed-by" tag to Patch 7.
> > > 5. Modified cxl_dc_extent_release_dry_run so the updated extent list can 
> > > be
> > >reused in cmd_dcd_release_dyn_cap to simplify the process in Patch 8. 
> > > (Jørgen) 
> > > 6. Added comments to indicate further "TODO" items in 
> > > cmd_dcd_add_dyn_cap_rsp.
> > > (Jonathan)
> > > 7. Avoided irrelevant code reformat in Patch 8. (Jonathan)
> > > 8. Modified QMP interfaces for adding/releasing DC extents to allow 
> > > passing
> > >tags, selection policy, flags in the interface. (Jonathan, Gregory)
> > > 9. Redesigned the pending list so extents in the same requests are grouped
> > > together. A new data structure is introduced to represent "extent 
> > > group"
> > > in pending list.  (Jonathan)
> > > 10. Added support in QMP interface for "More" flag. 
> > > 11. Check "Forced removal" flag for release request and not let it pass 
> > > through.
> > > 12. Removed the dynamic capacity log type from CxlEventLog definition in 
> > > cxl.json
> > >to avoid the side effect it may introduce to inject error to DC event 
> > > log.
> > >(Jonathan)
> > > 13. Hard coded the event log type to dynamic capacity event log in QMP
> > > interfaces. (Jonathan)
> > > 14. Adding space in between "-1]". (Jonathan)
> > > 15. Some minor comment fixes.
> > > 
> > > The code is tested with similar setup and has passed similar tests as 
> > > listed
> > > in the cover letter of v5[1] and v6[2].
> > > Also, the code is tested with the latest DCD kernel patchset[3].
> > > 
> > > [1] Qemu DCD patchset v5: 
> > > https://lore.kernel.org/linux-cxl/20240304194331.1586191-1-nifan@gmail.com/T/#t
> > > [2] Qemu DCD patchset v6: 
> > > https://lore.kernel.org/linux-cxl/20240325190339.696686-1-nifan@gmail.com/T/#t
> > > [3] DCD kernel patches: 
> > > https://lore.kernel.org/linux-cxl/20240324-dcd-type2-upstream-v1-0-b7b00d623...@intel.com/T/#m11c571e21c4fe17c7d04ec5c2c7bc7cbf2cd07e3
> > >
> > 
> > added review to all patches, will hopefully be able to add a Tested-by
> > tag early next week, along with a v1 RFC for MHD bit-tracking.
> > 
> > We've been testing v5/v6 for a bit, so I expect as soon as we get the
> > MHD code ported over to v7 i'll ship a tested-by tag pretty quick.
> > 
> > The super-set release will complicate a few things but this doesn't
> > look like a blocker on our end, just a change to how we track bits in a
> > shared bit/bytemap.
> > 
> 
> Hi Gregory,
> I am planning to address all the concerns in this series and send out v8
> next week. Jonathan mentioned you have few related patches built on top
> of this series, can you point me to the latest version so I can look
> into it? Also, would you like me to carry them over to send together
> with my series in next version? It could be easier for you to avoid the
> potential rebase needed for your patches?
> 
> Let me know.
> 
> Thanks,
> Fan
>

I apologize for missing this, I was out of the country for a few weeks.
I'm still catching up on the work history.

I think I saw in passing that you picked up the CCI changes, and those
were the ones causing conflicts - so that's perfect.  I can always
rebase on that.

~ Gregory

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Fabiano Rosas

Peter Xu  writes:

> On Tue, May 28, 2024 at 09:35:22AM -0400, Peter Xu wrote:
>> On Tue, May 28, 2024 at 02:27:57PM +1000, Nicholas Piggin wrote:
>> > There is no need to use /dev/shm for file-backed memory devices, and
>> > it is too small to be usable in gitlab CI. Switch to using a regular
>> > file in /tmp/ which will usually have more space available.
>> > 
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> > Am I missing something? AFAIKS there is not even any point using
>> > /dev/shm aka tmpfs anyway, there is not much special about it as a
>> > filesystem. This applies on top of the series just sent, and passes
>> > gitlab CI qtests including aarch64.
>> 
>> I think it's just that /dev/shm guarantees shmem usage, while the var
>> "tmpfs" implies g_dir_make_tmp() which may be another non-ram based file
>> system, while that'll be slightly different comparing to what a real user
>> would use - we don't suggest user to put guest RAM on things like btrfs.
>> 
>> One real implication is if we add a postcopy test it'll fail with
>> g_dir_make_tmp() when it is not pointing to a shmem mount, as
>> UFFDIO_REGISTER will fail there.  But that test doesn't yet exist as the
>> QEMU paths should be the same even if Linux will trigger different paths
>> when different types of mem is used (anonymous v.s. shmem).
>> 
>> If the goal here is to properly handle the case where tmpfs doesn't have
>> enough space, how about what I suggested in the other email?
>> 
>> https://lore.kernel.org/r/ZlSppKDE6wzjCF--@x1n
>> 
>> IOW, try populate the shmem region before starting the guest, skip if
>> population failed.  Would that work?
>
> Let me append some more info here..
>
> I think madvise() isn't needed as fallocate() should do the population work
> already, afaiu, then it means we pass the shmem path to QEMU and QEMU
> should notice this memory-backend-file existed, open() directly.
>
> I quickly walked through the QEMU memory code and so far it looks all applicable, so
> that QEMU should just start the guest with the pre-populated shmem page
> caches.
>
> There's one trick where qemu_ram_mmap() will map some extra pages, on x86
> 4k, and I don't yet know why we did that..
>
> /*
>  * Note: this always allocates at least one extra page of virtual address
>  * space, even if size is already aligned.
>  */
> total = size + align;

At the end of the function:

/*
 * Leave a single PROT_NONE page allocated after the RAM block, to serve as
 * a guard page guarding against potential buffer overflows.
 */
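
Putting the two quoted comments together, the address arithmetic can be sketched as below. This is a model of the layout only, under the assumptions that `size` is a multiple of the page size, `align` is at least one page, and the reserved range starts on a page boundary; `plan_mapping` and its field names are hypothetical and do not mirror the QEMU function.

```python
def round_up(value, align):
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def plan_mapping(reserved_start, size, align, page_size=4096):
    """Lay out RAM plus one guard page inside a size + align reservation."""
    total = size + align                  # reservation, as in the quoted code
    ram_start = round_up(reserved_start, align)
    offset = ram_start - reserved_start   # leading slack that can be released
    guard_start = ram_start + size        # single PROT_NONE guard page
    reserved_end = reserved_start + total
    trailing = reserved_end - (guard_start + page_size)
    return {
        "ram_start": ram_start,
        "leading_unused": offset,
        "guard_start": guard_start,
        "trailing_unused": trailing,
    }


# 4 KiB alignment with an already aligned start: the extra page of the
# reservation is exactly the guard page, and nothing is left over.
plan = plan_mapping(0x1000, 0x4000, 0x1000)
print(plan)
```

Under these assumptions, with a 4 KiB alignment the "extra page" Peter asks about and the guard page are the same page.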

Re: [PATCH 0/6] accel: Restrict TCG plugin (un)registration to TCG accel

2024-05-28 Thread Richard Henderson


On 5/28/24 07:59, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (6):
   system/runstate: Remove unused 'qemu/plugin.h' header
   accel/tcg: Move common declarations to 'internal-common.h'
   accel: Clarify accel_cpu_common_[un]realize() use unassigned vCPU
   accel: Introduce accel_cpu_common_[un]realize_assigned() handlers
   accel: Restrict TCG plugin (un)registration to TCG accel
   accel/tcg: Move qemu_plugin_vcpu_init__async() to plugins/


Reviewed-by: Richard Henderson 

r~
