Re: [PATCH] Warn user if the vga flag is passed but no vga device is created
Gautam Agrawal writes:

> This patch is in regards to this issue: https://gitlab.com/qemu-project/qemu/-/issues/581#.
> A global boolean variable "vga_interface_created" (declared in softmmu/globals.c)
> has been used to track the creation of the vga interface. If the vga flag is passed on the
> command line, "default_vga" (declared in softmmu/vl.c) is set to 0. To warn the user, the
> condition checks if vga_interface_created is false and default_vga is equal to 0.
>
> The warning "No vga device is created" is logged if the vga flag is passed
> but no vga device is created. This patch has been tested for
> x86_64, i386, sparc, sparc64 and arm boards.

Suggest to include a reproducer here, e.g.

    $ qemu-system-x86_64 -S -display none -M microvm -vga std
    qemu-system-x86_64: warning: No vga device is created

See below for my critique of the warning message.

>
> Signed-off-by: Gautam Agrawal
> ---
>  hw/isa/isa-bus.c        | 1 +
>  hw/pci/pci.c            | 1 +
>  hw/sparc/sun4m.c        | 2 ++
>  hw/sparc64/sun4u.c      | 1 +
>  include/sysemu/sysemu.h | 1 +
>  softmmu/globals.c       | 1 +
>  softmmu/vl.c            | 3 +++
>  7 files changed, 10 insertions(+)
>
> diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
> index 0ad1c5fd65..cd5ad3687d 100644
> --- a/hw/isa/isa-bus.c
> +++ b/hw/isa/isa-bus.c
> @@ -166,6 +166,7 @@ bool isa_realize_and_unref(ISADevice *dev, ISABus *bus, Error **errp)
>
>  ISADevice *isa_vga_init(ISABus *bus)
>  {
> +    vga_interface_created = true;
>      switch (vga_interface_type) {
>      case VGA_CIRRUS:
>          return isa_create_simple(bus, "isa-cirrus-vga");
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index dae9119bfe..fab9c80f8d 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2038,6 +2038,7 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus *rootbus,
>
>  PCIDevice *pci_vga_init(PCIBus *bus)
>  {
> +    vga_interface_created = true;
>      switch (vga_interface_type) {
>      case VGA_CIRRUS:
>          return pci_create_simple(bus, -1, "cirrus-vga");
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index 7f3a7c0027..f45e29acc8 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -921,6 +921,7 @@ static void sun4m_hw_init(MachineState *machine)
>          /* sbus irq 5 */
>          cg3_init(hwdef->tcx_base, slavio_irq[11], 0x0010,
>                   graphic_width, graphic_height, graphic_depth);
> +        vga_interface_created = true;
>      } else {
>          /* If no display specified, default to TCX */
>          if (graphic_depth != 8 && graphic_depth != 24) {
> @@ -936,6 +937,7 @@ static void sun4m_hw_init(MachineState *machine)
>
>          tcx_init(hwdef->tcx_base, slavio_irq[11], 0x0010,
>                   graphic_width, graphic_height, graphic_depth);
> +        vga_interface_created = true;
>      }
>  }
>
> diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
> index cda7df36e3..75334dba71 100644
> --- a/hw/sparc64/sun4u.c
> +++ b/hw/sparc64/sun4u.c
> @@ -633,6 +633,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
>      switch (vga_interface_type) {
>      case VGA_STD:
>          pci_create_simple(pci_busA, PCI_DEVFN(2, 0), "VGA");
> +        vga_interface_created = true;
>          break;
>      case VGA_NONE:
>          break;
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index b9421e03ff..a558b895e4 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -32,6 +32,7 @@ typedef enum {
>  } VGAInterfaceType;
>
>  extern int vga_interface_type;
> +extern bool vga_interface_created;
>
>  extern int graphic_width;
>  extern int graphic_height;
> diff --git a/softmmu/globals.c b/softmmu/globals.c
> index 3ebd718e35..1a5f8d42ad 100644
> --- a/softmmu/globals.c
> +++ b/softmmu/globals.c
> @@ -40,6 +40,7 @@ int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int autostart = 1;
>  int vga_interface_type = VGA_NONE;
> +bool vga_interface_created = false;
>  Chardev *parallel_hds[MAX_PARALLEL_PORTS];
>  int win2k_install_hack;
>  int singlestep;
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 6f646531a0..cb79fa1f42 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2734,6 +2734,9 @@ static void qemu_machine_creation_done(void)
>      if (foreach_device_config(DEV_GDB, gdbserver_start) < 0) {
>          exit(1);
>      }
> +    if (!vga_interface_created && !default_vga) {
> +        warn_report("No vga device is created");

True, but this leaves the user guessing why. Pointing to the option would help:

    qemu-system-x86_64: warning: -vga std: No vga device is created

To get this, use loc_save() to save the option's location alongside @vga_model, then bracket the warn_report() with loc_push_restore() and loc_pop().

The option to ask the board to create a video device is spelled -vga for historical reasons. Some of its arguments aren't VGA devices, e.g. tcx. -help is phrased accordingly:
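The pattern Markus describes can be sketched against QEMU's error-report API (loc_save(), loc_push_restore(), loc_pop() from include/qemu/error-report.h). This is a non-runnable illustration only; the name vga_model_loc and the exact placement are assumptions based on this thread, not code from the patch:

```c
/* Sketch of the suggested pattern; vga_model_loc is a hypothetical name. */

/* At option-parsing time, next to where @vga_model is recorded: */
static Location vga_model_loc;
...
case QEMU_OPTION_vga:
    vga_model = optarg;
    loc_save(&vga_model_loc);   /* remember where -vga appeared */
    default_vga = 0;
    break;

/* Later, in qemu_machine_creation_done(): */
if (!vga_interface_created && !default_vga) {
    Location loc;

    /* temporarily make the saved location current, so the message
     * is prefixed with e.g. "-vga std: " */
    loc_push_restore(&loc, &vga_model_loc);
    warn_report("No vga device is created");
    loc_pop(&loc);
}
```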
Re: [PATCH v5 0/3] util/thread-pool: Expose minimum and maximum size
Nicolas Saenz Julienne writes:

> As discussed on the previous RFC[1] the thread-pool's dynamic thread
> management doesn't play well with real-time and latency sensitive
> systems. This series introduces a set of controls that'll permit
> achieving more deterministic behaviours, for example by fixing the
> pool's size.
>
> We first introduce a new common interface to event loop configuration by
> moving iothread's already available properties into an abstract class
> called 'EventLoopBackend' and have both 'IOThread' and the newly
> created 'MainLoop' inherit the properties from that class.
>
> With this new configuration interface in place it's relatively simple to
> introduce new options to fix the event loop's thread pool sizes. The
> resulting QAPI looks like this:
>
>     -object main-loop,id=main-loop,thread-pool-min=1,thread-pool-max=1
>
> Note that all patches are bisect friendly and pass all the tests.
>
> [1] https://patchwork.ozlabs.org/project/qemu-devel/patch/20220202175234.656711-1-nsaen...@redhat.com/
>
> @Stefan I kept your Signed-off-by, since the changes are trivial/not
> thread-pool related

With the doc nit in PATCH 2 addressed, QAPI schema
Acked-by: Markus Armbruster
Re: [PATCH v5 2/3] util/main-loop: Introduce the main loop into QOM
Nicolas Saenz Julienne writes: > 'event-loop-base' provides basic property handling for all 'AioContext' > based event loops. So let's define a new 'MainLoopClass' that inherits > from it. This will permit tweaking the main loop's properties through > qapi as well as through the command line using the '-object' keyword[1]. > Only one instance of 'MainLoopClass' might be created at any time. > > 'EventLoopBaseClass' learns a new callback, 'can_be_deleted()' so as to > mark 'MainLoop' as non-deletable. > > [1] For example: > -object main-loop,id=main-loop,aio-max-batch= > > Signed-off-by: Nicolas Saenz Julienne > Reviewed-by: Stefan Hajnoczi [...] > diff --git a/qapi/qom.json b/qapi/qom.json > index a2439533c5..51f3acaad8 100644 > --- a/qapi/qom.json > +++ b/qapi/qom.json > @@ -540,6 +540,16 @@ > '*poll-grow': 'int', > '*poll-shrink': 'int' } } > > +## > +# @MainLoopProperties: > +# > +# Properties for the main-loop object. > +# Please add # Since: 7.1 > +## > +{ 'struct': 'MainLoopProperties', > + 'base': 'EventLoopBaseProperties', > + 'data': {} } > + > ## > # @MemoryBackendProperties: > # > @@ -830,6 +840,7 @@ > { 'name': 'input-linux', >'if': 'CONFIG_LINUX' }, > 'iothread', > +'main-loop', > { 'name': 'memory-backend-epc', >'if': 'CONFIG_LINUX' }, > 'memory-backend-file', > @@ -895,6 +906,7 @@ >'input-linux':{ 'type': 'InputLinuxProperties', >'if': 'CONFIG_LINUX' }, >'iothread': 'IothreadProperties', > + 'main-loop': 'MainLoopProperties', >'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties', >'if': 'CONFIG_LINUX' }, >'memory-backend-file':'MemoryBackendFileProperties', [...]
[PATCH v8 5/5] hw/acpi/aml-build: Use existing CPU topology to build PPTT table
When the PPTT table is built, the CPU topology is re-calculated, but it's unnecessary because the CPU topology has been populated in virt_possible_cpu_arch_ids() on the arm/virt machine. This reworks build_pptt() to avoid that by reusing the existing IDs in ms->possible_cpus. Currently, the only user of build_pptt() is the arm/virt machine. Signed-off-by: Gavin Shan Tested-by: Yanan Wang Reviewed-by: Yanan Wang Acked-by: Igor Mammedov --- hw/acpi/aml-build.c | 111 +++- 1 file changed, 48 insertions(+), 63 deletions(-) diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c index 4086879ebf..e6bfac95c7 100644 --- a/hw/acpi/aml-build.c +++ b/hw/acpi/aml-build.c @@ -2002,86 +2002,71 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms, const char *oem_id, const char *oem_table_id) { MachineClass *mc = MACHINE_GET_CLASS(ms); -GQueue *list = g_queue_new(); -guint pptt_start = table_data->len; -guint parent_offset; -guint length, i; -int uid = 0; -int socket; +CPUArchIdList *cpus = ms->possible_cpus; +int64_t socket_id = -1, cluster_id = -1, core_id = -1; +uint32_t socket_offset = 0, cluster_offset = 0, core_offset = 0; +uint32_t pptt_start = table_data->len; +int n; AcpiTable table = { .sig = "PPTT", .rev = 2, .oem_id = oem_id, .oem_table_id = oem_table_id }; acpi_table_begin(&table, table_data); -for (socket = 0; socket < ms->smp.sockets; socket++) { -g_queue_push_tail(list, -GUINT_TO_POINTER(table_data->len - pptt_start)); -build_processor_hierarchy_node( -table_data, -/* - * Physical package - represents the boundary - * of a physical package - */ -(1 << 0), -0, socket, NULL, 0); -} - -if (mc->smp_props.clusters_supported) { -length = g_queue_get_length(list); -for (i = 0; i < length; i++) { -int cluster; - -parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list)); -for (cluster = 0; cluster < ms->smp.clusters; cluster++) { -g_queue_push_tail(list, -GUINT_TO_POINTER(table_data->len - pptt_start)); -build_processor_hierarchy_node( -table_data, -(0 << 0), /* not a 
physical package */ -parent_offset, cluster, NULL, 0); -} +/* + * This works with the assumption that cpus[n].props.*_id has been + * sorted from top to down levels in mc->possible_cpu_arch_ids(). + * Otherwise, the unexpected and duplicated containers will be + * created. + */ +for (n = 0; n < cpus->len; n++) { +if (cpus->cpus[n].props.socket_id != socket_id) { +assert(cpus->cpus[n].props.socket_id > socket_id); +socket_id = cpus->cpus[n].props.socket_id; +cluster_id = -1; +core_id = -1; +socket_offset = table_data->len - pptt_start; +build_processor_hierarchy_node(table_data, +(1 << 0), /* Physical package */ +0, socket_id, NULL, 0); } -} -length = g_queue_get_length(list); -for (i = 0; i < length; i++) { -int core; - -parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list)); -for (core = 0; core < ms->smp.cores; core++) { -if (ms->smp.threads > 1) { -g_queue_push_tail(list, -GUINT_TO_POINTER(table_data->len - pptt_start)); -build_processor_hierarchy_node( -table_data, -(0 << 0), /* not a physical package */ -parent_offset, core, NULL, 0); -} else { -build_processor_hierarchy_node( -table_data, -(1 << 1) | /* ACPI Processor ID valid */ -(1 << 3), /* Node is a Leaf */ -parent_offset, uid++, NULL, 0); +if (mc->smp_props.clusters_supported) { +if (cpus->cpus[n].props.cluster_id != cluster_id) { +assert(cpus->cpus[n].props.cluster_id > cluster_id); +cluster_id = cpus->cpus[n].props.cluster_id; +core_id = -1; +cluster_offset = table_data->len - pptt_start; +build_processor_hierarchy_node(table_data, +(0 << 0), /* Not a physical package */ +socket_offset, cluster_id, NULL, 0); } +} else { +cluster_offset = socket_offset; } -} -length = g_queue_get_length(list); -for (i = 0; i < length; i++) { -int thread; +if (ms->smp.threads == 1) { +build_processor_hierarchy_node(table_data, +(1 << 1) | /* ACPI Processor ID valid */ +(1 << 3), /* Node is
[PATCH v8 4/5] hw/arm/virt: Fix CPU's default NUMA node ID
When the CPU-to-NUMA association isn't explicitly provided by the user, the default one is given by mc->get_default_cpu_node_id(). However, the CPU topology isn't fully considered in the default association, and this causes CPU topology broken warnings when booting a Linux guest. For example, the following warning messages are observed when the Linux guest is booted with the following command line:

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host \
  -cpu host \
  -smp 6,sockets=2,cores=3,threads=1 \
  -m 1024M,slots=16,maxmem=64G \
  -object memory-backend-ram,id=mem0,size=128M \
  -object memory-backend-ram,id=mem1,size=128M \
  -object memory-backend-ram,id=mem2,size=128M \
  -object memory-backend-ram,id=mem3,size=128M \
  -object memory-backend-ram,id=mem4,size=128M \
  -object memory-backend-ram,id=mem5,size=384M \
  -numa node,nodeid=0,memdev=mem0 \
  -numa node,nodeid=1,memdev=mem1 \
  -numa node,nodeid=2,memdev=mem2 \
  -numa node,nodeid=3,memdev=mem3 \
  -numa node,nodeid=4,memdev=mem4 \
  -numa node,nodeid=5,memdev=mem5

  alternatives: patching kernel code
  BUG: arch topology borken
  the CLS domain not a subset of the MC domain
  BUG: arch topology borken
  the DIE domain not a subset of the NODE domain

With the current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately. That's incorrect because CPU#0/1/2 should be associated with the same NUMA node, since they sit in the same socket. This fixes the issue by considering the socket ID when the default CPU-to-NUMA association is provided in virt_possible_cpu_arch_ids(). With this applied, no more CPU topology broken warnings are seen from the Linux guest. The 6 CPUs are associated with NODE#0/1, and there are no CPUs associated with NODE#2/3/4/5.
Signed-off-by: Gavin Shan Reviewed-by: Igor Mammedov Reviewed-by: Yanan Wang --- hw/arm/virt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0fd7f9a6a1..091054662c 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2552,7 +2552,9 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index) static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx) { -return idx % ms->numa_state->num_nodes; +int64_t socket_id = ms->possible_cpus->cpus[idx].props.socket_id; + +return socket_id % ms->numa_state->num_nodes; } static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms) -- 2.23.0
[PATCH v8 3/5] hw/arm/virt: Consider SMP configuration in CPU topology
Currently, the SMP configuration isn't considered when the CPU topology is populated. In this case, it's impossible to provide the default CPU-to-NUMA mapping or association based on the socket ID of the given CPU. This takes the SMP configuration into account when the CPU topology is populated. The die ID for the given CPU isn't assigned since it's not supported on the arm/virt machine. Besides, the SMP configuration used in qtest/numa-test/aarch64_numa_cpu() is corrected to avoid a test failure. Signed-off-by: Gavin Shan Reviewed-by: Yanan Wang --- hw/arm/virt.c | 15 ++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 5bdd98e4a1..0fd7f9a6a1 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2560,6 +2560,7 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms) int n; unsigned int max_cpus = ms->smp.max_cpus; VirtMachineState *vms = VIRT_MACHINE(ms); +MachineClass *mc = MACHINE_GET_CLASS(vms); if (ms->possible_cpus) { assert(ms->possible_cpus->len == max_cpus); @@ -2573,8 +2574,20 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms) ms->possible_cpus->cpus[n].type = ms->cpu_type; ms->possible_cpus->cpus[n].arch_id = virt_cpu_mp_affinity(vms, n); + +assert(!mc->smp_props.dies_supported); +ms->possible_cpus->cpus[n].props.has_socket_id = true; +ms->possible_cpus->cpus[n].props.socket_id = +n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads); +ms->possible_cpus->cpus[n].props.has_cluster_id = true; +ms->possible_cpus->cpus[n].props.cluster_id = +(n / (ms->smp.cores * ms->smp.threads)) % ms->smp.clusters; +ms->possible_cpus->cpus[n].props.has_core_id = true; +ms->possible_cpus->cpus[n].props.core_id = +(n / ms->smp.threads) % ms->smp.cores; ms->possible_cpus->cpus[n].props.has_thread_id = true; -ms->possible_cpus->cpus[n].props.thread_id = n; +ms->possible_cpus->cpus[n].props.thread_id = +n % ms->smp.threads; } return ms->possible_cpus; } -- 2.23.0
[PATCH v8 2/5] qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()
The CPU topology isn't enabled on the arm/virt machine yet, but we're going to do it in the next patch. After the CPU topology is enabled by the next patch, "thread-id=1" becomes invalid because the CPU core is preferred on the arm/virt machine. It means these two CPUs have 0/1 as their core IDs, but their thread IDs are both 0. It will trigger a test failure, as the following message indicates: [14/21 qemu:qtest+qtest-aarch64 / qtest-aarch64/numa-test ERROR 1.48s killed by signal 6 SIGABRT >>> G_TEST_DBUS_DAEMON=/home/gavin/sandbox/qemu.main/tests/dbus-vmstate-daemon.sh \ QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \ QTEST_QEMU_BINARY=./qemu-system-aarch64 \ QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=83 \ /home/gavin/sandbox/qemu.main/build/tests/qtest/numa-test --tap -k ―― stderr: qemu-system-aarch64: -numa cpu,node-id=0,thread-id=1: no match found This fixes the issue by providing comprehensive SMP configurations in aarch64_numa_cpu(). The SMP configurations aren't used before the CPU topology is enabled in the next patch. Signed-off-by: Gavin Shan Reviewed-by: Yanan Wang --- tests/qtest/numa-test.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c index 90bf68a5b3..aeda8c774c 100644 --- a/tests/qtest/numa-test.c +++ b/tests/qtest/numa-test.c @@ -223,7 +223,8 @@ static void aarch64_numa_cpu(const void *data) QTestState *qts; g_autofree char *cli = NULL; -cli = make_cli(data, "-machine smp.cpus=2 " +cli = make_cli(data, "-machine " +"smp.cpus=2,smp.sockets=1,smp.clusters=1,smp.cores=1,smp.threads=2 " "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 " "-numa cpu,node-id=1,thread-id=0 " "-numa cpu,node-id=0,thread-id=1"); -- 2.23.0
[PATCH v8 1/5] qapi/machine.json: Add cluster-id
This adds cluster-id in CPU instance properties, which will be used by arm/virt machine. Besides, the cluster-id is also verified or dumped in various spots: * hw/core/machine.c::machine_set_cpu_numa_node() to associate CPU with its NUMA node. * hw/core/machine.c::machine_numa_finish_cpu_init() to record CPU slots with no NUMA mapping set. * hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump cluster-id. Signed-off-by: Gavin Shan Reviewed-by: Yanan Wang --- hw/core/machine-hmp-cmds.c | 4 hw/core/machine.c | 16 qapi/machine.json | 6 -- 3 files changed, 24 insertions(+), 2 deletions(-) diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c index 4e2f319aeb..5cb5eecbfc 100644 --- a/hw/core/machine-hmp-cmds.c +++ b/hw/core/machine-hmp-cmds.c @@ -77,6 +77,10 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict) if (c->has_die_id) { monitor_printf(mon, "die-id: \"%" PRIu64 "\"\n", c->die_id); } +if (c->has_cluster_id) { +monitor_printf(mon, "cluster-id: \"%" PRIu64 "\"\n", + c->cluster_id); +} if (c->has_core_id) { monitor_printf(mon, "core-id: \"%" PRIu64 "\"\n", c->core_id); } diff --git a/hw/core/machine.c b/hw/core/machine.c index cb9bbc844d..700c1e76b8 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -682,6 +682,11 @@ void machine_set_cpu_numa_node(MachineState *machine, return; } +if (props->has_cluster_id && !slot->props.has_cluster_id) { +error_setg(errp, "cluster-id is not supported"); +return; +} + if (props->has_socket_id && !slot->props.has_socket_id) { error_setg(errp, "socket-id is not supported"); return; @@ -701,6 +706,11 @@ void machine_set_cpu_numa_node(MachineState *machine, continue; } +if (props->has_cluster_id && +props->cluster_id != slot->props.cluster_id) { +continue; +} + if (props->has_die_id && props->die_id != slot->props.die_id) { continue; } @@ -995,6 +1005,12 @@ static char *cpu_slot_to_string(const CPUArchId *cpu) } g_string_append_printf(s, "die-id: %"PRId64, cpu->props.die_id); } +if 
(cpu->props.has_cluster_id) { +if (s->len) { +g_string_append_printf(s, ", "); +} +g_string_append_printf(s, "cluster-id: %"PRId64, cpu->props.cluster_id); +} if (cpu->props.has_core_id) { if (s->len) { g_string_append_printf(s, ", "); diff --git a/qapi/machine.json b/qapi/machine.json index d25a481ce4..4c417e32a5 100644 --- a/qapi/machine.json +++ b/qapi/machine.json @@ -868,10 +868,11 @@ # @node-id: NUMA node ID the CPU belongs to # @socket-id: socket number within node/board the CPU belongs to # @die-id: die number within socket the CPU belongs to (since 4.1) -# @core-id: core number within die the CPU belongs to +# @cluster-id: cluster number within die the CPU belongs to (since 7.1) +# @core-id: core number within cluster the CPU belongs to # @thread-id: thread number within core the CPU belongs to # -# Note: currently there are 5 properties that could be present +# Note: currently there are 6 properties that could be present # but management should be prepared to pass through other # properties with device_add command to allow for future # interface extension. This also requires the filed names to be kept in @@ -883,6 +884,7 @@ 'data': { '*node-id': 'int', '*socket-id': 'int', '*die-id': 'int', +'*cluster-id': 'int', '*core-id': 'int', '*thread-id': 'int' } -- 2.23.0
[PATCH v8 0/5] hw/arm/virt: Fix CPU's default NUMA node ID
When the CPU-to-NUMA association isn't provided by the user, the default NUMA node ID for the specific CPU is returned from virt_get_default_cpu_node_id(). Unfortunately, the default NUMA node ID breaks the socket boundary and leads to a broken CPU topology warning message in the Linux guest. This series intends to fix the issue.

PATCH[1/5] Add cluster-id to CPU instance property
PATCH[2/5] Fixes test failure in qtest/numa-test/aarch64_numa_cpu()
PATCH[3/5] Uses SMP configuration to populate CPU topology
PATCH[4/5] Fixes the broken CPU topology by considering the socket boundary when the default NUMA node ID is given
PATCH[5/5] Uses the populated CPU topology to build PPTT table, instead of calculating it again

Changelog
=========
v8:
  * Separate PATCH[v8 2/5] to fix test failure in qtest/numa-test/aarch64_numa_cpu() (Igor)
  * Improvements to coding style, changelog and comments (Yanan)
v6/v7:
  * Fixed description for 'cluster-id' and 'core-id' (Yanan)
  * Remove '% ms->smp.sockets' in socket ID calculation (Yanan)
  * Fixed tests/qtest/numa-test/aarch64_numa_cpu() (Yanan)
  * Initialized offset variables in build_pptt() (Jonathan)
  * Added comments about the expected and sorted layout of cpus[n].props.*_id and assert() on the exceptional cases (Igor)
v4/v5:
  * Split PATCH[v3 1/3] to PATCH[v5 1/4] and PATCH[v5 2/4]. Verify or dump 'cluster-id' in various spots (Yanan)
  * s/within cluster/within cluster\/die/ for 'core-id' in qapi/machine.json (Igor)
  * Apply '% ms->smp.{sockets, clusters, cores, threads}' in virt_possible_cpu_arch_ids() as x86 does (Igor)
  * Use [0 - possible_cpus->len] as ACPI processor UID to build PPTT table and PATCH[v3 4/4] is dropped (Igor)
  * Simplified build_pptt() to add all entries in one loop on ms->possible_cpus (Igor)
v3:
  * Split PATCH[v2 1/3] to PATCH[v3 1/4] and PATCH[v3 2/4] (Yanan)
  * Don't take account of die ID in CPU topology population and added assert(!mc->smp_props.dies_supported) (Yanan/Igor)
  * Assign cluster_id and use it when building PPTT table (Yanan/Igor)
v2:
  * Populate the CPU topology in virt_possible_cpu_arch_ids() so that it can be reused in virt_get_default_cpu_node_id() (Igor)
  * Added PATCH[2/3] to use the existing CPU topology when the PPTT table is built (Igor)
  * Added PATCH[3/3] to take thread ID as ACPI processor ID in MADT and SRAT table (Gavin)

Gavin Shan (5):
  qapi/machine.json: Add cluster-id
  qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()
  hw/arm/virt: Consider SMP configuration in CPU topology
  hw/arm/virt: Fix CPU's default NUMA node ID
  hw/acpi/aml-build: Use existing CPU topology to build PPTT table

 hw/acpi/aml-build.c        | 111 -
 hw/arm/virt.c              |  19 ++-
 hw/core/machine-hmp-cmds.c |   4 ++
 hw/core/machine.c          |  16 ++
 qapi/machine.json          |   6 +-
 tests/qtest/numa-test.c    |   3 +-
 6 files changed, 91 insertions(+), 68 deletions(-)

-- 
2.23.0
Re: [PATCH] hw/sd/sdhci: Block Size Register bits [14:12] is lost
ping https://patchew.org/QEMU/20220321055618.4026-1-lu@verisilicon.com/

Please help review the patch. Thanks.

B.R.

-----Original Message-----
From: Gao, Lu
Sent: Monday, March 21, 2022 1:56 PM
To: qemu-devel@nongnu.org
Cc: Gao, Lu; Wen, Jianxian; Philippe Mathieu-Daudé; Bin Meng; open list:SD (Secure Card)
Subject: [PATCH] hw/sd/sdhci: Block Size Register bits [14:12] is lost

Block Size Register bits [14:12] are the SDMA Buffer Boundary. They were dropped on register write, but they are needed in SDMA transfers, e.g. sdhci_sdma_transfer_multi_blocks uses them to calculate the boundary_ variables. Losing this field causes wrong operation for different SDMA Buffer Boundary settings.

Signed-off-by: Lu Gao
Signed-off-by: Jianxian Wen
---
 hw/sd/sdhci.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index e0bbc90344..350ceb487d 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -321,6 +321,8 @@ static void sdhci_poweron_reset(DeviceState *dev)

 static void sdhci_data_transfer(void *opaque);

+#define BLOCK_SIZE_MASK (4 * KiB - 1)
+
 static void sdhci_send_command(SDHCIState *s)
 {
     SDRequest request;
@@ -371,7 +373,8 @@ static void sdhci_send_command(SDHCIState *s)

     sdhci_update_irq(s);

-    if (!timeout && s->blksize && (s->cmdreg & SDHC_CMD_DATA_PRESENT)) {
+    if (!timeout && (s->blksize & BLOCK_SIZE_MASK) &&
+        (s->cmdreg & SDHC_CMD_DATA_PRESENT)) {
         s->data_count = 0;
         sdhci_data_transfer(s);
     }
@@ -406,7 +409,6 @@ static void sdhci_end_transfer(SDHCIState *s)
 /*
  * Programmed i/o data transfer
  */
-#define BLOCK_SIZE_MASK (4 * KiB - 1)

 /* Fill host controller's read buffer with BLKSIZE bytes of data from card */
 static void sdhci_read_block_from_card(SDHCIState *s)
@@ -1137,7 +1139,8 @@ sdhci_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
         s->sdmasysad = (s->sdmasysad & mask) | value;
         MASKED_WRITE(s->sdmasysad, mask, value);
         /* Writing to last byte of sdmasysad might trigger transfer */
-        if (!(mask & 0xFF00) && s->blkcnt && s->blksize &&
+        if (!(mask & 0xFF00) && s->blkcnt &&
+            (s->blksize & BLOCK_SIZE_MASK) &&
             SDHC_DMA_TYPE(s->hostctl1) == SDHC_CTRL_SDMA) {
             if (s->trnmod & SDHC_TRNS_MULTI) {
                 sdhci_sdma_transfer_multi_blocks(s);
@@ -1151,7 +1154,11 @@
         if (!TRANSFERRING_DATA(s->prnsts)) {
             uint16_t blksize = s->blksize;

-            MASKED_WRITE(s->blksize, mask, extract32(value, 0, 12));
+            /*
+             * [14:12] SDMA Buffer Boundary
+             * [11:00] Transfer Block Size
+             */
+            MASKED_WRITE(s->blksize, mask, extract32(value, 0, 15));
             MASKED_WRITE(s->blkcnt, mask >> 16, value >> 16);

             /* Limit block size to the maximum buffer size */
-- 
2.17.1
[PATCH v2 19/42] i386: Rewrite blendv helpers
Rewrite the blendv helpers so that they can easily be extended to support the AVX encodings, which make all 4 arguments explicit. No functional changes to the existing helpers Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 119 +- 1 file changed, 60 insertions(+), 59 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 3202c00572..9f388b02b9 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2141,73 +2141,74 @@ void glue(helper_palignr, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } } -#define XMM0 (env->xmm_regs[0]) +#if SHIFT >= 1 + +#define BLEND_V128(elem, num, F, b) do {\ +d->elem(b + 0) = F(v->elem(b + 0), s->elem(b + 0), m->elem(b + 0)); \ +d->elem(b + 1) = F(v->elem(b + 1), s->elem(b + 1), m->elem(b + 1)); \ +if (num > 2) { \ +d->elem(b + 2) = F(v->elem(b + 2), s->elem(b + 2), m->elem(b + 2)); \ +d->elem(b + 3) = F(v->elem(b + 3), s->elem(b + 3), m->elem(b + 3)); \ +} \ +if (num > 4) { \ +d->elem(b + 4) = F(v->elem(b + 4), s->elem(b + 4), m->elem(b + 4)); \ +d->elem(b + 5) = F(v->elem(b + 5), s->elem(b + 5), m->elem(b + 5)); \ +d->elem(b + 6) = F(v->elem(b + 6), s->elem(b + 6), m->elem(b + 6)); \ +d->elem(b + 7) = F(v->elem(b + 7), s->elem(b + 7), m->elem(b + 7)); \ +} \ +if (num > 8) { \ +d->elem(b + 8) = F(v->elem(b + 8), s->elem(b + 8), m->elem(b + 8)); \ +d->elem(b + 9) = F(v->elem(b + 9), s->elem(b + 9), m->elem(b + 9)); \ +d->elem(b + 10) = F(v->elem(b + 10), s->elem(b + 10), m->elem(b + 10));\ +d->elem(b + 11) = F(v->elem(b + 11), s->elem(b + 11), m->elem(b + 11));\ +d->elem(b + 12) = F(v->elem(b + 12), s->elem(b + 12), m->elem(b + 12));\ +d->elem(b + 13) = F(v->elem(b + 13), s->elem(b + 13), m->elem(b + 13));\ +d->elem(b + 14) = F(v->elem(b + 14), s->elem(b + 14), m->elem(b + 14));\ +d->elem(b + 15) = F(v->elem(b + 15), s->elem(b + 15), m->elem(b + 15));\ +} \ +} while (0) -#if SHIFT == 1 #define SSE_HELPER_V(name, elem, num, F)\ -void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void glue(name, 
SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ { \ -d->elem(0) = F(d->elem(0), s->elem(0), XMM0.elem(0)); \ -d->elem(1) = F(d->elem(1), s->elem(1), XMM0.elem(1)); \ -if (num > 2) { \ -d->elem(2) = F(d->elem(2), s->elem(2), XMM0.elem(2)); \ -d->elem(3) = F(d->elem(3), s->elem(3), XMM0.elem(3)); \ -if (num > 4) { \ -d->elem(4) = F(d->elem(4), s->elem(4), XMM0.elem(4)); \ -d->elem(5) = F(d->elem(5), s->elem(5), XMM0.elem(5)); \ -d->elem(6) = F(d->elem(6), s->elem(6), XMM0.elem(6)); \ -d->elem(7) = F(d->elem(7), s->elem(7), XMM0.elem(7)); \ -if (num > 8) { \ -d->elem(8) = F(d->elem(8), s->elem(8), XMM0.elem(8)); \ -d->elem(9) = F(d->elem(9), s->elem(9), XMM0.elem(9)); \ -d->elem(10) = F(d->elem(10), s->elem(10), XMM0.elem(10)); \ -d->elem(11) = F(d->elem(11), s->elem(11), XMM0.elem(11)); \ -d->elem(12) = F(d->elem(12), s->elem(12), XMM0.elem(12)); \ -d->elem(13) = F(d->elem(13), s->elem(13), XMM0.elem(13)); \ -d->elem(14) = F(d->elem(14), s->elem(14), XMM0.elem(14)); \ -d->elem(15) = F(d->elem(15), s->elem(15), XMM0.elem(15)); \ -} \ -} \ -} \ -} +Reg *v = d; \ +Reg *m = >xmm_regs[0]; \ +BLEND_V128(elem, num, F, 0);\ +YMM_ONLY(BLEND_V128(elem, num, F, num);)\ +} + +#define BLEND_I128(elem, num, F, b) do {\ +d->elem(b + 0) = F(v->elem(b + 0), s->elem(b + 0), ((imm >> 0) & 1)); \ +d->elem(b + 1) = F(v->elem(b +
[PATCH v2 31/42] i386: Implement AVX variable shifts
These use the W bit to encode the operand width, but otherwise fairly straightforward. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 17 + target/i386/ops_sse_header.h | 6 ++ target/i386/tcg/translate.c | 17 + 3 files changed, 40 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 9b92b9790a..8f2bd48394 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3195,6 +3195,23 @@ void glue(helper_vpermilps_imm, SUFFIX)(CPUX86State *env, #endif } +#if SHIFT == 1 +#define FPSRLVD(x, c) (c < 32 ? ((x) >> c) : 0) +#define FPSRLVQ(x, c) (c < 64 ? ((x) >> c) : 0) +#define FPSRAVD(x, c) ((int32_t)(x) >> (c < 64 ? c : 31)) +#define FPSRAVQ(x, c) ((int64_t)(x) >> (c < 64 ? c : 63)) +#define FPSLLVD(x, c) (c < 32 ? ((x) << c) : 0) +#define FPSLLVQ(x, c) (c < 64 ? ((x) << c) : 0) +#endif + +SSE_HELPER_L(helper_vpsrlvd, FPSRLVD) +SSE_HELPER_L(helper_vpsravd, FPSRAVD) +SSE_HELPER_L(helper_vpsllvd, FPSLLVD) + +SSE_HELPER_Q(helper_vpsrlvq, FPSRLVQ) +SSE_HELPER_Q(helper_vpsravq, FPSRAVQ) +SSE_HELPER_Q(helper_vpsllvq, FPSLLVQ) + #if SHIFT == 2 void glue(helper_vbroadcastdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index c52169a030..20db6c4240 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -421,6 +421,12 @@ DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpermilps, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpermilpd_imm, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_4(glue(vpermilps_imm, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_4(glue(vpsrlvd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsravd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsllvd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsrlvq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsravq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpsllvq, SUFFIX), void, env, Reg, Reg, 
Reg) #if SHIFT == 2 DEF_HELPER_3(glue(vbroadcastdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_1(vzeroall, void, env) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 358c3ecb0b..4990470083 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3293,6 +3293,9 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x40] = BINARY_OP(pmulld, SSE41, SSE_OPF_MMX), #define gen_helper_phminposuw_ymm NULL [0x41] = UNARY_OP(phminposuw, SSE41, 0), +[0x45] = BINARY_OP(vpsrlvd, AVX, SSE_OPF_AVX2), +[0x46] = BINARY_OP(vpsravd, AVX, SSE_OPF_AVX2), +[0x47] = BINARY_OP(vpsllvd, AVX, SSE_OPF_AVX2), /* vpbroadcastd */ [0x58] = UNARY_OP(vbroadcastl, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), /* vpbroadcastq */ @@ -3357,6 +3360,15 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { #undef BLENDV_OP #undef SPECIAL_OP +#define SSE_OP(name) \ +{gen_helper_ ## name ##_xmm, gen_helper_ ## name ##_ymm} +static const SSEFunc_0_eppp sse_op_table8[3][2] = { +SSE_OP(vpsrlvq), +SSE_OP(vpsravq), +SSE_OP(vpsllvq), +}; +#undef SSE_OP + /* VEX prefix not allowed */ #define CHECK_NO_VEX(s) do { \ if (s->prefix & PREFIX_VEX) \ @@ -4439,6 +4451,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, tcg_temp_free_ptr(mask); } else { SSEFunc_0_eppp fn = op6.fn[b1].op2; +if (REX_W(s)) { +if (b >= 0x45 && b <= 0x47) { +fn = sse_op_table8[b - 0x45][b1 - 1]; +} +} fn(cpu_env, s->ptr0, s->ptr2, s->ptr1); } } -- 2.36.0
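Taken out of the QEMU context, the per-lane semantics that the FPSRLVD/FPSRAVD macros above implement can be sketched in plain C (the function names here are ours, not part of the patch). The key point is that for the arithmetic variant an oversized count saturates to a sign-bit fill rather than producing zero:

```c
#include <assert.h>
#include <stdint.h>

/* One lane of VPSRLVD: logical right shift, counts >= 32 give 0. */
static uint32_t srlv32(uint32_t x, uint32_t c)
{
    return c < 32 ? x >> c : 0;
}

/* One lane of VPSRAVD: arithmetic right shift, counts >= 32 behave
 * like a shift by 31 (every bit becomes a copy of the sign bit). */
static int32_t srav32(int32_t x, uint32_t c)
{
    return x >> (c < 32 ? c : 31);
}
```

Clamping the count to 31 before shifting also avoids the undefined behaviour that a C shift by 32 or more on a 32-bit value would incur, which is why the FPSRAVD macro needs to compare against 32, not 64.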
[PATCH v2 12/42] i386: Misc integer AVX helper prep
More preparatory work for AVX support in various integer vector helpers. No functional changes to existing helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 133 +- 1 file changed, 104 insertions(+), 29 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index bb9cbf9ead..d0424140d9 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -557,19 +557,25 @@ SSE_HELPER_W(helper_pavgw, FAVG) void glue(helper_pmuludq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { -d->Q(0) = (uint64_t)s->L(0) * (uint64_t)d->L(0); -#if SHIFT == 1 -d->Q(1) = (uint64_t)s->L(2) * (uint64_t)d->L(2); +Reg *v = d; +d->Q(0) = (uint64_t)s->L(0) * (uint64_t)v->L(0); +#if SHIFT >= 1 +d->Q(1) = (uint64_t)s->L(2) * (uint64_t)v->L(2); +#if SHIFT == 2 +d->Q(2) = (uint64_t)s->L(4) * (uint64_t)v->L(4); +d->Q(3) = (uint64_t)s->L(6) * (uint64_t)v->L(6); +#endif #endif } void glue(helper_pmaddwd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { +Reg *v = d; int i; for (i = 0; i < (2 << SHIFT); i++) { -d->L(i) = (int16_t)s->W(2 * i) * (int16_t)d->W(2 * i) + -(int16_t)s->W(2 * i + 1) * (int16_t)d->W(2 * i + 1); +d->L(i) = (int16_t)s->W(2 * i) * (int16_t)v->W(2 * i) + +(int16_t)s->W(2 * i + 1) * (int16_t)v->W(2 * i + 1); } } @@ -583,31 +589,55 @@ static inline int abs1(int a) } } #endif + void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { +Reg *v = d; unsigned int val; val = 0; -val += abs1(d->B(0) - s->B(0)); -val += abs1(d->B(1) - s->B(1)); -val += abs1(d->B(2) - s->B(2)); -val += abs1(d->B(3) - s->B(3)); -val += abs1(d->B(4) - s->B(4)); -val += abs1(d->B(5) - s->B(5)); -val += abs1(d->B(6) - s->B(6)); -val += abs1(d->B(7) - s->B(7)); +val += abs1(v->B(0) - s->B(0)); +val += abs1(v->B(1) - s->B(1)); +val += abs1(v->B(2) - s->B(2)); +val += abs1(v->B(3) - s->B(3)); +val += abs1(v->B(4) - s->B(4)); +val += abs1(v->B(5) - s->B(5)); +val += abs1(v->B(6) - s->B(6)); +val += abs1(v->B(7) - s->B(7)); d->Q(0) = val; -#if SHIFT == 1 +#if SHIFT >= 1 val = 0; -val += 
abs1(d->B(8) - s->B(8)); -val += abs1(d->B(9) - s->B(9)); -val += abs1(d->B(10) - s->B(10)); -val += abs1(d->B(11) - s->B(11)); -val += abs1(d->B(12) - s->B(12)); -val += abs1(d->B(13) - s->B(13)); -val += abs1(d->B(14) - s->B(14)); -val += abs1(d->B(15) - s->B(15)); +val += abs1(v->B(8) - s->B(8)); +val += abs1(v->B(9) - s->B(9)); +val += abs1(v->B(10) - s->B(10)); +val += abs1(v->B(11) - s->B(11)); +val += abs1(v->B(12) - s->B(12)); +val += abs1(v->B(13) - s->B(13)); +val += abs1(v->B(14) - s->B(14)); +val += abs1(v->B(15) - s->B(15)); d->Q(1) = val; +#if SHIFT == 2 +val = 0; +val += abs1(v->B(16) - s->B(16)); +val += abs1(v->B(17) - s->B(17)); +val += abs1(v->B(18) - s->B(18)); +val += abs1(v->B(19) - s->B(19)); +val += abs1(v->B(20) - s->B(20)); +val += abs1(v->B(21) - s->B(21)); +val += abs1(v->B(22) - s->B(22)); +val += abs1(v->B(23) - s->B(23)); +d->Q(2) = val; +val = 0; +val += abs1(v->B(24) - s->B(24)); +val += abs1(v->B(25) - s->B(25)); +val += abs1(v->B(26) - s->B(26)); +val += abs1(v->B(27) - s->B(27)); +val += abs1(v->B(28) - s->B(28)); +val += abs1(v->B(29) - s->B(29)); +val += abs1(v->B(30) - s->B(30)); +val += abs1(v->B(31) - s->B(31)); +d->Q(3) = val; +#endif #endif } @@ -627,8 +657,12 @@ void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val) { d->L(0) = val; d->L(1) = 0; -#if SHIFT == 1 +#if SHIFT >= 1 d->Q(1) = 0; +#if SHIFT == 2 +d->Q(2) = 0; +d->Q(3) = 0; +#endif #endif } @@ -636,8 +670,12 @@ void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val) void glue(helper_movq_mm_T0, SUFFIX)(Reg *d, uint64_t val) { d->Q(0) = val; -#if SHIFT == 1 +#if SHIFT >= 1 d->Q(1) = 0; +#if SHIFT == 2 +d->Q(2) = 0; +d->Q(3) = 0; +#endif #endif } #endif @@ -1251,7 +1289,7 @@ uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) val |= (s->B(5) >> 2) & 0x20; val |= (s->B(6) >> 1) & 0x40; val |= (s->B(7)) & 0x80; -#if SHIFT == 1 +#if SHIFT >= 1 val |= (s->B(8) << 1) & 0x0100; val |= (s->B(9) << 2) & 0x0200; val |= (s->B(10) << 3) & 0x0400; @@ 
-1260,6 +1298,24 @@ uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) val |= (s->B(13) << 6) & 0x2000; val |= (s->B(14) << 7) & 0x4000; val |= (s->B(15) << 8) & 0x8000; +#if SHIFT == 2 +val |= ((uint32_t)s->B(16) << 9) & 0x00010000; +val |= ((uint32_t)s->B(17) << 10) & 0x00020000; +val |= ((uint32_t)s->B(18) << 11) & 0x00040000; +val |= ((uint32_t)s->B(19) << 12) & 0x00080000; +val |= ((uint32_t)s->B(20) << 13) & 0x00100000; +val |= ((uint32_t)s->B(21) << 14) &
[PATCH v2 20/42] i386: AVX pclmulqdq
Make the pclmulqdq helper AVX ready. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 31 --- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 9f388b02b9..b7100fdce1 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2885,14 +2885,14 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong msg, uint32_t len) #endif -void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, -uint32_t ctrl) +#if SHIFT == 1 +static void clmulq(uint64_t *dest_l, uint64_t *dest_h, + uint64_t a, uint64_t b) { -uint64_t ah, al, b, resh, resl; +uint64_t al, ah, resh, resl; ah = 0; -al = d->Q((ctrl & 1) != 0); -b = s->Q((ctrl & 16) != 0); +al = a; resh = resl = 0; while (b) { @@ -2905,8 +2905,25 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, b >>= 1; } -d->Q(0) = resl; -d->Q(1) = resh; +*dest_l = resl; +*dest_h = resh; +} +#endif + +void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, +uint32_t ctrl) +{ +Reg *v = d; +uint64_t a, b; + +a = v->Q((ctrl & 1) != 0); +b = s->Q((ctrl & 16) != 0); +clmulq(&d->Q(0), &d->Q(1), a, b); +#if SHIFT == 2 +a = v->Q(((ctrl & 1) != 0) + 2); +b = s->Q(((ctrl & 16) != 0) + 2); +clmulq(&d->Q(2), &d->Q(3), a, b); +#endif } void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -- 2.36.0
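The clmulq() loop in this patch is the classic shift-and-xor carry-less multiply. A self-contained version of the same algorithm (with our own naming, not the patch's) may help when checking the helper against known GF(2) products:

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less multiply of two 64-bit values into a 128-bit result:
 * each set bit i of b xors (a << i) into the result, with no carry
 * propagation between bit positions. */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t al = a, ah = 0, resl = 0, resh = 0;

    while (b) {
        if (b & 1) {
            resl ^= al;
            resh ^= ah;
        }
        ah = (ah << 1) | (al >> 63);  /* 128-bit left shift of ah:al */
        al <<= 1;
        b >>= 1;
    }
    *lo = resl;
    *hi = resh;
}
```

For example, 3 ⊗ 3 = 0b11 ⊗ 0b11 = 0b101 = 5, not 9 as an ordinary multiply would give, because the cross terms xor instead of add.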
[PATCH v2 32/42] i386: Implement VTEST
Nothing special here. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 28 target/i386/ops_sse_header.h | 2 ++ target/i386/tcg/translate.c | 2 ++ 3 files changed, 32 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 8f2bd48394..edf14a25d7 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3212,6 +3212,34 @@ SSE_HELPER_Q(helper_vpsrlvq, FPSRLVQ) SSE_HELPER_Q(helper_vpsravq, FPSRAVQ) SSE_HELPER_Q(helper_vpsllvq, FPSLLVQ) +void glue(helper_vtestps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint32_t zf = (s->L(0) & d->L(0)) | (s->L(1) & d->L(1)); +uint32_t cf = (s->L(0) & ~d->L(0)) | (s->L(1) & ~d->L(1)); + +zf |= (s->L(2) & d->L(2)) | (s->L(3) & d->L(3)); +cf |= (s->L(2) & ~d->L(2)) | (s->L(3) & ~d->L(3)); +#if SHIFT == 2 +zf |= (s->L(4) & d->L(4)) | (s->L(5) & d->L(5)); +cf |= (s->L(4) & ~d->L(4)) | (s->L(5) & ~d->L(5)); +zf |= (s->L(6) & d->L(6)) | (s->L(7) & d->L(7)); +cf |= (s->L(6) & ~d->L(6)) | (s->L(7) & ~d->L(7)); +#endif +CC_SRC = ((zf >> 31) ? 0 : CC_Z) | ((cf >> 31) ? 0 : CC_C); +} + +void glue(helper_vtestpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint64_t zf = (s->Q(0) & d->Q(0)) | (s->Q(1) & d->Q(1)); +uint64_t cf = (s->Q(0) & ~d->Q(0)) | (s->Q(1) & ~d->Q(1)); + +#if SHIFT == 2 +zf |= (s->Q(2) & d->Q(2)) | (s->Q(3) & d->Q(3)); +cf |= (s->Q(2) & ~d->Q(2)) | (s->Q(3) & ~d->Q(3)); +#endif +CC_SRC = ((zf >> 63) ? 0 : CC_Z) | ((cf >> 63) ? 
0 : CC_C); +} + #if SHIFT == 2 void glue(helper_vbroadcastdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 20db6c4240..8b93b8e6d6 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -427,6 +427,8 @@ DEF_HELPER_4(glue(vpsllvd, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpsrlvq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpsravq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpsllvq, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_3(glue(vtestps, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(vtestpd, SUFFIX), void, env, Reg, Reg) #if SHIFT == 2 DEF_HELPER_3(glue(vbroadcastdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_1(vzeroall, void, env) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 4990470083..2fbb7bfcad 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3253,6 +3253,8 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x0b] = BINARY_OP_MMX(pmulhrsw, SSSE3), [0x0c] = BINARY_OP(vpermilps, AVX, 0), [0x0d] = BINARY_OP(vpermilpd, AVX, 0), +[0x0e] = CMP_OP(vtestps, AVX), +[0x0f] = CMP_OP(vtestpd, AVX), [0x10] = BLENDV_OP(pblendvb, SSE41, SSE_OPF_MMX), [0x14] = BLENDV_OP(blendvps, SSE41, 0), [0x15] = BLENDV_OP(blendvpd, SSE41, 0), -- 2.36.0
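The flag computation in helper_vtestps accumulates whole lanes but only inspects the sign bits at the end: ZF is set when s AND d has no sign bit set in any lane, CF when s AND NOT d has none. A scalar model of the same accumulate-then-test scheme (illustrative names, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* VTESTPS-style flags over n 32-bit lanes: returns zf/cf as 0 or 1.
 * Like the helper, whole lanes are OR-accumulated and only bit 31
 * of each accumulator is inspected at the end. */
static void vtest32(const uint32_t *d, const uint32_t *s, int n,
                    int *zf, int *cf)
{
    uint32_t z = 0, c = 0;

    for (int i = 0; i < n; i++) {
        z |= s[i] & d[i];
        c |= s[i] & ~d[i];
    }
    *zf = (z >> 31) ? 0 : 1;
    *cf = (c >> 31) ? 0 : 1;
}
```

Non-sign bits can set low bits of the accumulators, but they never reach bit 31, so the final shift discards them exactly as the helper does.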
[PATCH v2 14/42] i386: Add size suffix to vector FP helpers
For AVX we're going to need both 128 bit (xmm) and 256 bit (ymm) variants of floating point helpers. Add the register type suffix to the existing *PS and *PD helpers (SS and SD variants are only valid on 128 bit vectors) No functional changes. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 48 ++-- target/i386/ops_sse_header.h | 48 ++-- target/i386/tcg/translate.c | 37 +-- 3 files changed, 67 insertions(+), 66 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index c645d2ddbf..fc8fd57aa5 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -699,7 +699,7 @@ void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order) SHUFFLE4(W, s, s, 0); } #else -void helper_shufps(Reg *d, Reg *s, int order) +void glue(helper_shufps, SUFFIX)(Reg *d, Reg *s, int order) { Reg *v = d; uint32_t r0, r1, r2, r3; @@ -710,7 +710,7 @@ void helper_shufps(Reg *d, Reg *s, int order) #endif } -void helper_shufpd(Reg *d, Reg *s, int order) +void glue(helper_shufpd, SUFFIX)(Reg *d, Reg *s, int order) { Reg *v = d; uint64_t r0, r1; @@ -767,7 +767,7 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) /* XXX: not accurate */ #define SSE_HELPER_S(name, F) \ -void helper_ ## name ## ps(CPUX86State *env, Reg *d, Reg *s)\ +void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ d->ZMM_S(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ @@ -780,7 +780,7 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ } \ \ -void helper_ ## name ## pd(CPUX86State *env, Reg *d, Reg *s)\ +void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ { \ d->ZMM_D(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ d->ZMM_D(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ @@ -816,7 +816,7 @@ SSE_HELPER_S(sqrt, FPU_SQRT) /* float to float conversions */ -void helper_cvtps2pd(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtps2pd, SUFFIX)(CPUX86State 
*env, Reg *d, Reg *s) { float32 s0, s1; @@ -826,7 +826,7 @@ void helper_cvtps2pd(CPUX86State *env, Reg *d, Reg *s) d->ZMM_D(1) = float32_to_float64(s1, &env->sse_status); } -void helper_cvtpd2ps(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), &env->sse_status); d->ZMM_S(1) = float64_to_float32(s->ZMM_D(1), &env->sse_status); @@ -844,7 +844,7 @@ void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *s) } /* integer to float */ -void helper_cvtdq2ps(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtdq2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = int32_to_float32(s->ZMM_L(0), &env->sse_status); d->ZMM_S(1) = int32_to_float32(s->ZMM_L(1), &env->sse_status); @@ -852,7 +852,7 @@ void helper_cvtdq2ps(CPUX86State *env, Reg *d, Reg *s) d->ZMM_S(3) = int32_to_float32(s->ZMM_L(3), &env->sse_status); } -void helper_cvtdq2pd(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_cvtdq2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int32_t l0, l1; @@ -929,7 +929,7 @@ WRAP_FLOATCONV(int64_t, float32_to_int64_round_to_zero, float32, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64, float64, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64_round_to_zero, float64, INT64_MIN) -void helper_cvtps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvtps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float32_to_int32(s->ZMM_S(0), &env->sse_status); d->ZMM_L(1) = x86_float32_to_int32(s->ZMM_S(1), &env->sse_status); @@ -937,7 +937,7 @@ void helper_cvtps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_L(3) = x86_float32_to_int32(s->ZMM_S(3), &env->sse_status); } -void helper_cvtpd2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvtpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float64_to_int32(s->ZMM_D(0), &env->sse_status); d->ZMM_L(1) = x86_float64_to_int32(s->ZMM_D(1), &env->sse_status); @@ -979,7 +979,7 @@ int64_t 
helper_cvtsd2sq(CPUX86State *env, ZMMReg *s) #endif /* float to integer truncated */ -void helper_cvttps2dq(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_cvttps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) =
[PATCH v2 27/42] i386: Translate 256 bit AVX instructions
All the work for the helper functions is already done, we just need to build them, and a few macro tweaks to populate the lookup tables. For sse_op_table6 and sse_op_table7 we use #defines to fill in the entries where an opcode only supports one vector size, rather than complicating the main table. Several of the open-coded mov-type instructions need special handling, but most of the rest falls out from the infrastructure we already added. Also clear the top half of the register after 128 bit VEX register writes. In the current code this correlates with VEX.L == 0, but there are exceptions later. Signed-off-by: Paul Brook --- target/i386/helper.h | 2 + target/i386/tcg/fpu_helper.c | 3 + target/i386/tcg/translate.c | 370 +-- 3 files changed, 319 insertions(+), 56 deletions(-) diff --git a/target/i386/helper.h b/target/i386/helper.h index ac3b4d1ee3..3da5df98b9 100644 --- a/target/i386/helper.h +++ b/target/i386/helper.h @@ -218,6 +218,8 @@ DEF_HELPER_3(movq, void, env, ptr, ptr) #include "ops_sse_header.h" #define SHIFT 1 #include "ops_sse_header.h" +#define SHIFT 2 +#include "ops_sse_header.h" DEF_HELPER_3(rclb, tl, env, tl, tl) DEF_HELPER_3(rclw, tl, env, tl, tl) diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c index b391b69635..74cf86c986 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -3053,3 +3053,6 @@ void helper_movq(CPUX86State *env, void *d, void *s) #define SHIFT 1 #include "ops_sse.h" + +#define SHIFT 2 +#include "ops_sse.h" diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 278ed8ed1c..bcd6d47fd0 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2742,6 +2742,29 @@ static inline void gen_ldo_env_A0(DisasContext *s, int offset) tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(1))); } +static inline void gen_ldo_env_A0_ymmh(DisasContext *s, int offset) +{ +int mem_index = s->mem_index; +tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, 
mem_index, MO_LEUQ); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_addi_tl(s->tmp0, s->A0, 8); +tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); +} + +/* Load 256-bit ymm register value */ +static inline void gen_ldy_env_A0(DisasContext *s, int offset) +{ +int mem_index = s->mem_index; +gen_ldo_env_A0(s, offset); +tcg_gen_addi_tl(s->tmp0, s->A0, 16); +tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_addi_tl(s->tmp0, s->A0, 24); +tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); +} + static inline void gen_sto_env_A0(DisasContext *s, int offset) { int mem_index = s->mem_index; @@ -2752,6 +2775,29 @@ static inline void gen_sto_env_A0(DisasContext *s, int offset) tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); } +static inline void gen_sto_env_A0_ymmh(DisasContext *s, int offset) +{ +int mem_index = s->mem_index; +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, mem_index, MO_LEUQ); +tcg_gen_addi_tl(s->tmp0, s->A0, 8); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); +tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +} + +/* Store 256-bit ymm register value */ +static inline void gen_sty_env_A0(DisasContext *s, int offset) +{ +int mem_index = s->mem_index; +gen_sto_env_A0(s, offset); +tcg_gen_addi_tl(s->tmp0, s->A0, 16); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +tcg_gen_addi_tl(s->tmp0, s->A0, 24); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(3))); +tcg_gen_qemu_st_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ); +} 
+ static inline void gen_op_movo(DisasContext *s, int d_offset, int s_offset) { tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(0))); @@ -2760,6 +2806,14 @@ static inline void gen_op_movo(DisasContext *s, int d_offset, int s_offset) tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(1))); } +static inline void gen_op_movo_ymmh(DisasContext *s, int d_offset, int s_offset) +{ +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(3))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(3))); +} + static inline void
[PATCH v2 15/42] i386: Floating point arithmetic helper AVX prep
Prepare the "easy" floating point vector helpers for AVX No functional changes to existing helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 144 ++ 1 file changed, 119 insertions(+), 25 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index fc8fd57aa5..d308a1ec40 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -762,40 +762,66 @@ void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) } #endif -#if SHIFT == 1 +#if SHIFT >= 1 /* FPU ops */ /* XXX: not accurate */ -#define SSE_HELPER_S(name, F) \ -void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ +#define SSE_HELPER_P(name, F) \ +void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *s) \ { \ -d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ -d->ZMM_S(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ -d->ZMM_S(2) = F(32, d->ZMM_S(2), s->ZMM_S(2)); \ -d->ZMM_S(3) = F(32, d->ZMM_S(3), s->ZMM_S(3)); \ +Reg *v = d; \ +d->ZMM_S(0) = F(32, v->ZMM_S(0), s->ZMM_S(0)); \ +d->ZMM_S(1) = F(32, v->ZMM_S(1), s->ZMM_S(1)); \ +d->ZMM_S(2) = F(32, v->ZMM_S(2), s->ZMM_S(2)); \ +d->ZMM_S(3) = F(32, v->ZMM_S(3), s->ZMM_S(3)); \ +YMM_ONLY( \ +d->ZMM_S(4) = F(32, v->ZMM_S(4), s->ZMM_S(4)); \ +d->ZMM_S(5) = F(32, v->ZMM_S(5), s->ZMM_S(5)); \ +d->ZMM_S(6) = F(32, v->ZMM_S(6), s->ZMM_S(6)); \ +d->ZMM_S(7) = F(32, v->ZMM_S(7), s->ZMM_S(7)); \ +) \ } \ \ -void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s)\ +void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *s) \ { \ -d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ -} \ +Reg *v = d; \ +d->ZMM_D(0) = F(64, v->ZMM_D(0), s->ZMM_D(0)); \ +d->ZMM_D(1) = F(64, v->ZMM_D(1), s->ZMM_D(1)); \ +YMM_ONLY( \ +d->ZMM_D(2) = F(64, v->ZMM_D(2), s->ZMM_D(2)); \ +d->ZMM_D(3) = F(64, v->ZMM_D(3), s->ZMM_D(3)); \ +) \ +} + +#if SHIFT == 1 + +#define SSE_HELPER_S(name, F) \ +SSE_HELPER_P(name, F) \ \ -void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ +void 
helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s)\ { \ -d->ZMM_S(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ -d->ZMM_D(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ +Reg *v = d; \ +d->ZMM_S(0) = F(32, v->ZMM_S(0), s->ZMM_S(0)); \ } \ \ -void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s)\ +void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s)\ { \ -d->ZMM_D(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ +Reg *v = d; \ +d->ZMM_D(0) = F(64, v->ZMM_D(0), s->ZMM_D(0)); \ } +#else + +#define SSE_HELPER_S(name, F) SSE_HELPER_P(name, F) + +#endif + #define FPU_ADD(size, a, b) float ## size ## _add(a, b, &env->sse_status) #define FPU_SUB(size, a, b) float ## size ## _sub(a, b, &env->sse_status) #define FPU_MUL(size, a, b) float ## size ## _mul(a, b, &env->sse_status) #define FPU_DIV(size, a, b) float ## size ## _div(a, b,
[PATCH v2 26/42] i386: Utility function for 128 bit AVX
VEX encoded instructions that write to a (128 bit) xmm register clear the rest (upper half) of the corresponding (256 bit) ymm register. When legacy SSE encodings are used the rest of the ymm register is left unchanged. Add a utility function so that we don't have to keep duplicating this logic. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 12 1 file changed, 12 insertions(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index d148a2319d..278ed8ed1c 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2780,6 +2780,18 @@ static inline void gen_op_movq_env_0(DisasContext *s, int d_offset) #define ZMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg]) +/* + * Clear the top half of the ymm register after a VEX.128 instruction + * This could be optimized by tracking this in env->hflags + */ +static void gen_clear_ymmh(DisasContext *s, int reg) +{ +if (s->prefix & PREFIX_VEX) { +gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(2))); +gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(3))); +} +} + typedef void (*SSEFunc_i_ep)(TCGv_i32 val, TCGv_ptr env, TCGv_ptr reg); typedef void (*SSEFunc_l_ep)(TCGv_i64 val, TCGv_ptr env, TCGv_ptr reg); typedef void (*SSEFunc_0_epi)(TCGv_ptr env, TCGv_ptr reg, TCGv_i32 val); -- 2.36.0
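The distinction gen_clear_ymmh encodes can be modelled on a plain struct: a 128-bit VEX write zeroes the upper quadwords, while a legacy SSE write merges with them. This sketch (the types and names are ours, only meant to make the data flow concrete) mirrors that rule:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t q[4];  /* a 256-bit ymm register as four quadwords */
} YmmModel;

/* Write the low 128 bits of r; VEX encodings additionally clear the
 * top half, legacy SSE encodings leave q[2]/q[3] untouched. */
static void write_low128(YmmModel *r, uint64_t q0, uint64_t q1, int is_vex)
{
    r->q[0] = q0;
    r->q[1] = q1;
    if (is_vex) {
        r->q[2] = 0;
        r->q[3] = 0;
    }
}
```

The merging behaviour of the legacy path is why leaving stale upper halves around is correct for SSE code but would be a guest-visible bug for VEX-encoded instructions.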
[PATCH v2 25/42] i386: VEX.V encodings (3 operand)
Enable translation of VEX encoded AVX instructions. The big change is the addition of an extra register operand in the VEX.V field. This is usually (but not always!) used to explicitly encode the first source operand. The changes to ops_sse.h and ops_sse_header.h are purely mechanical, with previous changes ensuring that the relevant helper functions are ready to handle the non-destructive source operand. We now have a greater variety of operand patterns for the vector helper functions. The SSE_OPF_* flags we added to the opcode lookup tables are used to select between these. This includes e.g. pshufX and cmpX instructions which were previously overridden by opcode. One gotcha is the "scalar" vector instructions. The SSE encodings write a single element to the destination and leave the remainder of the register unchanged. The VEX encodings copy the remainder of the destination from the first source operand. If the operation only has a single source value, then VEX.V encodes an additional operand which is copied to the remainder of the destination. 
Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 214 +-- target/i386/ops_sse_header.h | 149 ++--- target/i386/tcg/translate.c | 399 +-- 3 files changed, 463 insertions(+), 299 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index e48dfc2fc5..ad3312d353 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -97,9 +97,8 @@ #define FPSLL(x, c) ((x) << shift) #endif -void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 15) { d->Q(0) = 0; @@ -114,9 +113,8 @@ void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 15) { d->Q(0) = 0; @@ -131,9 +129,8 @@ void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 15) { shift = 15; @@ -143,9 +140,8 @@ void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) SHIFT_HELPER_BODY(4 << SHIFT, W, FPSRAW); } -void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 31) { d->Q(0) = 0; @@ -160,9 +156,8 @@ void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 31) { d->Q(0) = 0; @@ -177,9 +172,8 @@ void glue(helper_pslld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrad, 
SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 31) { shift = 31; @@ -189,9 +183,8 @@ void glue(helper_psrad, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) SHIFT_HELPER_BODY(2 << SHIFT, L, FPSRAL); } -void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 63) { d->Q(0) = 0; @@ -206,9 +199,8 @@ void glue(helper_psrlq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } } -void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift; if (c->Q(0) > 63) { d->Q(0) = 0; @@ -224,9 +216,8 @@ void glue(helper_psllq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #if SHIFT >= 1 -void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift, i; shift = c->L(0); @@ -249,9 +240,8 @@ void glue(helper_psrldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) #endif } -void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) +void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, Reg *c) { -Reg *s = d; int shift, i; shift = c->L(0); @@ -321,9 +311,8 @@ void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #define SSE_HELPER_B(name, F) \ -void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +void
[PATCH v2 36/42] i386: Implement VINSERT128/VEXTRACT128
128-bit vinsert/vextract instructions. The integer and floating point variants have the same semantics. This is where we encounter an instruction encoded with VEX.L == 1 and a 128 bit (xmm) destination operand. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 78 + 1 file changed, 78 insertions(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 5a11d3c083..4072fa28d3 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2814,6 +2814,24 @@ static inline void gen_op_movo_ymmh(DisasContext *s, int d_offset, int s_offset) tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(3))); } +static inline void gen_op_movo_ymm_l2h(DisasContext *s, + int d_offset, int s_offset) +{ +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(0))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(1))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(3))); +} + +static inline void gen_op_movo_ymm_h2l(DisasContext *s, + int d_offset, int s_offset) +{ +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(2))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(0))); +tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset + offsetof(ZMMReg, ZMM_Q(3))); +tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset + offsetof(ZMMReg, ZMM_Q(1))); +} + static inline void gen_op_movq(DisasContext *s, int d_offset, int s_offset) { tcg_gen_ld_i64(s->tmp1_i64, cpu_env, s_offset); @@ -3353,9 +3371,13 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { [0x15] = SPECIAL_OP(SSE41), /* pextrw */ [0x16] = SPECIAL_OP(SSE41), /* pextrd/pextrq */ [0x17] = SPECIAL_OP(SSE41), /* extractps */ +[0x18] = SPECIAL_OP(AVX), /* vinsertf128 */ +[0x19] = SPECIAL_OP(AVX), /* vextractf128 */ [0x20] = SPECIAL_OP(SSE41), /* pinsrb */ [0x21] = SPECIAL_OP(SSE41), /* insertps */ 
[0x22] = SPECIAL_OP(SSE41), /* pinsrd/pinsrq */ +[0x38] = SPECIAL_OP(AVX), /* vinserti128 */ +[0x39] = SPECIAL_OP(AVX), /* vextracti128 */ [0x40] = BINARY_OP(dpps, SSE41, 0), #define gen_helper_dppd_ymm NULL [0x41] = BINARY_OP(dppd, SSE41, 0), @@ -5145,6 +5167,62 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } gen_clear_ymmh(s, reg); break; +case 0x38: /* vinserti128 */ +CHECK_AVX2_256(s); +/* fall through */ +case 0x18: /* vinsertf128 */ +CHECK_AVX(s); +if ((s->prefix & PREFIX_VEX) == 0 || s->vex_l == 0) { +goto illegal_op; +} +if (mod == 3) { +if (val & 1) { +gen_op_movo_ymm_l2h(s, ZMM_OFFSET(reg), +ZMM_OFFSET(rm)); +} else { +gen_op_movo(s, ZMM_OFFSET(reg), ZMM_OFFSET(rm)); +} +} else { +if (val & 1) { +gen_ldo_env_A0_ymmh(s, ZMM_OFFSET(reg)); +} else { +gen_ldo_env_A0(s, ZMM_OFFSET(reg)); +} +} +if (reg != reg_v) { +if (val & 1) { +gen_op_movo(s, ZMM_OFFSET(reg), ZMM_OFFSET(reg_v)); +} else { +gen_op_movo_ymmh(s, ZMM_OFFSET(reg), + ZMM_OFFSET(reg_v)); +} +} +break; +case 0x39: /* vextracti128 */ +CHECK_AVX2_256(s); +/* fall through */ +case 0x19: /* vextractf128 */ +CHECK_AVX_V0(s); +if ((s->prefix & PREFIX_VEX) == 0 || s->vex_l == 0) { +goto illegal_op; +} +if (mod == 3) { +op1_offset = ZMM_OFFSET(rm); +if (val & 1) { +gen_op_movo_ymm_h2l(s, ZMM_OFFSET(rm), +ZMM_OFFSET(reg)); +} else { +gen_op_movo(s, ZMM_OFFSET(rm), ZMM_OFFSET(reg)); +} +gen_clear_ymmh(s, rm); +} else{ +if (val & 1) { +gen_sto_env_A0_ymmh(s, ZMM_OFFSET(reg)); +
[PATCH v2 21/42] i386: AVX+AES helpers
Make the AES vector helpers AVX ready No functional changes to existing helpers Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 63 ++-- target/i386/ops_sse_header.h | 55 ++- 2 files changed, 85 insertions(+), 33 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index b7100fdce1..48cec40074 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2929,64 +2929,92 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int i; -Reg st = *d; +Reg st = *d; // v Reg rk = *s; for (i = 0 ; i < 4 ; i++) { -d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4*i+0])] ^ -AES_Td1[st.B(AES_ishifts[4*i+1])] ^ -AES_Td2[st.B(AES_ishifts[4*i+2])] ^ -AES_Td3[st.B(AES_ishifts[4*i+3])]); +d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4 * i + 0])] ^ +AES_Td1[st.B(AES_ishifts[4 * i + 1])] ^ +AES_Td2[st.B(AES_ishifts[4 * i + 2])] ^ +AES_Td3[st.B(AES_ishifts[4 * i + 3])]); } +#if SHIFT == 2 +for (i = 0 ; i < 4 ; i++) { +d->L(i + 4) = rk.L(i + 4) ^ bswap32( +AES_Td0[st.B(AES_ishifts[4 * i + 0] + 16)] ^ +AES_Td1[st.B(AES_ishifts[4 * i + 1] + 16)] ^ +AES_Td2[st.B(AES_ishifts[4 * i + 2] + 16)] ^ +AES_Td3[st.B(AES_ishifts[4 * i + 3] + 16)]); +} +#endif } void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int i; -Reg st = *d; +Reg st = *d; // v Reg rk = *s; for (i = 0; i < 16; i++) { d->B(i) = rk.B(i) ^ (AES_isbox[st.B(AES_ishifts[i])]); } +#if SHIFT == 2 +for (i = 0; i < 16; i++) { +d->B(i + 16) = rk.B(i + 16) ^ (AES_isbox[st.B(AES_ishifts[i] + 16)]); +} +#endif } void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int i; -Reg st = *d; +Reg st = *d; // v Reg rk = *s; for (i = 0 ; i < 4 ; i++) { -d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4*i+0])] ^ -AES_Te1[st.B(AES_shifts[4*i+1])] ^ -AES_Te2[st.B(AES_shifts[4*i+2])] ^ -AES_Te3[st.B(AES_shifts[4*i+3])]); +d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4 * i + 
0])] ^ +AES_Te1[st.B(AES_shifts[4 * i + 1])] ^ +AES_Te2[st.B(AES_shifts[4 * i + 2])] ^ +AES_Te3[st.B(AES_shifts[4 * i + 3])]); } +#if SHIFT == 2 +for (i = 0 ; i < 4 ; i++) { +d->L(i + 4) = rk.L(i + 4) ^ bswap32( +AES_Te0[st.B(AES_shifts[4 * i + 0] + 16)] ^ +AES_Te1[st.B(AES_shifts[4 * i + 1] + 16)] ^ +AES_Te2[st.B(AES_shifts[4 * i + 2] + 16)] ^ +AES_Te3[st.B(AES_shifts[4 * i + 3] + 16)]); +} +#endif } void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int i; -Reg st = *d; +Reg st = *d; // v Reg rk = *s; for (i = 0; i < 16; i++) { d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i])]); } - +#if SHIFT == 2 +for (i = 0; i < 16; i++) { +d->B(i + 16) = rk.B(i + 16) ^ (AES_sbox[st.B(AES_shifts[i] + 16)]); +} +#endif } +#if SHIFT == 1 void glue(helper_aesimc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { int i; Reg tmp = *s; for (i = 0 ; i < 4 ; i++) { -d->L(i) = bswap32(AES_imc[tmp.B(4*i+0)][0] ^ - AES_imc[tmp.B(4*i+1)][1] ^ - AES_imc[tmp.B(4*i+2)][2] ^ - AES_imc[tmp.B(4*i+3)][3]); +d->L(i) = bswap32(AES_imc[tmp.B(4 * i + 0)][0] ^ + AES_imc[tmp.B(4 * i + 1)][1] ^ + AES_imc[tmp.B(4 * i + 2)][2] ^ + AES_imc[tmp.B(4 * i + 3)][3]); } } @@ -3004,6 +3032,7 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, d->L(3) = (d->L(2) << 24 | d->L(2) >> 8) ^ ctrl; } #endif +#endif #undef SSE_HELPER_S diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index b8b0666f61..203afbb5a1 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -47,7 +47,7 @@ DEF_HELPER_3(glue(pslld, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(psrlq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(psllq, SUFFIX), void, env, Reg, Reg) -#if SHIFT == 1 +#if SHIFT >= 1 DEF_HELPER_3(glue(psrldq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(pslldq, SUFFIX), void, env, Reg, Reg) #endif @@ -105,7 +105,7 @@ SSE_HELPER_L(pcmpeql,
[PATCH v2 35/42] i386: Implement VPERM
A set of shuffle operations that operate on complete 256 bit registers. The integer and floating point variants have identical semantics. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 73 target/i386/ops_sse_header.h | 3 ++ target/i386/tcg/translate.c | 9 + 3 files changed, 85 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 14a2d1bf78..04d2006cd8 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3407,6 +3407,79 @@ void helper_vzeroupper_hi8(CPUX86State *env) } } #endif + +void helper_vpermdq_ymm(CPUX86State *env, +Reg *d, Reg *v, Reg *s, uint32_t order) +{ +uint64_t r0, r1, r2, r3; + +switch (order & 3) { +case 0: +r0 = v->Q(0); +r1 = v->Q(1); +break; +case 1: +r0 = v->Q(2); +r1 = v->Q(3); +break; +case 2: +r0 = s->Q(0); +r1 = s->Q(1); +break; +case 3: +r0 = s->Q(2); +r1 = s->Q(3); +break; +} +switch ((order >> 4) & 3) { +case 0: +r2 = v->Q(0); +r3 = v->Q(1); +break; +case 1: +r2 = v->Q(2); +r3 = v->Q(3); +break; +case 2: +r2 = s->Q(0); +r3 = s->Q(1); +break; +case 3: +r2 = s->Q(2); +r3 = s->Q(3); +break; +} +d->Q(0) = r0; +d->Q(1) = r1; +d->Q(2) = r2; +d->Q(3) = r3; +} + +void helper_vpermq_ymm(CPUX86State *env, Reg *d, Reg *s, uint32_t order) +{ +uint64_t r0, r1, r2, r3; +r0 = s->Q(order & 3); +r1 = s->Q((order >> 2) & 3); +r2 = s->Q((order >> 4) & 3); +r3 = s->Q((order >> 6) & 3); +d->Q(0) = r0; +d->Q(1) = r1; +d->Q(2) = r2; +d->Q(3) = r3; +} + +void helper_vpermd_ymm(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ +uint32_t r[8]; +int i; + +for (i = 0; i < 8; i++) { +r[i] = s->L(v->L(i) & 7); +} +for (i = 0; i < 8; i++) { +d->L(i) = r[i]; +} +} #endif #endif diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index e5d8ea9bb7..099e6e8ffc 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -457,6 +457,9 @@ DEF_HELPER_1(vzeroupper, void, env) DEF_HELPER_1(vzeroall_hi8, void, env) DEF_HELPER_1(vzeroupper_hi8, void, env) #endif +DEF_HELPER_5(vpermdq_ymm, void, env, 
Reg, Reg, Reg, i32) +DEF_HELPER_4(vpermq_ymm, void, env, Reg, Reg, i32) +DEF_HELPER_4(vpermd_ymm, void, env, Reg, Reg, Reg) #endif #endif diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index fe1ab58d07..5a11d3c083 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3258,6 +3258,8 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x10] = BLENDV_OP(pblendvb, SSE41, SSE_OPF_MMX), [0x14] = BLENDV_OP(blendvps, SSE41, 0), [0x15] = BLENDV_OP(blendvpd, SSE41, 0), +#define gen_helper_vpermd_xmm NULL +[0x16] = BINARY_OP(vpermd, AVX, SSE_OPF_AVX2), /* vpermps */ [0x17] = CMP_OP(ptest, SSE41), /* TODO:Some vbroadcast variants require AVX2 */ [0x18] = UNARY_OP(vbroadcastl, AVX, SSE_OPF_SCALAR), /* vbroadcastss */ @@ -3287,6 +3289,7 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x33] = UNARY_OP(pmovzxwd, SSE41, SSE_OPF_MMX), [0x34] = UNARY_OP(pmovzxwq, SSE41, SSE_OPF_MMX), [0x35] = UNARY_OP(pmovzxdq, SSE41, SSE_OPF_MMX), +[0x36] = BINARY_OP(vpermd, AVX, SSE_OPF_AVX2), /* vpermd */ [0x37] = BINARY_OP(pcmpgtq, SSE41, SSE_OPF_MMX), [0x38] = BINARY_OP(pminsb, SSE41, SSE_OPF_MMX), [0x39] = BINARY_OP(pminsd, SSE41, SSE_OPF_MMX), @@ -3329,8 +3332,13 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { /* prefix [66] 0f 3a */ static const struct SSEOpHelper_table7 sse_op_table7[256] = { +#define gen_helper_vpermq_xmm NULL +[0x00] = UNARY_OP(vpermq, AVX, SSE_OPF_AVX2), +[0x01] = UNARY_OP(vpermq, AVX, SSE_OPF_AVX2), /* vpermpd */ [0x04] = UNARY_OP(vpermilps_imm, AVX, 0), [0x05] = UNARY_OP(vpermilpd_imm, AVX, 0), +#define gen_helper_vpermdq_xmm NULL +[0x06] = BINARY_OP(vpermdq, AVX, 0), /* vperm2f128 */ [0x08] = UNARY_OP(roundps, SSE41, 0), [0x09] = UNARY_OP(roundpd, SSE41, 0), #define gen_helper_roundss_ymm NULL @@ -3353,6 +3361,7 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { [0x41] = BINARY_OP(dppd, SSE41, 0), [0x42] = BINARY_OP(mpsadbw, SSE41, SSE_OPF_MMX), [0x44] = 
BINARY_OP(pclmulqdq, PCLMULQDQ, 0), +[0x46] = BINARY_OP(vpermdq, AVX, SSE_OPF_AVX2), /* vperm2i128 */ #define gen_helper_pcmpestrm_ymm NULL [0x60] = CMP_OP(pcmpestrm, SSE42), #define gen_helper_pcmpestri_ymm NULL -- 2.36.0
[PATCH v2 39/42] i386: Enable AVX cpuid bits when using TCG
Include AVX and AVX2 in the guest cpuid features supported by TCG Signed-off-by: Paul Brook --- target/i386/cpu.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 99343be926..bd35233d5b 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -625,12 +625,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | \ CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */ \ CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \ - CPUID_EXT_RDRAND) + CPUID_EXT_RDRAND | CPUID_EXT_AVX) /* missing: CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, CPUID_EXT_SMX, CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_FMA, CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, CPUID_EXT_DCA, - CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_AVX, + CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER, CPUID_EXT_F16C */ #ifdef TARGET_X86_64 @@ -653,9 +653,9 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX | \ CPUID_7_0_EBX_PCOMMIT | CPUID_7_0_EBX_CLFLUSHOPT |\ CPUID_7_0_EBX_CLWB | CPUID_7_0_EBX_MPX | CPUID_7_0_EBX_FSGSBASE | \ - CPUID_7_0_EBX_ERMS) + CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_AVX2) /* missing: - CPUID_7_0_EBX_HLE, CPUID_7_0_EBX_AVX2, + CPUID_7_0_EBX_HLE CPUID_7_0_EBX_INVPCID, CPUID_7_0_EBX_RTM, CPUID_7_0_EBX_RDSEED */ #define TCG_7_0_ECX_FEATURES (CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU | \ -- 2.36.0
[PATCH v2 42/42] i386: Add sha512-avx test
Include sha512 built with avx[2] in the tcg tests. Signed-off-by: Paul Brook --- tests/tcg/i386/Makefile.target | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target index eb06f7eb89..a0335fff6d 100644 --- a/tests/tcg/i386/Makefile.target +++ b/tests/tcg/i386/Makefile.target @@ -79,7 +79,14 @@ sha512-sse: sha512.c run-sha512-sse: QEMU_OPTS+=-cpu max run-plugin-sha512-sse-with-%: QEMU_OPTS+=-cpu max -TESTS+=sha512-sse +sha512-avx: CFLAGS=-mavx2 -mavx -O3 +sha512-avx: sha512.c + $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS) + +run-sha512-avx: QEMU_OPTS+=-cpu max +run-plugin-sha512-avx-with-%: QEMU_OPTS+=-cpu max + +TESTS+=sha512-sse sha512-avx test-avx.h: test-avx.py x86.csv $(PYTHON) $(I386_SRC)/test-avx.py $(I386_SRC)/x86.csv $@ -- 2.36.0
[PATCH v2 16/42] i386: Dot product AVX helper prep
Make the dpps and dppd helpers AVX-ready I can't see any obvious reason why dppd shouldn't work on 256 bit ymm registers, but both AMD and Intel agree that it's xmm only. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 54 --- 1 file changed, 46 insertions(+), 8 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index d308a1ec40..4137e6e1fa 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2366,8 +2366,10 @@ SSE_HELPER_I(helper_blendps, L, 4, FBLENDP) SSE_HELPER_I(helper_blendpd, Q, 2, FBLENDP) SSE_HELPER_I(helper_pblendw, W, 8, FBLENDP) -void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) +void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, + uint32_t mask) { +Reg *v = d; float32 prod, iresult, iresult2; /* @@ -2375,23 +2377,23 @@ void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) * to correctly round the intermediate results */ if (mask & (1 << 4)) { -iresult = float32_mul(d->ZMM_S(0), s->ZMM_S(0), >sse_status); +iresult = float32_mul(v->ZMM_S(0), s->ZMM_S(0), >sse_status); } else { iresult = float32_zero; } if (mask & (1 << 5)) { -prod = float32_mul(d->ZMM_S(1), s->ZMM_S(1), >sse_status); +prod = float32_mul(v->ZMM_S(1), s->ZMM_S(1), >sse_status); } else { prod = float32_zero; } iresult = float32_add(iresult, prod, >sse_status); if (mask & (1 << 6)) { -iresult2 = float32_mul(d->ZMM_S(2), s->ZMM_S(2), >sse_status); +iresult2 = float32_mul(v->ZMM_S(2), s->ZMM_S(2), >sse_status); } else { iresult2 = float32_zero; } if (mask & (1 << 7)) { -prod = float32_mul(d->ZMM_S(3), s->ZMM_S(3), >sse_status); +prod = float32_mul(v->ZMM_S(3), s->ZMM_S(3), >sse_status); } else { prod = float32_zero; } @@ -2402,26 +2404,62 @@ void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) d->ZMM_S(1) = (mask & (1 << 1)) ? iresult : float32_zero; d->ZMM_S(2) = (mask & (1 << 2)) ? iresult : float32_zero; d->ZMM_S(3) = (mask & (1 << 3)) ? 
iresult : float32_zero; +#if SHIFT == 2 +if (mask & (1 << 4)) { +iresult = float32_mul(v->ZMM_S(4), s->ZMM_S(4), >sse_status); +} else { +iresult = float32_zero; +} +if (mask & (1 << 5)) { +prod = float32_mul(v->ZMM_S(5), s->ZMM_S(5), >sse_status); +} else { +prod = float32_zero; +} +iresult = float32_add(iresult, prod, >sse_status); +if (mask & (1 << 6)) { +iresult2 = float32_mul(v->ZMM_S(6), s->ZMM_S(6), >sse_status); +} else { +iresult2 = float32_zero; +} +if (mask & (1 << 7)) { +prod = float32_mul(v->ZMM_S(7), s->ZMM_S(7), >sse_status); +} else { +prod = float32_zero; +} +iresult2 = float32_add(iresult2, prod, >sse_status); +iresult = float32_add(iresult, iresult2, >sse_status); + +d->ZMM_S(4) = (mask & (1 << 0)) ? iresult : float32_zero; +d->ZMM_S(5) = (mask & (1 << 1)) ? iresult : float32_zero; +d->ZMM_S(6) = (mask & (1 << 2)) ? iresult : float32_zero; +d->ZMM_S(7) = (mask & (1 << 3)) ? iresult : float32_zero; +#endif } -void glue(helper_dppd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) +#if SHIFT == 1 +/* Oddly, there is no ymm version of dppd */ +void glue(helper_dppd, SUFFIX)(CPUX86State *env, + Reg *d, Reg *s, uint32_t mask) { +Reg *v = d; float64 iresult; if (mask & (1 << 4)) { -iresult = float64_mul(d->ZMM_D(0), s->ZMM_D(0), >sse_status); +iresult = float64_mul(v->ZMM_D(0), s->ZMM_D(0), >sse_status); } else { iresult = float64_zero; } + if (mask & (1 << 5)) { iresult = float64_add(iresult, - float64_mul(d->ZMM_D(1), s->ZMM_D(1), + float64_mul(v->ZMM_D(1), s->ZMM_D(1), >sse_status), >sse_status); } d->ZMM_D(0) = (mask & (1 << 0)) ? iresult : float64_zero; d->ZMM_D(1) = (mask & (1 << 1)) ? iresult : float64_zero; } +#endif void glue(helper_mpsadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t offset) -- 2.36.0
[PATCH v2 34/42] i386: Implement VGATHER
These are gather load instructions that introduce a new "Vector SIB" encoding. Also a bit of hair to handle different index sizes and scaling factors, but overall the combinatorial explosion doesn't end up too bad. The other thing of note is probably that these also modify the mask operand. Thankfully the operands may not overlap, and we do not have to make the whole thing appear atomic. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 65 +++ target/i386/ops_sse_header.h | 16 target/i386/tcg/translate.c | 74 3 files changed, 155 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index ffcba3d02c..14a2d1bf78 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3288,6 +3288,71 @@ void glue(helper_vpmaskmovq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) #endif } +#define VGATHER_HELPER(scale) \ +void glue(helper_vpgatherdd ## scale, SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *v, Reg *s, target_ulong a0)\ +{ \ +int i; \ +for (i = 0; i < (2 << SHIFT); i++) {\ +if (v->L(i) >> 31) {\ +target_ulong addr = a0 \ ++ ((target_ulong)(int32_t)s->L(i) << scale);\ +d->L(i) = cpu_ldl_data_ra(env, addr, GETPC()); \ +} \ +v->L(i) = 0;\ +} \ +} \ +void glue(helper_vpgatherdq ## scale, SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *v, Reg *s, target_ulong a0)\ +{ \ +int i; \ +for (i = 0; i < (1 << SHIFT); i++) {\ +if (v->Q(i) >> 63) {\ +target_ulong addr = a0 \ ++ ((target_ulong)(int32_t)s->L(i) << scale);\ +d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC()); \ +} \ +v->Q(i) = 0;\ +} \ +} \ +void glue(helper_vpgatherqd ## scale, SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *v, Reg *s, target_ulong a0)\ +{ \ +int i; \ +for (i = 0; i < (1 << SHIFT); i++) {\ +if (v->L(i) >> 31) {\ +target_ulong addr = a0 \ ++ ((target_ulong)(int64_t)s->Q(i) << scale);\ +d->L(i) = cpu_ldl_data_ra(env, addr, GETPC()); \ +} \ +v->L(i) = 0;\ +} \ +d->Q(SHIFT) = 0;\ +v->Q(SHIFT) = 0;\ +YMM_ONLY( \ +d->Q(3) = 0;\ +v->Q(3) = 0;\ +) \ +} \ +void glue(helper_vpgatherqq ## scale,
SUFFIX)(CPUX86State *env, \ +Reg *d, Reg *v, Reg *s, target_ulong a0)\ +{ \ +int i; \ +for (i = 0; i < (1 << SHIFT); i++) {\ +if (v->Q(i) >> 63) {\ +target_ulong addr = a0 \ ++ ((target_ulong)(int64_t)s->Q(i) << scale);\ +d->Q(i) = cpu_ldq_data_ra(env, addr, GETPC()); \ +} \ +v->Q(i) = 0;\ +}
[PATCH v2 23/42] i386: AVX comparison helpers
AVX includes a more extensive set of comparison predicates, some of which our softfloat implementation does not expose directly. Rewrite the helpers in terms of floatN_compare. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 149 --- target/i386/ops_sse_header.h | 47 --- target/i386/tcg/translate.c | 49 +--- 3 files changed, 177 insertions(+), 68 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 48cec40074..e48dfc2fc5 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -1394,57 +1394,112 @@ void glue(helper_addsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #endif } -/* XXX: unordered */ -#define SSE_HELPER_CMP(name, F) \ -void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ -{ \ -d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ -d->ZMM_L(1) = F(32, d->ZMM_S(1), s->ZMM_S(1)); \ -d->ZMM_L(2) = F(32, d->ZMM_S(2), s->ZMM_S(2)); \ -d->ZMM_L(3) = F(32, d->ZMM_S(3), s->ZMM_S(3)); \ -} \ -\ -void helper_ ## name ## ss(CPUX86State *env, Reg *d, Reg *s)\ -{ \ -d->ZMM_L(0) = F(32, d->ZMM_S(0), s->ZMM_S(0)); \ -} \ -\ -void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)\ +#define SSE_HELPER_CMP_P(name, F, C)\ +void glue(helper_ ## name ## ps, SUFFIX)(CPUX86State *env, \ + Reg *d, Reg *s)\ { \ -d->ZMM_Q(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ -d->ZMM_Q(1) = F(64, d->ZMM_D(1), s->ZMM_D(1)); \ +Reg *v = d; \ +d->ZMM_L(0) = F(32, C, v->ZMM_S(0), s->ZMM_S(0)); \ +d->ZMM_L(1) = F(32, C, v->ZMM_S(1), s->ZMM_S(1)); \ +d->ZMM_L(2) = F(32, C, v->ZMM_S(2), s->ZMM_S(2)); \ +d->ZMM_L(3) = F(32, C, v->ZMM_S(3), s->ZMM_S(3)); \ +YMM_ONLY( \ +d->ZMM_L(4) = F(32, C, v->ZMM_S(4), s->ZMM_S(4)); \ +d->ZMM_L(5) = F(32, C, v->ZMM_S(5), s->ZMM_S(5)); \ +d->ZMM_L(6) = F(32, C, v->ZMM_S(6), s->ZMM_S(6)); \ +d->ZMM_L(7) = F(32, C, v->ZMM_S(7), s->ZMM_S(7)); \ +) \ } \ \ -void helper_ ## name ## sd(CPUX86State *env, Reg *d, Reg *s)\ +void glue(helper_ ## name ## pd, SUFFIX)(CPUX86State *env, \
+ Reg *d, Reg *s)\ { \ -d->ZMM_Q(0) = F(64, d->ZMM_D(0), s->ZMM_D(0)); \ -} - -#define FPU_CMPEQ(size, a, b) \ -(float ## size ## _eq_quiet(a, b, >sse_status) ? -1 : 0) -#define FPU_CMPLT(size, a, b) \ -(float ## size ## _lt(a, b, >sse_status) ? -1 : 0) -#define FPU_CMPLE(size, a, b) \ -(float ## size ## _le(a, b, >sse_status) ? -1 : 0) -#define FPU_CMPUNORD(size, a, b)\ -(float ## size ## _unordered_quiet(a, b, >sse_status) ? -1 : 0) -#define FPU_CMPNEQ(size, a, b) \ -(float ## size ## _eq_quiet(a, b, >sse_status) ? 0 : -1) -#define FPU_CMPNLT(size, a, b) \ -(float ## size ## _lt(a, b, >sse_status) ? 0 : -1) -#define FPU_CMPNLE(size, a, b) \ -(float ## size ## _le(a, b, >sse_status) ? 0 : -1) -#define FPU_CMPORD(size, a, b) \ -(float ## size ## _unordered_quiet(a, b, >sse_status) ? 0 : -1) - -SSE_HELPER_CMP(cmpeq, FPU_CMPEQ) -SSE_HELPER_CMP(cmplt, FPU_CMPLT) -SSE_HELPER_CMP(cmple, FPU_CMPLE) -SSE_HELPER_CMP(cmpunord, FPU_CMPUNORD) -SSE_HELPER_CMP(cmpneq, FPU_CMPNEQ) -SSE_HELPER_CMP(cmpnlt, FPU_CMPNLT) -SSE_HELPER_CMP(cmpnle, FPU_CMPNLE)
[PATCH v2 24/42] i386: Move 3DNOW decoder
Handle 3DNOW instructions early to avoid complicating the AVX logic. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 64f026c0af..6c40df61d4 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3297,6 +3297,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, is_xmm = 1; } } +if (sse_op.flags & SSE_OPF_3DNOW) { +if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) { +goto illegal_op; +} +} /* simple MMX/SSE operation */ if (s->flags & HF_TS_MASK) { gen_exception(s, EXCP07_PREX, pc_start - s->cs_base); @@ -4761,21 +4766,20 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, rm = (modrm & 7); op2_offset = offsetof(CPUX86State,fpregs[rm].mmx); } +if (sse_op.flags & SSE_OPF_3DNOW) { +/* 3DNow! data insns */ +val = x86_ldub_code(env, s); +SSEFunc_0_epp op_3dnow = sse_op_table5[val]; +if (!op_3dnow) { +goto unknown_op; +} +tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); +tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); +op_3dnow(cpu_env, s->ptr0, s->ptr1); +return; +} } switch(b) { -case 0x0f: /* 3DNow! data insns */ -val = x86_ldub_code(env, s); -sse_fn_epp = sse_op_table5[val]; -if (!sse_fn_epp) { -goto unknown_op; -} -if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) { -goto illegal_op; -} -tcg_gen_addi_ptr(s->ptr0, cpu_env, op1_offset); -tcg_gen_addi_ptr(s->ptr1, cpu_env, op2_offset); -sse_fn_epp(cpu_env, s->ptr0, s->ptr1); -break; case 0x70: /* pshufx insn */ case 0xc6: /* pshufx insn */ val = x86_ldub_code(env, s); -- 2.36.0
[PATCH v2 22/42] i386: Update ops_sse_header.h ready for 256 bit AVX
Update ops_sse_header.h ready for 256 bit AVX helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse_header.h | 67 +--- 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 203afbb5a1..63b63eb532 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -105,7 +105,7 @@ SSE_HELPER_L(pcmpeql, FCMPEQ) SSE_HELPER_W(pmullw, FMULLW) #if SHIFT == 0 -DEF_HELPER_3(glue(pmulhrw, SUFFIX), FMULHRW) +DEF_HELPER_3(glue(pmulhrw, SUFFIX), void, env, Reg, Reg) #endif SSE_HELPER_W(pmulhuw, FMULHUW) SSE_HELPER_W(pmulhw, FMULHW) @@ -137,23 +137,39 @@ DEF_HELPER_3(glue(pshufhw, SUFFIX), void, Reg, Reg, int) /* FPU ops */ /* XXX: not accurate */ -DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) -DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) +#define SSE_HELPER_P4(name) \ +DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ +DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) + +#define SSE_HELPER_P3(name, ...)\ +DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ +DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) -#define SSE_HELPER_S(name, F)\ -DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg)\ -DEF_HELPER_3(name ## ss, void, env, Reg, Reg)\ -DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg)\ +#if SHIFT == 1 +#define SSE_HELPER_S4(name) \ +SSE_HELPER_P4(name) \ +DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ DEF_HELPER_3(name ## sd, void, env, Reg, Reg) +#define SSE_HELPER_S3(name) \ +SSE_HELPER_P3(name) \ +DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ +DEF_HELPER_3(name ## sd, void, env, Reg, Reg) +#else +#define SSE_HELPER_S4(name, ...) SSE_HELPER_P4(name) +#define SSE_HELPER_S3(name, ...)
SSE_HELPER_P3(name) +#endif + +DEF_HELPER_3(glue(shufps, SUFFIX), void, Reg, Reg, int) +DEF_HELPER_3(glue(shufpd, SUFFIX), void, Reg, Reg, int) -SSE_HELPER_S(add, FPU_ADD) -SSE_HELPER_S(sub, FPU_SUB) -SSE_HELPER_S(mul, FPU_MUL) -SSE_HELPER_S(div, FPU_DIV) -SSE_HELPER_S(min, FPU_MIN) -SSE_HELPER_S(max, FPU_MAX) -SSE_HELPER_S(sqrt, FPU_SQRT) +SSE_HELPER_S4(add) +SSE_HELPER_S4(sub) +SSE_HELPER_S4(mul) +SSE_HELPER_S4(div) +SSE_HELPER_S4(min) +SSE_HELPER_S4(max) +SSE_HELPER_S3(sqrt) DEF_HELPER_3(glue(cvtps2pd, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(cvtpd2ps, SUFFIX), void, env, Reg, Reg) @@ -208,18 +224,12 @@ DEF_HELPER_4(extrq_i, void, env, ZMMReg, int, int) DEF_HELPER_3(insertq_r, void, env, ZMMReg, ZMMReg) DEF_HELPER_4(insertq_i, void, env, ZMMReg, int, int) #endif -DEF_HELPER_3(glue(haddps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(haddpd, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(hsubps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(hsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(addsubps, SUFFIX), void, env, ZMMReg, ZMMReg) -DEF_HELPER_3(glue(addsubpd, SUFFIX), void, env, ZMMReg, ZMMReg) - -#define SSE_HELPER_CMP(name, F) \ -DEF_HELPER_3(glue(name ## ps, SUFFIX), void, env, Reg, Reg) \ -DEF_HELPER_3(name ## ss, void, env, Reg, Reg) \ -DEF_HELPER_3(glue(name ## pd, SUFFIX), void, env, Reg, Reg) \ -DEF_HELPER_3(name ## sd, void, env, Reg, Reg) + +SSE_HELPER_P4(hadd) +SSE_HELPER_P4(hsub) +SSE_HELPER_P4(addsub) + +#define SSE_HELPER_CMP(name, F) SSE_HELPER_S4(name) SSE_HELPER_CMP(cmpeq, FPU_CMPEQ) SSE_HELPER_CMP(cmplt, FPU_CMPLT) @@ -381,6 +391,9 @@ DEF_HELPER_4(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, i32) #undef SSE_HELPER_W #undef SSE_HELPER_L #undef SSE_HELPER_Q -#undef SSE_HELPER_S +#undef SSE_HELPER_S3 +#undef SSE_HELPER_S4 +#undef SSE_HELPER_P3 +#undef SSE_HELPER_P4 #undef SSE_HELPER_CMP #undef UNPCK_OP -- 2.36.0
[PATCH v2 18/42] i386: Misc AVX helper prep
Fixup various vector helpers that either trivially extend to 256 bit, or don't have 256 bit variants. No functional changes to existing helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 159 -- 1 file changed, 139 insertions(+), 20 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index d128af6cc8..3202c00572 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -641,6 +641,7 @@ void glue(helper_psadbw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #endif } +#if SHIFT < 2 void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, target_ulong a0) { @@ -652,6 +653,7 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, } } } +#endif void glue(helper_movl_mm_T0, SUFFIX)(Reg *d, uint32_t val) { @@ -882,6 +884,13 @@ void glue(helper_cvtps2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) s0 = s->ZMM_S(0); s1 = s->ZMM_S(1); +#if SHIFT == 2 +float32 s2, s3; +s2 = s->ZMM_S(2); +s3 = s->ZMM_S(3); +d->ZMM_D(2) = float32_to_float64(s2, >sse_status); +d->ZMM_D(3) = float32_to_float64(s3, >sse_status); +#endif d->ZMM_D(0) = float32_to_float64(s0, >sse_status); d->ZMM_D(1) = float32_to_float64(s1, >sse_status); } @@ -890,9 +899,17 @@ void glue(helper_cvtpd2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), >sse_status); d->ZMM_S(1) = float64_to_float32(s->ZMM_D(1), >sse_status); +#if SHIFT == 2 +d->ZMM_S(2) = float64_to_float32(s->ZMM_D(2), >sse_status); +d->ZMM_S(3) = float64_to_float32(s->ZMM_D(3), >sse_status); +d->Q(2) = 0; +d->Q(3) = 0; +#else d->Q(1) = 0; +#endif } +#if SHIFT == 1 void helper_cvtss2sd(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_D(0) = float32_to_float64(s->ZMM_S(0), >sse_status); @@ -902,6 +919,7 @@ void helper_cvtsd2ss(CPUX86State *env, Reg *d, Reg *s) { d->ZMM_S(0) = float64_to_float32(s->ZMM_D(0), >sse_status); } +#endif /* integer to float */ void glue(helper_cvtdq2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) @@ -910,6 +928,12 @@ void
glue(helper_cvtdq2ps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) d->ZMM_S(1) = int32_to_float32(s->ZMM_L(1), >sse_status); d->ZMM_S(2) = int32_to_float32(s->ZMM_L(2), >sse_status); d->ZMM_S(3) = int32_to_float32(s->ZMM_L(3), >sse_status); +#if SHIFT == 2 +d->ZMM_S(4) = int32_to_float32(s->ZMM_L(4), >sse_status); +d->ZMM_S(5) = int32_to_float32(s->ZMM_L(5), >sse_status); +d->ZMM_S(6) = int32_to_float32(s->ZMM_L(6), >sse_status); +d->ZMM_S(7) = int32_to_float32(s->ZMM_L(7), >sse_status); +#endif } void glue(helper_cvtdq2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) @@ -918,10 +942,18 @@ void glue(helper_cvtdq2pd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) l0 = (int32_t)s->ZMM_L(0); l1 = (int32_t)s->ZMM_L(1); +#if SHIFT == 2 +int32_t l2, l3; +l2 = (int32_t)s->ZMM_L(2); +l3 = (int32_t)s->ZMM_L(3); +d->ZMM_D(2) = int32_to_float64(l2, >sse_status); +d->ZMM_D(3) = int32_to_float64(l3, >sse_status); +#endif d->ZMM_D(0) = int32_to_float64(l0, >sse_status); d->ZMM_D(1) = int32_to_float64(l1, >sse_status); } +#if SHIFT == 1 void helper_cvtpi2ps(CPUX86State *env, ZMMReg *d, MMXReg *s) { d->ZMM_S(0) = int32_to_float32(s->MMX_L(0), >sse_status); @@ -956,8 +988,11 @@ void helper_cvtsq2sd(CPUX86State *env, ZMMReg *d, uint64_t val) } #endif +#endif + /* float to integer */ +#if SHIFT == 1 /* * x86 mandates that we return the indefinite integer value for the result * of any float-to-integer conversion that raises the 'invalid' exception. 
@@ -988,6 +1023,7 @@ WRAP_FLOATCONV(int64_t, float32_to_int64, float32, INT64_MIN) WRAP_FLOATCONV(int64_t, float32_to_int64_round_to_zero, float32, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64, float64, INT64_MIN) WRAP_FLOATCONV(int64_t, float64_to_int64_round_to_zero, float64, INT64_MIN) +#endif void glue(helper_cvtps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { @@ -995,15 +1031,29 @@ void glue(helper_cvtps2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) d->ZMM_L(1) = x86_float32_to_int32(s->ZMM_S(1), >sse_status); d->ZMM_L(2) = x86_float32_to_int32(s->ZMM_S(2), >sse_status); d->ZMM_L(3) = x86_float32_to_int32(s->ZMM_S(3), >sse_status); +#if SHIFT == 2 +d->ZMM_L(4) = x86_float32_to_int32(s->ZMM_S(4), >sse_status); +d->ZMM_L(5) = x86_float32_to_int32(s->ZMM_S(5), >sse_status); +d->ZMM_L(6) = x86_float32_to_int32(s->ZMM_S(6), >sse_status); +d->ZMM_L(7) = x86_float32_to_int32(s->ZMM_S(7), >sse_status); +#endif } void glue(helper_cvtpd2dq, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) { d->ZMM_L(0) = x86_float64_to_int32(s->ZMM_D(0), >sse_status); d->ZMM_L(1) = x86_float64_to_int32(s->ZMM_D(1), >sse_status); +#if SHIFT == 2 +d->ZMM_L(2) =
[PATCH v2 28/42] i386: Implement VZEROALL and VZEROUPPER
These use the same opcode as EMMS, which I guess makes some sort of sense. Fairly straightforward other than that. If we wanted to optimize out gen_clear_ymmh then this would be one of the starting points. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 48 target/i386/ops_sse_header.h | 9 +++ target/i386/tcg/translate.c | 26 --- 3 files changed, 80 insertions(+), 3 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index ad3312d353..a1f50f0c8b 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3071,6 +3071,54 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #endif #endif +#if SHIFT == 2 +void helper_vzeroall(CPUX86State *env) +{ +int i; + +for (i = 0; i < 8; i++) { +env->xmm_regs[i].ZMM_Q(0) = 0; +env->xmm_regs[i].ZMM_Q(1) = 0; +env->xmm_regs[i].ZMM_Q(2) = 0; +env->xmm_regs[i].ZMM_Q(3) = 0; +} +} + +void helper_vzeroupper(CPUX86State *env) +{ +int i; + +for (i = 0; i < 8; i++) { +env->xmm_regs[i].ZMM_Q(2) = 0; +env->xmm_regs[i].ZMM_Q(3) = 0; +} +} + +#ifdef TARGET_X86_64 +void helper_vzeroall_hi8(CPUX86State *env) +{ +int i; + +for (i = 8; i < 16; i++) { +env->xmm_regs[i].ZMM_Q(0) = 0; +env->xmm_regs[i].ZMM_Q(1) = 0; +env->xmm_regs[i].ZMM_Q(2) = 0; +env->xmm_regs[i].ZMM_Q(3) = 0; +} +} + +void helper_vzeroupper_hi8(CPUX86State *env) +{ +int i; + +for (i = 8; i < 16; i++) { +env->xmm_regs[i].ZMM_Q(2) = 0; +env->xmm_regs[i].ZMM_Q(3) = 0; +} +} +#endif +#endif + #undef SSE_HELPER_S #undef SHIFT diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index cfcfba154b..48f0945917 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -411,6 +411,15 @@ DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, Reg, i32) #endif +#if SHIFT == 2 +DEF_HELPER_1(vzeroall, void, env) +DEF_HELPER_1(vzeroupper, void, env) +#ifdef TARGET_X86_64 +DEF_HELPER_1(vzeroall_hi8, void, env)
+DEF_HELPER_1(vzeroupper_hi8, void, env) +#endif +#endif + #undef SHIFT #undef Reg #undef SUFFIX diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index bcd6d47fd0..ba70aeb039 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3455,9 +3455,29 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, return; } if (b == 0x77) { -/* emms */ -gen_helper_emms(cpu_env); -return; +if (s->prefix & PREFIX_VEX) { +CHECK_AVX(s); +if (s->vex_l) { +gen_helper_vzeroall(cpu_env); +#ifdef TARGET_X86_64 +if (CODE64(s)) { +gen_helper_vzeroall_hi8(cpu_env); +} +#endif +} else { +gen_helper_vzeroupper(cpu_env); +#ifdef TARGET_X86_64 +if (CODE64(s)) { +gen_helper_vzeroupper_hi8(cpu_env); +} +#endif +} +return; +} else { +/* emms */ +gen_helper_emms(cpu_env); +return; +} } /* prepare MMX state (XXX: optimize by storing fptt and fptags in the static cpu state) */ -- 2.36.0
[PATCH v2 13/42] i386: Destructive vector helpers for AVX
These helpers need to take special care to avoid overwriting source values before the whole result has been calculated. Currently they use a dummy Reg typed variable to store the result then assign the whole register. This will cause 128 bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 707 ++ 1 file changed, 437 insertions(+), 270 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index d0424140d9..c645d2ddbf 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -680,71 +680,85 @@ void glue(helper_movq_mm_T0, SUFFIX)(Reg *d, uint64_t val) } #endif +#define SHUFFLE4(F, a, b, offset) do { \ +r0 = a->F((order & 3) + offset);\ +r1 = a->F(((order >> 2) & 3) + offset); \ +r2 = b->F(((order >> 4) & 3) + offset); \ +r3 = b->F(((order >> 6) & 3) + offset); \ +d->F(offset) = r0; \ +d->F(offset + 1) = r1; \ +d->F(offset + 2) = r2; \ +d->F(offset + 3) = r3; \ +} while (0) + #if SHIFT == 0 void glue(helper_pshufw, SUFFIX)(Reg *d, Reg *s, int order) { -Reg r; +uint16_t r0, r1, r2, r3; -r.W(0) = s->W(order & 3); -r.W(1) = s->W((order >> 2) & 3); -r.W(2) = s->W((order >> 4) & 3); -r.W(3) = s->W((order >> 6) & 3); -MOVE(*d, r); +SHUFFLE4(W, s, s, 0); } #else void helper_shufps(Reg *d, Reg *s, int order) { -Reg r; +Reg *v = d; +uint32_t r0, r1, r2, r3; -r.L(0) = d->L(order & 3); -r.L(1) = d->L((order >> 2) & 3); -r.L(2) = s->L((order >> 4) & 3); -r.L(3) = s->L((order >> 6) & 3); -MOVE(*d, r); +SHUFFLE4(L, v, s, 0); +#if SHIFT == 2 +SHUFFLE4(L, v, s, 4); +#endif } void helper_shufpd(Reg *d, Reg *s, int order) { -Reg r; +Reg *v = d; +uint64_t r0, r1; -r.Q(0) = d->Q(order & 1); -r.Q(1) = s->Q((order >> 1) & 1); -MOVE(*d, r); +r0 = v->Q(order & 1); +r1 = s->Q((order >> 1) & 1); +d->Q(0) = r0; +d->Q(1) = r1; +#if SHIFT == 2 +r0 = v->Q(((order >> 2) & 1) + 2); +r1 = s->Q(((order >> 3) & 1) + 2); +d->Q(2) = r0; +d->Q(3) = r1;
+#endif } void glue(helper_pshufd, SUFFIX)(Reg *d, Reg *s, int order) { -Reg r; +uint32_t r0, r1, r2, r3; -r.L(0) = s->L(order & 3); -r.L(1) = s->L((order >> 2) & 3); -r.L(2) = s->L((order >> 4) & 3); -r.L(3) = s->L((order >> 6) & 3); -MOVE(*d, r); +SHUFFLE4(L, s, s, 0); +#if SHIFT == 2 +SHUFFLE4(L, s, s, 4); +#endif } void glue(helper_pshuflw, SUFFIX)(Reg *d, Reg *s, int order) { -Reg r; +uint16_t r0, r1, r2, r3; -r.W(0) = s->W(order & 3); -r.W(1) = s->W((order >> 2) & 3); -r.W(2) = s->W((order >> 4) & 3); -r.W(3) = s->W((order >> 6) & 3); -r.Q(1) = s->Q(1); -MOVE(*d, r); +SHUFFLE4(W, s, s, 0); +d->Q(1) = s->Q(1); +#if SHIFT == 2 +SHUFFLE4(W, s, s, 8); +d->Q(3) = s->Q(3); +#endif } void glue(helper_pshufhw, SUFFIX)(Reg *d, Reg *s, int order) { -Reg r; +uint16_t r0, r1, r2, r3; -r.Q(0) = s->Q(0); -r.W(4) = s->W(4 + (order & 3)); -r.W(5) = s->W(4 + ((order >> 2) & 3)); -r.W(6) = s->W(4 + ((order >> 4) & 3)); -r.W(7) = s->W(4 + ((order >> 6) & 3)); -MOVE(*d, r); +d->Q(0) = s->Q(0); +SHUFFLE4(W, s, s, 4); +#if SHIFT == 2 +d->Q(2) = s->Q(2); +SHUFFLE4(W, s, s, 12); +#endif } #endif @@ -1320,156 +1334,190 @@ uint32_t glue(helper_pmovmskb, SUFFIX)(CPUX86State *env, Reg *s) return val; } -void glue(helper_packsswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ -Reg r; - -r.B(0) = satsb((int16_t)d->W(0)); -r.B(1) = satsb((int16_t)d->W(1)); -r.B(2) = satsb((int16_t)d->W(2)); -r.B(3) = satsb((int16_t)d->W(3)); -#if SHIFT == 1 -r.B(4) = satsb((int16_t)d->W(4)); -r.B(5) = satsb((int16_t)d->W(5)); -r.B(6) = satsb((int16_t)d->W(6)); -r.B(7) = satsb((int16_t)d->W(7)); -#endif -r.B((4 << SHIFT) + 0) = satsb((int16_t)s->W(0)); -r.B((4 << SHIFT) + 1) = satsb((int16_t)s->W(1)); -r.B((4 << SHIFT) + 2) = satsb((int16_t)s->W(2)); -r.B((4 << SHIFT) + 3) = satsb((int16_t)s->W(3)); -#if SHIFT == 1 -r.B(12) = satsb((int16_t)s->W(4)); -r.B(13) = satsb((int16_t)s->W(5)); -r.B(14) = satsb((int16_t)s->W(6)); -r.B(15) = satsb((int16_t)s->W(7)); -#endif -MOVE(*d, r); -} - -void 
glue(helper_packuswb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -{ -Reg r; - -r.B(0) = satub((int16_t)d->W(0)); -r.B(1) = satub((int16_t)d->W(1)); -r.B(2) = satub((int16_t)d->W(2)); -r.B(3) = satub((int16_t)d->W(3)); -#if SHIFT == 1 -r.B(4) = satub((int16_t)d->W(4)); -r.B(5) = satub((int16_t)d->W(5)); -r.B(6) = satub((int16_t)d->W(6)); -r.B(7) = satub((int16_t)d->W(7)); -#endif -r.B((4 << SHIFT) + 0) =
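The in-place hazard this patch removes can be sketched outside QEMU as follows. This is illustrative code, not the actual helper (the real code uses the `Reg`/`SUFFIX` machinery and the `SHUFFLE4` macro above): every source element is read into a temporary before the first store, so the helper stays correct even when the destination register aliases a source register, and no full-width dummy register is written.

```c
#include <stdint.h>

/* Four-lane 32-bit shuffle in the style of the SHUFFLE4 macro. */
typedef struct { uint32_t l[4]; } Vec128;

static void shuf4(Vec128 *d, const Vec128 *a, const Vec128 *b, int order)
{
    uint32_t r0 = a->l[order & 3];          /* all reads happen first... */
    uint32_t r1 = a->l[(order >> 2) & 3];
    uint32_t r2 = b->l[(order >> 4) & 3];
    uint32_t r3 = b->l[(order >> 6) & 3];
    d->l[0] = r0;                           /* ...then all the writes */
    d->l[1] = r1;
    d->l[2] = r2;
    d->l[3] = r3;
}
```

Calling `shuf4(&v, &v, &v, order)` with all three pointers aliased still produces the correct permutation, which is exactly the property the old `MOVE(*d, r)` pattern provided and the rewritten helpers must preserve.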
[PATCH v2 40/42] Enable all x86-64 cpu features in user mode
We don't have any migration concerns for usermode emulation, so we may as well enable all available CPU features by default. Signed-off-by: Paul Brook --- linux-user/x86_64/target_elf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/linux-user/x86_64/target_elf.h b/linux-user/x86_64/target_elf.h index 7b76a90de8..3f628f8d66 100644 --- a/linux-user/x86_64/target_elf.h +++ b/linux-user/x86_64/target_elf.h @@ -9,6 +9,6 @@ #define X86_64_TARGET_ELF_H static inline const char *cpu_get_model(uint32_t eflags) { -return "qemu64"; +return "max"; } #endif -- 2.36.0
[PATCH v2 29/42] i386: Implement VBROADCAST
The catch here is that these are whole vector operations (not independent 128 bit lanes). We abuse the SSE_OPF_SCALAR flag to select the memory operand width appropriately. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 51 target/i386/ops_sse_header.h | 8 ++ target/i386/tcg/translate.c | 42 - 3 files changed, 100 insertions(+), 1 deletion(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index a1f50f0c8b..4115c9a257 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3071,7 +3071,57 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #endif #endif +#if SHIFT >= 1 +void glue(helper_vbroadcastb, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint8_t val = s->B(0); +int i; + +for (i = 0; i < 16 * SHIFT; i++) { +d->B(i) = val; +} +} + +void glue(helper_vbroadcastw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint16_t val = s->W(0); +int i; + +for (i = 0; i < 8 * SHIFT; i++) { +d->W(i) = val; +} +} + +void glue(helper_vbroadcastl, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint32_t val = s->L(0); +int i; + +for (i = 0; i < 8 * SHIFT; i++) { +d->L(i) = val; +} +} + +void glue(helper_vbroadcastq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +uint64_t val = s->Q(0); +d->Q(0) = val; +d->Q(1) = val; #if SHIFT == 2 +d->Q(2) = val; +d->Q(3) = val; +#endif +} + +#if SHIFT == 2 +void glue(helper_vbroadcastdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +{ +d->Q(0) = s->Q(0); +d->Q(1) = s->Q(1); +d->Q(2) = s->Q(0); +d->Q(3) = s->Q(1); +} + void helper_vzeroall(CPUX86State *env) { int i; @@ -3118,6 +3168,7 @@ void helper_vzeroupper_hi8(CPUX86State *env) } #endif #endif +#endif #undef SSE_HELPER_S diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 48f0945917..51e02cd4fa 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -411,7 +411,14 @@ DEF_HELPER_4(glue(aeskeygenassist, SUFFIX), void, env, Reg, Reg, i32) DEF_HELPER_5(glue(pclmulqdq, SUFFIX), void, env, Reg, Reg, 
Reg, i32) #endif +/* AVX helpers */ +#if SHIFT >= 1 +DEF_HELPER_3(glue(vbroadcastb, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(vbroadcastw, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(vbroadcastl, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_3(glue(vbroadcastq, SUFFIX), void, env, Reg, Reg) #if SHIFT == 2 +DEF_HELPER_3(glue(vbroadcastdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_1(vzeroall, void, env) DEF_HELPER_1(vzeroupper, void, env) #ifdef TARGET_X86_64 @@ -419,6 +426,7 @@ DEF_HELPER_1(vzeroall_hi8, void, env) DEF_HELPER_1(vzeroupper_hi8, void, env) #endif #endif +#endif #undef SHIFT #undef Reg diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index ba70aeb039..59ab1dc562 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3255,6 +3255,11 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x14] = BLENDV_OP(blendvps, SSE41, 0), [0x15] = BLENDV_OP(blendvpd, SSE41, 0), [0x17] = CMP_OP(ptest, SSE41), +/* TODO:Some vbroadcast variants require AVX2 */ +[0x18] = UNARY_OP(vbroadcastl, AVX, SSE_OPF_SCALAR), /* vbroadcastss */ +[0x19] = UNARY_OP(vbroadcastq, AVX, SSE_OPF_SCALAR), /* vbroadcastsd */ +#define gen_helper_vbroadcastdq_xmm NULL +[0x1a] = UNARY_OP(vbroadcastdq, AVX, SSE_OPF_SCALAR), /* vbroadcastf128 */ [0x1c] = UNARY_OP_MMX(pabsb, SSSE3), [0x1d] = UNARY_OP_MMX(pabsw, SSSE3), [0x1e] = UNARY_OP_MMX(pabsd, SSSE3), @@ -3286,6 +3291,16 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x40] = BINARY_OP(pmulld, SSE41, SSE_OPF_MMX), #define gen_helper_phminposuw_ymm NULL [0x41] = UNARY_OP(phminposuw, SSE41, 0), +/* vpbroadcastd */ +[0x58] = UNARY_OP(vbroadcastl, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), +/* vpbroadcastq */ +[0x59] = UNARY_OP(vbroadcastq, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), +/* vbroadcasti128 */ +[0x5a] = UNARY_OP(vbroadcastdq, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), +/* vpbroadcastb */ +[0x78] = UNARY_OP(vbroadcastb, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), +/* vpbroadcastw */ 
+[0x79] = UNARY_OP(vbroadcastw, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), #define gen_helper_aesimc_ymm NULL [0xdb] = UNARY_OP(aesimc, AES, 0), [0xdc] = BINARY_OP(aesenc, AES, 0), @@ -4323,6 +4338,24 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, op2_offset = offsetof(CPUX86State, xmm_t0); gen_lea_modrm(env, s, modrm); switch (b) { +case 0x78: /* vpbroadcastb */ +size = 8; +break; +case 0x79: /* vpbroadcastw */ +size = 16; +break; +
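The broadcast semantics the helpers above implement can be sketched with a plain eight-lane array standing in for QEMU's `Reg`/`SHIFT` machinery (names here are illustrative): only the lowest source element is replicated into every destination lane.

```c
#include <stdint.h>

/* VPBROADCASTD-style sketch for a 256-bit register (eight dword lanes). */
typedef struct { uint32_t l[8]; } Vec256;

static void broadcast_l(Vec256 *d, const Vec256 *s)
{
    uint32_t val = s->l[0];     /* only the low element matters */
    for (int i = 0; i < 8; i++) {
        d->l[i] = val;
    }
}
```

This is also why the patch abuses `SSE_OPF_SCALAR`: the memory form must load only one element's worth of data, not a full vector.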
[PATCH v2 11/42] i386: Rewrite simple integer vector helpers
Rewrite the "simple" vector integer helpers in preparation for AVX support. While the current code is able to use the same prototype for unary (a = F(b)) and binary (a = F(b, c)) operations, future changes will cause them to diverge. No functional changes to existing helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 180 -- 1 file changed, 137 insertions(+), 43 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 9297c96d04..bb9cbf9ead 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -275,61 +275,148 @@ void glue(helper_pslldq, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) } #endif -#define SSE_HELPER_B(name, F) \ +#define SSE_HELPER_1(name, elem, num, F) \ void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ { \ -d->B(0) = F(d->B(0), s->B(0)); \ -d->B(1) = F(d->B(1), s->B(1)); \ -d->B(2) = F(d->B(2), s->B(2)); \ -d->B(3) = F(d->B(3), s->B(3)); \ -d->B(4) = F(d->B(4), s->B(4)); \ -d->B(5) = F(d->B(5), s->B(5)); \ -d->B(6) = F(d->B(6), s->B(6)); \ -d->B(7) = F(d->B(7), s->B(7)); \ +d->elem(0) = F(s->elem(0)); \ +d->elem(1) = F(s->elem(1)); \ +if ((num << SHIFT) > 2) { \ +d->elem(2) = F(s->elem(2)); \ +d->elem(3) = F(s->elem(3)); \ +} \ +if ((num << SHIFT) > 4) { \ +d->elem(4) = F(s->elem(4)); \ +d->elem(5) = F(s->elem(5)); \ +d->elem(6) = F(s->elem(6)); \ +d->elem(7) = F(s->elem(7)); \ +} \ +if ((num << SHIFT) > 8) { \ +d->elem(8) = F(s->elem(8)); \ +d->elem(9) = F(s->elem(9)); \ +d->elem(10) = F(s->elem(10)); \ +d->elem(11) = F(s->elem(11)); \ +d->elem(12) = F(s->elem(12)); \ +d->elem(13) = F(s->elem(13)); \ +d->elem(14) = F(s->elem(14)); \ +d->elem(15) = F(s->elem(15)); \ +} \ +if ((num << SHIFT) > 16) { \ +d->elem(16) = F(s->elem(16)); \ +d->elem(17) = F(s->elem(17)); \ +d->elem(18) = F(s->elem(18)); \ +d->elem(19) = F(s->elem(19)); \ +d->elem(20) = F(s->elem(20)); \ +d->elem(21) = F(s->elem(21)); \ +d->elem(22) = F(s->elem(22)); \ +d->elem(23) = F(s->elem(23)); \ +d->elem(24) = F(s->elem(24)); \
+d->elem(25) = F(s->elem(25)); \ +d->elem(26) = F(s->elem(26)); \ +d->elem(27) = F(s->elem(27)); \ +d->elem(28) = F(s->elem(28)); \ +d->elem(29) = F(s->elem(29)); \ +d->elem(30) = F(s->elem(30)); \ +d->elem(31) = F(s->elem(31)); \ +} \ +} + +#define SSE_HELPER_B(name, F) \ +void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) \ +{ \ +Reg *v = d; \ +d->B(0) = F(v->B(0), s->B(0)); \ +d->B(1) = F(v->B(1), s->B(1)); \ +d->B(2) = F(v->B(2), s->B(2)); \ +d->B(3) = F(v->B(3), s->B(3)); \ +d->B(4) = F(v->B(4), s->B(4)); \ +d->B(5) = F(v->B(5), s->B(5)); \ +d->B(6) = F(v->B(6), s->B(6)); \ +d->B(7) = F(v->B(7), s->B(7)); \
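The shape of the rewritten `SSE_HELPER_1` macro can be sketched with a runtime element count in place of the compile-time `(num << SHIFT)` guards (illustrative code, not the QEMU macro): n would be 4 lanes for MMX (`SHIFT == 0`), 8 for XMM, 16 for YMM when the element is a 16-bit lane, and the function pointer stands in for the per-element `F` macro.

```c
#include <stdint.h>

/* Apply a unary element operation across the first n 16-bit lanes. */
static void unary_w(uint16_t *d, const uint16_t *s, int n,
                    uint16_t (*f)(uint16_t))
{
    for (int i = 0; i < n; i++) {
        d[i] = f(s[i]);
    }
}

/* Example element op: two's-complement negation of a 16-bit lane. */
static uint16_t neg16(uint16_t x)
{
    return (uint16_t)-x;
}
```

The real macro unrolls the loop and gates each block on `(num << SHIFT)` so the compiler emits only the lanes that exist for the instantiated register width.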
[PATCH v2 38/42] i386: Implement VPBLENDD
This is semantically equivalent to VBLENDPS. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 1 + 1 file changed, 1 insertion(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 95ecdea8fe..73f3842c36 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3353,6 +3353,7 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { #define gen_helper_vpermq_xmm NULL [0x00] = UNARY_OP(vpermq, AVX, SSE_OPF_AVX2), [0x01] = UNARY_OP(vpermq, AVX, SSE_OPF_AVX2), /* vpermpd */ +[0x02] = BINARY_OP(blendps, AVX, SSE_OPF_AVX2), /* vpblendd */ [0x04] = UNARY_OP(vpermilps_imm, AVX, 0), [0x05] = UNARY_OP(vpermilpd_imm, AVX, 0), #define gen_helper_vpermdq_xmm NULL -- 2.36.0
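The reason VPBLENDD can simply reuse the blendps helper is that an immediate dword blend is pure bit copying, so the integer and float forms behave identically. A minimal sketch of the shared semantics (illustrative, not the QEMU helper):

```c
#include <stdint.h>

/* Bit i of the immediate selects lane i of the second source. */
static void blend4(uint32_t *d, const uint32_t *a, const uint32_t *b,
                   int imm)
{
    for (int i = 0; i < 4; i++) {
        d[i] = ((imm >> i) & 1) ? b[i] : a[i];
    }
}
```

No lane is ever interpreted as a float value, so there is nothing float-specific for the integer variant to diverge on.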
[PATCH v2 17/42] i386: Destructive FP helpers for AVX
Prepare the horizontal arithmetic vector helpers for AVX. These currently use a dummy Reg typed variable to store the result then assign the whole register. This will cause 128 bit operations to corrupt the upper half of the register, so replace it with explicit temporaries and element assignments. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 96 +++ 1 file changed, 70 insertions(+), 26 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 4137e6e1fa..d128af6cc8 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -1196,44 +1196,88 @@ void helper_insertq_i(CPUX86State *env, ZMMReg *d, int index, int length) d->ZMM_Q(0) = helper_insertq(d->ZMM_Q(0), index, length); } -void glue(helper_haddps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_haddps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { -ZMMReg r; - -r.ZMM_S(0) = float32_add(d->ZMM_S(0), d->ZMM_S(1), &env->sse_status); -r.ZMM_S(1) = float32_add(d->ZMM_S(2), d->ZMM_S(3), &env->sse_status); -r.ZMM_S(2) = float32_add(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); -r.ZMM_S(3) = float32_add(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); -MOVE(*d, r); +Reg *v = d; +float32 r0, r1, r2, r3; + +r0 = float32_add(v->ZMM_S(0), v->ZMM_S(1), &env->sse_status); +r1 = float32_add(v->ZMM_S(2), v->ZMM_S(3), &env->sse_status); +r2 = float32_add(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); +r3 = float32_add(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); +d->ZMM_S(0) = r0; +d->ZMM_S(1) = r1; +d->ZMM_S(2) = r2; +d->ZMM_S(3) = r3; +#if SHIFT == 2 +r0 = float32_add(v->ZMM_S(4), v->ZMM_S(5), &env->sse_status); +r1 = float32_add(v->ZMM_S(6), v->ZMM_S(7), &env->sse_status); +r2 = float32_add(s->ZMM_S(4), s->ZMM_S(5), &env->sse_status); +r3 = float32_add(s->ZMM_S(6), s->ZMM_S(7), &env->sse_status); +d->ZMM_S(4) = r0; +d->ZMM_S(5) = r1; +d->ZMM_S(6) = r2; +d->ZMM_S(7) = r3; +#endif } -void glue(helper_haddpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_haddpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { -ZMMReg r; +Reg *v 
= d; +float64 r0, r1; -r.ZMM_D(0) = float64_add(d->ZMM_D(0), d->ZMM_D(1), &env->sse_status); -r.ZMM_D(1) = float64_add(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); -MOVE(*d, r); +r0 = float64_add(v->ZMM_D(0), v->ZMM_D(1), &env->sse_status); +r1 = float64_add(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); +d->ZMM_D(0) = r0; +d->ZMM_D(1) = r1; +#if SHIFT == 2 +r0 = float64_add(v->ZMM_D(2), v->ZMM_D(3), &env->sse_status); +r1 = float64_add(s->ZMM_D(2), s->ZMM_D(3), &env->sse_status); +d->ZMM_D(2) = r0; +d->ZMM_D(3) = r1; +#endif } -void glue(helper_hsubps, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_hsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { -ZMMReg r; - -r.ZMM_S(0) = float32_sub(d->ZMM_S(0), d->ZMM_S(1), &env->sse_status); -r.ZMM_S(1) = float32_sub(d->ZMM_S(2), d->ZMM_S(3), &env->sse_status); -r.ZMM_S(2) = float32_sub(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); -r.ZMM_S(3) = float32_sub(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); -MOVE(*d, r); +Reg *v = d; +float32 r0, r1, r2, r3; + +r0 = float32_sub(v->ZMM_S(0), v->ZMM_S(1), &env->sse_status); +r1 = float32_sub(v->ZMM_S(2), v->ZMM_S(3), &env->sse_status); +r2 = float32_sub(s->ZMM_S(0), s->ZMM_S(1), &env->sse_status); +r3 = float32_sub(s->ZMM_S(2), s->ZMM_S(3), &env->sse_status); +d->ZMM_S(0) = r0; +d->ZMM_S(1) = r1; +d->ZMM_S(2) = r2; +d->ZMM_S(3) = r3; +#if SHIFT == 2 +r0 = float32_sub(v->ZMM_S(4), v->ZMM_S(5), &env->sse_status); +r1 = float32_sub(v->ZMM_S(6), v->ZMM_S(7), &env->sse_status); +r2 = float32_sub(s->ZMM_S(4), s->ZMM_S(5), &env->sse_status); +r3 = float32_sub(s->ZMM_S(6), s->ZMM_S(7), &env->sse_status); +d->ZMM_S(4) = r0; +d->ZMM_S(5) = r1; +d->ZMM_S(6) = r2; +d->ZMM_S(7) = r3; +#endif } -void glue(helper_hsubpd, SUFFIX)(CPUX86State *env, ZMMReg *d, ZMMReg *s) +void glue(helper_hsubpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { -ZMMReg r; +Reg *v = d; +float64 r0, r1; -r.ZMM_D(0) = float64_sub(d->ZMM_D(0), d->ZMM_D(1), &env->sse_status); -r.ZMM_D(1) = float64_sub(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); -MOVE(*d, r); +r0 = float64_sub(v->ZMM_D(0), v->ZMM_D(1), &env->sse_status); +r1 = float64_sub(s->ZMM_D(0), s->ZMM_D(1), &env->sse_status); +d->ZMM_D(0) = r0; +d->ZMM_D(1) = r1; +#if SHIFT == 2 +r0 = float64_sub(v->ZMM_D(2), v->ZMM_D(3), &env->sse_status); +r1 = float64_sub(s->ZMM_D(2), s->ZMM_D(3), &env->sse_status); +d->ZMM_D(2) = r0; +d->ZMM_D(3) = r1; +#endif } void glue(helper_addsubps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) -- 2.36.0
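The horizontal-add lane pattern this patch rewrites can be sketched with plain C floats standing in for the softfloat calls (illustrative, not the QEMU helper): all four sums are computed into temporaries before any store, so the helper is safe even when d aliases s.

```c
/* HADDPS over one 128-bit half: pairwise sums of v, then of s. */
static void haddps4(float *d, const float *v, const float *s)
{
    float r0 = v[0] + v[1];
    float r1 = v[2] + v[3];
    float r2 = s[0] + s[1];
    float r3 = s[2] + s[3];
    d[0] = r0;
    d[1] = r1;
    d[2] = r2;
    d[3] = r3;
}
```

With the old `ZMMReg r` dummy, assigning the whole register back would also clobber the upper 128 bits that a VEX.128 encoding must zero or preserve separately, which is the corruption the commit message describes.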
[PATCH v2 30/42] i386: Implement VPERMIL
There are some potentially surprising details when comparing vpermilpd vs. vpermilps, but overall it is pretty straightforward. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 82 target/i386/ops_sse_header.h | 4 ++ target/i386/tcg/translate.c | 4 ++ 3 files changed, 90 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 4115c9a257..9b92b9790a 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3113,6 +3113,88 @@ void glue(helper_vbroadcastq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) #endif } +void glue(helper_vpermilpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ +uint64_t r0, r1; + +r0 = v->Q((s->Q(0) >> 1) & 1); +r1 = v->Q((s->Q(1) >> 1) & 1); +d->Q(0) = r0; +d->Q(1) = r1; +#if SHIFT == 2 +r0 = v->Q(((s->Q(2) >> 1) & 1) + 2); +r1 = v->Q(((s->Q(3) >> 1) & 1) + 2); +d->Q(2) = r0; +d->Q(3) = r1; +#endif +} + +void glue(helper_vpermilps, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ +uint32_t r0, r1, r2, r3; + +r0 = v->L(s->L(0) & 3); +r1 = v->L(s->L(1) & 3); +r2 = v->L(s->L(2) & 3); +r3 = v->L(s->L(3) & 3); +d->L(0) = r0; +d->L(1) = r1; +d->L(2) = r2; +d->L(3) = r3; +#if SHIFT == 2 +r0 = v->L((s->L(4) & 3) + 4); +r1 = v->L((s->L(5) & 3) + 4); +r2 = v->L((s->L(6) & 3) + 4); +r3 = v->L((s->L(7) & 3) + 4); +d->L(4) = r0; +d->L(5) = r1; +d->L(6) = r2; +d->L(7) = r3; +#endif +} + +void glue(helper_vpermilpd_imm, SUFFIX)(CPUX86State *env, +Reg *d, Reg *s, uint32_t order) +{ +uint64_t r0, r1; + +r0 = s->Q((order >> 0) & 1); +r1 = s->Q((order >> 1) & 1); +d->Q(0) = r0; +d->Q(1) = r1; +#if SHIFT == 2 +r0 = s->Q(((order >> 2) & 1) + 2); +r1 = s->Q(((order >> 3) & 1) + 2); +d->Q(2) = r0; +d->Q(3) = r1; +#endif +} + +void glue(helper_vpermilps_imm, SUFFIX)(CPUX86State *env, +Reg *d, Reg *s, uint32_t order) +{ +uint32_t r0, r1, r2, r3; + +r0 = s->L((order >> 0) & 3); +r1 = s->L((order >> 2) & 3); +r2 = s->L((order >> 4) & 3); +r3 = s->L((order >> 6) & 3); +d->L(0) = r0; +d->L(1) = r1; +d->L(2) = r2; +d->L(3) = r3; +#if SHIFT == 2 
+r0 = s->L(((order >> 0) & 3) + 4); +r1 = s->L(((order >> 2) & 3) + 4); +r2 = s->L(((order >> 4) & 3) + 4); +r3 = s->L(((order >> 6) & 3) + 4); +d->L(4) = r0; +d->L(5) = r1; +d->L(6) = r2; +d->L(7) = r3; +#endif +} + #if SHIFT == 2 void glue(helper_vbroadcastdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 51e02cd4fa..c52169a030 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -417,6 +417,10 @@ DEF_HELPER_3(glue(vbroadcastb, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(vbroadcastw, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(vbroadcastl, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(vbroadcastq, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(vpermilpd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpermilps, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpermilpd_imm, SUFFIX), void, env, Reg, Reg, i32) +DEF_HELPER_4(glue(vpermilps_imm, SUFFIX), void, env, Reg, Reg, i32) #if SHIFT == 2 DEF_HELPER_3(glue(vbroadcastdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_1(vzeroall, void, env) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 59ab1dc562..358c3ecb0b 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3251,6 +3251,8 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x09] = BINARY_OP_MMX(psignw, SSSE3), [0x0a] = BINARY_OP_MMX(psignd, SSSE3), [0x0b] = BINARY_OP_MMX(pmulhrsw, SSSE3), +[0x0c] = BINARY_OP(vpermilps, AVX, 0), +[0x0d] = BINARY_OP(vpermilpd, AVX, 0), [0x10] = BLENDV_OP(pblendvb, SSE41, SSE_OPF_MMX), [0x14] = BLENDV_OP(blendvps, SSE41, 0), [0x15] = BLENDV_OP(blendvpd, SSE41, 0), @@ -3311,6 +3313,8 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { /* prefix [66] 0f 3a */ static const struct SSEOpHelper_table7 sse_op_table7[256] = { +[0x04] = UNARY_OP(vpermilps_imm, AVX, 0), +[0x05] = UNARY_OP(vpermilpd_imm, AVX, 0), [0x08] = 
UNARY_OP(roundps, SSE41, 0), [0x09] = UNARY_OP(roundpd, SSE41, 0), #define gen_helper_roundss_ymm NULL -- 2.36.0
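The register form of VPERMILPS above can be sketched for one 128-bit half as follows (illustrative code, not the QEMU helper): each destination lane picks a source lane using the low two bits of the corresponding control element, and in the 256-bit form the same rule repeats independently in the upper half with a +4 lane offset. The "surprising detail" is vpermilpd, which takes its selector from bit 1 of each control element rather than bit 0.

```c
#include <stdint.h>

/* VPERMILPS, one 128-bit lane: ctl[i] & 3 selects the source element. */
static void vpermilps4(uint32_t *d, const uint32_t *v, const uint32_t *ctl)
{
    uint32_t r0 = v[ctl[0] & 3];
    uint32_t r1 = v[ctl[1] & 3];
    uint32_t r2 = v[ctl[2] & 3];
    uint32_t r3 = v[ctl[3] & 3];
    d[0] = r0;
    d[1] = r1;
    d[2] = r2;
    d[3] = r3;
}
```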
[PATCH v2 33/42] i386: Implement VMASKMOV
Decoding these is a bit messy, but at least the integer and float variants have the same semantics once decoded. We don't try and be clever with the load forms, instead load the whole vector then mask out the elements we want. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 48 target/i386/ops_sse_header.h | 4 +++ target/i386/tcg/translate.c | 34 + 3 files changed, 86 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index edf14a25d7..ffcba3d02c 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -3240,6 +3240,54 @@ void glue(helper_vtestpd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) CC_SRC = ((zf >> 63) ? 0 : CC_Z) | ((cf >> 63) ? 0 : CC_C); } +void glue(helper_vpmaskmovd_st, SUFFIX)(CPUX86State *env, +Reg *s, Reg *v, target_ulong a0) +{ +int i; + +for (i = 0; i < (2 << SHIFT); i++) { +if (v->L(i) >> 31) { +cpu_stl_data_ra(env, a0 + i * 4, s->L(i), GETPC()); +} +} +} + +void glue(helper_vpmaskmovq_st, SUFFIX)(CPUX86State *env, +Reg *s, Reg *v, target_ulong a0) +{ +int i; + +for (i = 0; i < (1 << SHIFT); i++) { +if (v->Q(i) >> 63) { +cpu_stq_data_ra(env, a0 + i * 8, s->Q(i), GETPC()); +} +} +} + +void glue(helper_vpmaskmovd, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ +d->L(0) = (v->L(0) >> 31) ? s->L(0) : 0; +d->L(1) = (v->L(1) >> 31) ? s->L(1) : 0; +d->L(2) = (v->L(2) >> 31) ? s->L(2) : 0; +d->L(3) = (v->L(3) >> 31) ? s->L(3) : 0; +#if SHIFT == 2 +d->L(4) = (v->L(4) >> 31) ? s->L(4) : 0; +d->L(5) = (v->L(5) >> 31) ? s->L(5) : 0; +d->L(6) = (v->L(6) >> 31) ? s->L(6) : 0; +d->L(7) = (v->L(7) >> 31) ? s->L(7) : 0; +#endif +} + +void glue(helper_vpmaskmovq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s) +{ +d->Q(0) = (v->Q(0) >> 63) ? s->Q(0) : 0; +d->Q(1) = (v->Q(1) >> 63) ? s->Q(1) : 0; +#if SHIFT == 2 +d->Q(2) = (v->Q(2) >> 63) ? s->Q(2) : 0; +d->Q(3) = (v->Q(3) >> 63) ? 
s->Q(3) : 0; +#endif +} + #if SHIFT == 2 void glue(helper_vbroadcastdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) { diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index 8b93b8e6d6..a7a6bf6b10 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -429,6 +429,10 @@ DEF_HELPER_4(glue(vpsravq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_4(glue(vpsllvq, SUFFIX), void, env, Reg, Reg, Reg) DEF_HELPER_3(glue(vtestps, SUFFIX), void, env, Reg, Reg) DEF_HELPER_3(glue(vtestpd, SUFFIX), void, env, Reg, Reg) +DEF_HELPER_4(glue(vpmaskmovd_st, SUFFIX), void, env, Reg, Reg, tl) +DEF_HELPER_4(glue(vpmaskmovq_st, SUFFIX), void, env, Reg, Reg, tl) +DEF_HELPER_4(glue(vpmaskmovd, SUFFIX), void, env, Reg, Reg, Reg) +DEF_HELPER_4(glue(vpmaskmovq, SUFFIX), void, env, Reg, Reg, Reg) #if SHIFT == 2 DEF_HELPER_3(glue(vbroadcastdq, SUFFIX), void, env, Reg, Reg) DEF_HELPER_1(vzeroall, void, env) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 2fbb7bfcad..e00195d301 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3277,6 +3277,10 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x29] = BINARY_OP(pcmpeqq, SSE41, SSE_OPF_MMX), [0x2a] = SPECIAL_OP(SSE41), /* movntqda */ [0x2b] = BINARY_OP(packusdw, SSE41, SSE_OPF_MMX), +[0x2c] = BINARY_OP(vpmaskmovd, AVX, 0), /* vmaskmovps */ +[0x2d] = BINARY_OP(vpmaskmovq, AVX, 0), /* vmaskmovpd */ +[0x2e] = SPECIAL_OP(AVX), /* vmaskmovps */ +[0x2f] = SPECIAL_OP(AVX), /* vmaskmovpd */ [0x30] = UNARY_OP(pmovzxbw, SSE41, SSE_OPF_MMX), [0x31] = UNARY_OP(pmovzxbd, SSE41, SSE_OPF_MMX), [0x32] = UNARY_OP(pmovzxbq, SSE41, SSE_OPF_MMX), @@ -3308,6 +3312,9 @@ static const struct SSEOpHelper_table6 sse_op_table6[256] = { [0x78] = UNARY_OP(vbroadcastb, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), /* vpbroadcastw */ [0x79] = UNARY_OP(vbroadcastw, AVX, SSE_OPF_SCALAR | SSE_OPF_MMX), +/* vpmaskmovd, vpmaskmovq */ +[0x8c] = BINARY_OP(vpmaskmovd, AVX, 
SSE_OPF_AVX2), +[0x8e] = SPECIAL_OP(AVX), /* vpmaskmovd, vpmaskmovq */ #define gen_helper_aesimc_ymm NULL [0xdb] = UNARY_OP(aesimc, AES, 0), [0xdc] = BINARY_OP(aesenc, AES, 0), @@ -3369,6 +3376,11 @@ static const SSEFunc_0_eppp sse_op_table8[3][2] = { SSE_OP(vpsravq), SSE_OP(vpsllvq), }; + +static const SSEFunc_0_eppt sse_op_table9[2][2] = { +SSE_OP(vpmaskmovd_st), +SSE_OP(vpmaskmovq_st), +}; #undef SSE_OP /* VEX prefix not allowed */ @@ -4394,6 +4406,22 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, gen_clear_ymmh(s, reg); } return; +case 0x2e: /* maskmovpd */ +b1 = 0; +
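The "load the whole vector then mask" strategy described above leaves the register form of the helper doing a simple per-lane select. A sketch of the VPMASKMOVD register semantics (illustrative, not the QEMU helper): the sign bit of each mask element selects between the already-loaded source lane and zero.

```c
#include <stdint.h>

/* Masked dword move: lanes with a clear mask sign bit become zero. */
static void vpmaskmovd(uint32_t *d, const uint32_t *mask,
                       const uint32_t *s, int n)
{
    for (int i = 0; i < n; i++) {
        d[i] = (mask[i] >> 31) ? s[i] : 0;
    }
}
```

The store form is the one place this shortcut does not work, since storing unselected lanes could fault or clobber memory, hence the separate `vpmaskmov*_st` helpers that issue one guarded store per selected element.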
[PATCH v2 37/42] i386: Implement VBLENDV
The AVX variants of the BLENDV instructions use a different opcode prefix to support the additional operands. We already modified the helper functions in anticipation of this. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 4072fa28d3..95ecdea8fe 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3384,6 +3384,9 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { [0x42] = BINARY_OP(mpsadbw, SSE41, SSE_OPF_MMX), [0x44] = BINARY_OP(pclmulqdq, PCLMULQDQ, 0), [0x46] = BINARY_OP(vpermdq, AVX, SSE_OPF_AVX2), /* vperm2i128 */ +[0x4a] = BLENDV_OP(blendvps, AVX, 0), +[0x4b] = BLENDV_OP(blendvpd, AVX, 0), +[0x4c] = BLENDV_OP(pblendvb, AVX, SSE_OPF_MMX), #define gen_helper_pcmpestrm_ymm NULL [0x60] = CMP_OP(pcmpestrm, SSE42), #define gen_helper_pcmpestri_ymm NULL @@ -5268,6 +5271,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } /* SSE */ +if (op7.flags & SSE_OPF_BLENDV && !(s->prefix & PREFIX_VEX)) { +/* Only VEX encodings are valid for these blendv opcodes */ +goto illegal_op; +} op1_offset = ZMM_OFFSET(reg); if (mod == 3) { op2_offset = ZMM_OFFSET(rm | REX_B(s)); @@ -5316,8 +5323,15 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, op7.fn[b1].op1(cpu_env, s->ptr0, s->ptr1, tcg_const_i32(val)); } else { tcg_gen_addi_ptr(s->ptr2, cpu_env, v_offset); -op7.fn[b1].op2(cpu_env, s->ptr0, s->ptr2, s->ptr1, - tcg_const_i32(val)); +if (op7.flags & SSE_OPF_BLENDV) { +TCGv_ptr mask = tcg_temp_new_ptr(); +tcg_gen_addi_ptr(mask, cpu_env, ZMM_OFFSET(val >> 4)); +op7.fn[b1].op3(cpu_env, s->ptr0, s->ptr2, s->ptr1, mask); +tcg_temp_free_ptr(mask); +} else { +op7.fn[b1].op2(cpu_env, s->ptr0, s->ptr2, s->ptr1, + tcg_const_i32(val)); +} } if ((op7.flags & SSE_OPF_CMP) == 0 && s->vex_l == 0) { gen_clear_ymmh(s, reg); -- 2.36.0
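The selection rule shared by the SSE4.1 and AVX blendv encodings can be sketched as follows (illustrative code, not the QEMU helper): lane i takes the second source when the sign bit of mask element i is set. In the AVX form the mask register number comes from bits 7:4 of the immediate, which is what the `ZMM_OFFSET(val >> 4)` in the translator hunk above extracts.

```c
#include <stdint.h>

/* BLENDVPS over one 128-bit half, mask taken from a register. */
static void blendvps4(uint32_t *d, const uint32_t *a, const uint32_t *b,
                      const uint32_t *mask)
{
    for (int i = 0; i < 4; i++) {
        d[i] = (mask[i] >> 31) ? b[i] : a[i];
    }
}
```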
[PATCH v2 10/42] i386: Rewrite vector shift helper
Rewrite the vector shift helpers in preparation for AVX support (3 operand form and 256 bit vectors). For now keep the existing two operand interface. No functional changes to existing helpers. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 250 ++ 1 file changed, 133 insertions(+), 117 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 23daab6b50..9297c96d04 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -63,199 +63,215 @@ #define MOVE(d, r) memcpy(&(d).B(0), &(r).B(0), SIZE) #endif -void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +#if SHIFT == 0 +#define SHIFT_HELPER_BODY(n, elem, F) do { \ +d->elem(0) = F(s->elem(0), shift); \ +if ((n) > 1) { \ +d->elem(1) = F(s->elem(1), shift); \ +} \ +if ((n) > 2) { \ +d->elem(2) = F(s->elem(2), shift); \ +d->elem(3) = F(s->elem(3), shift); \ +} \ +if ((n) > 4) { \ +d->elem(4) = F(s->elem(4), shift); \ +d->elem(5) = F(s->elem(5), shift); \ +d->elem(6) = F(s->elem(6), shift); \ +d->elem(7) = F(s->elem(7), shift); \ +} \ +if ((n) > 8) { \ +d->elem(8) = F(s->elem(8), shift); \ +d->elem(9) = F(s->elem(9), shift); \ +d->elem(10) = F(s->elem(10), shift);\ +d->elem(11) = F(s->elem(11), shift);\ +d->elem(12) = F(s->elem(12), shift);\ +d->elem(13) = F(s->elem(13), shift);\ +d->elem(14) = F(s->elem(14), shift);\ +d->elem(15) = F(s->elem(15), shift);\ +} \ +} while (0) + +#define FPSRL(x, c) ((x) >> shift) +#define FPSRAW(x, c) ((int16_t)(x) >> shift) +#define FPSRAL(x, c) ((int32_t)(x) >> shift) +#define FPSLL(x, c) ((x) << shift) +#endif + void glue(helper_psrlw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { +Reg *s = d; int shift; - -if (s->Q(0) > 15) { +if (c->Q(0) > 15) { d->Q(0) = 0; -#if SHIFT == 1 -d->Q(1) = 0; -#endif +XMM_ONLY(d->Q(1) = 0;) +YMM_ONLY( +d->Q(2) = 0; +d->Q(3) = 0; +) } else { -shift = s->B(0); -d->W(0) >>= shift; -d->W(1) >>= shift; -d->W(2) >>= shift; -d->W(3) >>= shift; -#if SHIFT == 1 -d->W(4) >>= shift; -d->W(5) >>= shift; -d->W(6) >>= 
shift; -d->W(7) >>= shift; -#endif +shift = c->B(0); +SHIFT_HELPER_BODY(4 << SHIFT, W, FPSRL); } } -void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { +Reg *s = d; int shift; - -if (s->Q(0) > 15) { -shift = 15; +if (c->Q(0) > 15) { +d->Q(0) = 0; +XMM_ONLY(d->Q(1) = 0;) +YMM_ONLY( +d->Q(2) = 0; +d->Q(3) = 0; +) } else { -shift = s->B(0); +shift = c->B(0); +SHIFT_HELPER_BODY(4 << SHIFT, W, FPSLL); } -d->W(0) = (int16_t)d->W(0) >> shift; -d->W(1) = (int16_t)d->W(1) >> shift; -d->W(2) = (int16_t)d->W(2) >> shift; -d->W(3) = (int16_t)d->W(3) >> shift; -#if SHIFT == 1 -d->W(4) = (int16_t)d->W(4) >> shift; -d->W(5) = (int16_t)d->W(5) >> shift; -d->W(6) = (int16_t)d->W(6) >> shift; -d->W(7) = (int16_t)d->W(7) >> shift; -#endif } -void glue(helper_psllw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psraw, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { +Reg *s = d; int shift; - -if (s->Q(0) > 15) { -d->Q(0) = 0; -#if SHIFT == 1 -d->Q(1) = 0; -#endif +if (c->Q(0) > 15) { +shift = 15; } else { -shift = s->B(0); -d->W(0) <<= shift; -d->W(1) <<= shift; -d->W(2) <<= shift; -d->W(3) <<= shift; -#if SHIFT == 1 -d->W(4) <<= shift; -d->W(5) <<= shift; -d->W(6) <<= shift; -d->W(7) <<= shift; -#endif +shift = c->B(0); } +SHIFT_HELPER_BODY(4 << SHIFT, W, FPSRAW); } -void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *s) +void glue(helper_psrld, SUFFIX)(CPUX86State *env, Reg *d, Reg *c) { +Reg *s = d; int shift; - -if (s->Q(0) > 31) { +if (c->Q(0) > 31) { d->Q(0) = 0; -#if SHIFT == 1 -d->Q(1) = 0; -#endif +XMM_ONLY(d->Q(1) = 0;) +YMM_ONLY( +d->Q(2) = 0; +d->Q(3) = 0; +) } else { -shift = s->B(0); -d->L(0) >>= shift; -d->L(1) >>= shift; -#if SHIFT == 1 -d->L(2) >>= shift;
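The per-element rule these shift helpers implement can be sketched outside QEMU as follows (illustrative code): a shift count above the element width minus one, taken from the low quadword of the count operand, zeroes every lane; otherwise each lane is shifted. The lane count n would be 4 for MMX, 8 for XMM and 16 for YMM in the 16-bit case.

```c
#include <stdint.h>

/* PSRLW-style logical right shift of n 16-bit lanes. */
static void psrlw(uint16_t *d, const uint16_t *s, uint64_t count, int n)
{
    if (count > 15) {
        for (int i = 0; i < n; i++) {
            d[i] = 0;       /* out-of-range count clears the register */
        }
    } else {
        for (int i = 0; i < n; i++) {
            d[i] = s[i] >> count;
        }
    }
}
```

The explicit zeroing branch is why the helper cannot simply clamp the count: a plain C shift by 16 or more on a 16-bit lane would be undefined behaviour, while the hardware defines the result as zero.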
[PATCH v2 05/42] i386: Rework sse_op_table6/7
Add a flags field to each row in sse_op_table6 and sse_op_table7. Initially this is only used as a replacement for the magic SSE41_SPECIAL pointer. The other flags will become relevant as the rest of the AVX implementation is built out. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 232 1 file changed, 132 insertions(+), 100 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 7fec582358..5335b86c01 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2977,7 +2977,6 @@ static const struct SSEOpHelper_table1 sse_op_table1[256] = { #undef SSE_SPECIAL #define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm } -#define SSE_SPECIAL_FN ((void *)1) static const SSEFunc_0_epp sse_op_table2[3 * 8][2] = { [0 + 2] = MMX_OP2(psrlw), @@ -3061,113 +3060,134 @@ static const SSEFunc_0_epp sse_op_table5[256] = { [0xbf] = gen_helper_pavgb_mmx /* pavgusb */ }; -struct SSEOpHelper_epp { +struct SSEOpHelper_table6 { SSEFunc_0_epp op[2]; uint32_t ext_mask; +int flags; }; -struct SSEOpHelper_eppi { +struct SSEOpHelper_table7 { SSEFunc_0_eppi op[2]; uint32_t ext_mask; +int flags; }; -#define SSSE3_OP(x) { MMX_OP2(x), CPUID_EXT_SSSE3 } -#define SSE41_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_SSE41 } -#define SSE42_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_SSE42 } -#define SSE41_SPECIAL { { NULL, SSE_SPECIAL_FN }, CPUID_EXT_SSE41 } -#define PCLMULQDQ_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, \ -CPUID_EXT_PCLMULQDQ } -#define AESNI_OP(x) { { NULL, gen_helper_ ## x ## _xmm }, CPUID_EXT_AES } - -static const struct SSEOpHelper_epp sse_op_table6[256] = { -[0x00] = SSSE3_OP(pshufb), -[0x01] = SSSE3_OP(phaddw), -[0x02] = SSSE3_OP(phaddd), -[0x03] = SSSE3_OP(phaddsw), -[0x04] = SSSE3_OP(pmaddubsw), -[0x05] = SSSE3_OP(phsubw), -[0x06] = SSSE3_OP(phsubd), -[0x07] = SSSE3_OP(phsubsw), -[0x08] = SSSE3_OP(psignb), -[0x09] = SSSE3_OP(psignw), -[0x0a] = SSSE3_OP(psignd), -[0x0b] = 
SSSE3_OP(pmulhrsw), -[0x10] = SSE41_OP(pblendvb), -[0x14] = SSE41_OP(blendvps), -[0x15] = SSE41_OP(blendvpd), -[0x17] = SSE41_OP(ptest), -[0x1c] = SSSE3_OP(pabsb), -[0x1d] = SSSE3_OP(pabsw), -[0x1e] = SSSE3_OP(pabsd), -[0x20] = SSE41_OP(pmovsxbw), -[0x21] = SSE41_OP(pmovsxbd), -[0x22] = SSE41_OP(pmovsxbq), -[0x23] = SSE41_OP(pmovsxwd), -[0x24] = SSE41_OP(pmovsxwq), -[0x25] = SSE41_OP(pmovsxdq), -[0x28] = SSE41_OP(pmuldq), -[0x29] = SSE41_OP(pcmpeqq), -[0x2a] = SSE41_SPECIAL, /* movntqda */ -[0x2b] = SSE41_OP(packusdw), -[0x30] = SSE41_OP(pmovzxbw), -[0x31] = SSE41_OP(pmovzxbd), -[0x32] = SSE41_OP(pmovzxbq), -[0x33] = SSE41_OP(pmovzxwd), -[0x34] = SSE41_OP(pmovzxwq), -[0x35] = SSE41_OP(pmovzxdq), -[0x37] = SSE42_OP(pcmpgtq), -[0x38] = SSE41_OP(pminsb), -[0x39] = SSE41_OP(pminsd), -[0x3a] = SSE41_OP(pminuw), -[0x3b] = SSE41_OP(pminud), -[0x3c] = SSE41_OP(pmaxsb), -[0x3d] = SSE41_OP(pmaxsd), -[0x3e] = SSE41_OP(pmaxuw), -[0x3f] = SSE41_OP(pmaxud), -[0x40] = SSE41_OP(pmulld), -[0x41] = SSE41_OP(phminposuw), -[0xdb] = AESNI_OP(aesimc), -[0xdc] = AESNI_OP(aesenc), -[0xdd] = AESNI_OP(aesenclast), -[0xde] = AESNI_OP(aesdec), -[0xdf] = AESNI_OP(aesdeclast), +#define gen_helper_special_xmm NULL + +#define OP(name, op, flags, ext, mmx_name) \ +{{mmx_name, gen_helper_ ## name ## _xmm}, CPUID_EXT_ ## ext, flags} +#define BINARY_OP_MMX(name, ext) \ +OP(name, op2, SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) +#define BINARY_OP(name, ext, flags) \ +OP(name, op2, flags, ext, NULL) +#define UNARY_OP_MMX(name, ext) \ +OP(name, op1, SSE_OPF_V0 | SSE_OPF_MMX, ext, gen_helper_ ## name ## _mmx) +#define UNARY_OP(name, ext, flags) \ +OP(name, op1, SSE_OPF_V0 | flags, ext, NULL) +#define BLENDV_OP(name, ext, flags) OP(name, op3, SSE_OPF_BLENDV, ext, NULL) +#define CMP_OP(name, ext) OP(name, op1, SSE_OPF_CMP | SSE_OPF_V0, ext, NULL) +#define SPECIAL_OP(ext) OP(special, op1, SSE_OPF_SPECIAL, ext, NULL) + +/* prefix [66] 0f 38 */ +static const struct SSEOpHelper_table6 sse_op_table6[256] = { 
+[0x00] = BINARY_OP_MMX(pshufb, SSSE3), +[0x01] = BINARY_OP_MMX(phaddw, SSSE3), +[0x02] = BINARY_OP_MMX(phaddd, SSSE3), +[0x03] = BINARY_OP_MMX(phaddsw, SSSE3), +[0x04] = BINARY_OP_MMX(pmaddubsw, SSSE3), +[0x05] = BINARY_OP_MMX(phsubw, SSSE3), +[0x06] = BINARY_OP_MMX(phsubd, SSSE3), +[0x07] = BINARY_OP_MMX(phsubsw, SSSE3), +[0x08] = BINARY_OP_MMX(psignb, SSSE3), +[0x09] = BINARY_OP_MMX(psignw, SSSE3), +[0x0a] = BINARY_OP_MMX(psignd, SSSE3), +[0x0b] = BINARY_OP_MMX(pmulhrsw, SSSE3), +[0x10] = BLENDV_OP(pblendvb, SSE41, SSE_OPF_MMX), +[0x14] = BLENDV_OP(blendvps, SSE41, 0), +[0x15] = BLENDV_OP(blendvpd, SSE41, 0),
[PATCH v2 08/42] i386: Add ZMM_OFFSET macro
Add a convenience macro to get the address of an xmm_regs element within CPUX86State. This was originally going to be the basis of an implementation that broke operations into 128 bit chunks. I scrapped that idea, so this is now a purely cosmetic change. But I think a worthwhile one - it reduces the number of function calls that need to be split over multiple lines. No functional changes. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 60 + 1 file changed, 27 insertions(+), 33 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 2f5cc24e0c..e9e6062b7f 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2777,6 +2777,8 @@ static inline void gen_op_movq_env_0(DisasContext *s, int d_offset) tcg_gen_st_i64(s->tmp1_i64, cpu_env, d_offset); } +#define ZMM_OFFSET(reg) offsetof(CPUX86State, xmm_regs[reg]) + typedef void (*SSEFunc_i_ep)(TCGv_i32 val, TCGv_ptr env, TCGv_ptr reg); typedef void (*SSEFunc_l_ep)(TCGv_i64 val, TCGv_ptr env, TCGv_ptr reg); typedef void (*SSEFunc_0_epi)(TCGv_ptr env, TCGv_ptr reg, TCGv_i32 val); @@ -3329,14 +3331,14 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, if (mod == 3) goto illegal_op; gen_lea_modrm(env, s, modrm); -gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_sto_env_A0(s, ZMM_OFFSET(reg)); break; case 0x3f0: /* lddqu */ CHECK_AVX_V0(s); if (mod == 3) goto illegal_op; gen_lea_modrm(env, s, modrm); -gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_ldo_env_A0(s, ZMM_OFFSET(reg)); break; case 0x22b: /* movntss */ case 0x32b: /* movntsd */ @@ -3375,15 +3377,13 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, #ifdef TARGET_X86_64 if (s->dflag == MO_64) { gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0); -tcg_gen_addi_ptr(s->ptr0, cpu_env, - offsetof(CPUX86State,xmm_regs[reg])); +tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg)); gen_helper_movq_mm_T0_xmm(s->ptr0, s->T0); } else #endif { gen_ldst_modrm(env, 
s, modrm, MO_32, OR_TMP0, 0); -tcg_gen_addi_ptr(s->ptr0, cpu_env, - offsetof(CPUX86State,xmm_regs[reg])); +tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg)); tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0); gen_helper_movl_mm_T0_xmm(s->ptr0, s->tmp2_i32); } @@ -3410,11 +3410,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, CHECK_AVX_V0(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); -gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_ldo_env_A0(s, ZMM_OFFSET(reg)); } else { rm = (modrm & 7) | REX_B(s); -gen_op_movo(s, offsetof(CPUX86State, xmm_regs[reg]), -offsetof(CPUX86State,xmm_regs[rm])); +gen_op_movo(s, ZMM_OFFSET(reg), ZMM_OFFSET(rm)); } break; case 0x210: /* movss xmm, ea */ @@ -3474,7 +3473,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, CHECK_AVX_V0(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); -gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_ldo_env_A0(s, ZMM_OFFSET(reg)); } else { rm = (modrm & 7) | REX_B(s); gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(0)), @@ -3519,7 +3518,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, CHECK_AVX_V0(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); -gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); +gen_ldo_env_A0(s, ZMM_OFFSET(reg)); } else { rm = (modrm & 7) | REX_B(s); gen_op_movl(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_L(1)), @@ -3542,8 +3541,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, goto illegal_op; field_length = x86_ldub_code(env, s) & 0x3F; bit_index = x86_ldub_code(env, s) & 0x3F; -tcg_gen_addi_ptr(s->ptr0, cpu_env, -offsetof(CPUX86State,xmm_regs[reg])); +tcg_gen_addi_ptr(s->ptr0, cpu_env, ZMM_OFFSET(reg)); if (b1 == 1) gen_helper_extrq_i(cpu_env, s->ptr0, tcg_const_i32(bit_index), @@ -3617,11 +3615,10 @@ static void
[PATCH v2 07/42] Enforce VEX encoding restrictions
Add CHECK_AVX* macros, and use them to validate VEX encoded AVX instructions All AVX instructions require both CPU and OS support, this is encapsulated by HF_AVX_EN. Some also require specific values in the VEX.L and VEX.V fields. Some (mostly integer operations) also require AVX2 Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 159 +--- 1 file changed, 149 insertions(+), 10 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 66ba690b7d..2f5cc24e0c 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3185,10 +3185,54 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { goto illegal_op; \ } while (0) +/* + * VEX encodings require AVX + * Allow legacy SSE encodings even if AVX not enabled + */ +#define CHECK_AVX(s) do { \ +if ((s->prefix & PREFIX_VEX) \ +&& !(env->hflags & HF_AVX_EN_MASK)) \ +goto illegal_op; \ +} while (0) + +/* If a VEX prefix is used then it must have V=b */ +#define CHECK_AVX_V0(s) do { \ +CHECK_AVX(s); \ +if ((s->prefix & PREFIX_VEX) && (s->vex_v != 0)) \ +goto illegal_op; \ +} while (0) + +/* If a VEX prefix is used then it must have L=0 */ +#define CHECK_AVX_128(s) do { \ +CHECK_AVX(s); \ +if ((s->prefix & PREFIX_VEX) && (s->vex_l != 0)) \ +goto illegal_op; \ +} while (0) + +/* If a VEX prefix is used then it must have V=b and L=0 */ +#define CHECK_AVX_V0_128(s) do { \ +CHECK_AVX(s); \ +if ((s->prefix & PREFIX_VEX) && (s->vex_v != 0 || s->vex_l != 0)) \ +goto illegal_op; \ +} while (0) + +/* 256-bit (ymm) variants require AVX2 */ +#define CHECK_AVX2_256(s) do { \ +if (s->vex_l && !(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2)) \ +goto illegal_op; \ +} while (0) + +/* Requires AVX2 and VEX encoding */ +#define CHECK_AVX2(s) do { \ +if ((s->prefix & PREFIX_VEX) == 0 \ +|| !(s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2)) \ +goto illegal_op; \ +} while (0) + static void gen_sse(CPUX86State *env, DisasContext *s, int b, target_ulong pc_start) { -int b1, 
op1_offset, op2_offset, is_xmm, val; +int b1, op1_offset, op2_offset, is_xmm, val, scalar_op; int modrm, mod, rm, reg; struct SSEOpHelper_table1 sse_op; struct SSEOpHelper_table6 op6; @@ -3228,15 +3272,18 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, gen_exception(s, EXCP07_PREX, pc_start - s->cs_base); return; } -if (s->flags & HF_EM_MASK) { -illegal_op: -gen_illegal_opcode(s); -return; -} -if (is_xmm -&& !(s->flags & HF_OSFXSR_MASK) -&& (b != 0x38 && b != 0x3a)) { -goto unknown_op; +/* VEX encoded instuctions ignore EM bit. See also CHECK_AVX */ +if (!(s->prefix & PREFIX_VEX)) { +if (s->flags & HF_EM_MASK) { +illegal_op: +gen_illegal_opcode(s); +return; +} +if (is_xmm +&& !(s->flags & HF_OSFXSR_MASK) +&& (b != 0x38 && b != 0x3a)) { +goto unknown_op; +} } if (b == 0x0e) { if (!(s->cpuid_ext2_features & CPUID_EXT2_3DNOW)) { @@ -3278,12 +3325,14 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, case 0x1e7: /* movntdq */ case 0x02b: /* movntps */ case 0x12b: /* movntps */ +CHECK_AVX_V0(s); if (mod == 3) goto illegal_op; gen_lea_modrm(env, s, modrm); gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg])); break; case 0x3f0: /* lddqu */ +CHECK_AVX_V0(s); if (mod == 3) goto illegal_op; gen_lea_modrm(env, s, modrm); @@ -3291,6 +3340,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, break; case 0x22b: /* movntss */ case 0x32b: /* movntsd */ +CHECK_AVX_V0_128(s); if (mod == 3) goto illegal_op; gen_lea_modrm(env, s, modrm); @@ -3321,6 +3371,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; case 0x16e: /* movd xmm, ea */ +CHECK_AVX_V0_128(s); #ifdef TARGET_X86_64 if (s->dflag == MO_64) { gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0); @@ -3356,6 +3407,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, case 0x128: /* movapd */ case 0x16f: /* movdqa xmm, ea */ case 0x26f: /* movdqu xmm, ea */ +CHECK_AVX_V0(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); gen_ldo_env_A0(s, 
offsetof(CPUX86State, xmm_regs[reg])); @@ -3367,6 +3419,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
[PATCH v2 03/42] Add AVX_EN hflag
Add a new hflag bit to determine whether AVX instructions are allowed Signed-off-by: Paul Brook --- target/i386/cpu.h| 3 +++ target/i386/helper.c | 12 target/i386/tcg/fpu_helper.c | 1 + 3 files changed, 16 insertions(+) diff --git a/target/i386/cpu.h b/target/i386/cpu.h index 9661f9fbd1..65200a1917 100644 --- a/target/i386/cpu.h +++ b/target/i386/cpu.h @@ -169,6 +169,7 @@ typedef enum X86Seg { #define HF_MPX_EN_SHIFT 25 /* MPX Enabled (CR4+XCR0+BNDCFGx) */ #define HF_MPX_IU_SHIFT 26 /* BND registers in-use */ #define HF_UMIP_SHIFT 27 /* CR4.UMIP */ +#define HF_AVX_EN_SHIFT 28 /* AVX Enabled (CR4+XCR0) */ #define HF_CPL_MASK (3 << HF_CPL_SHIFT) #define HF_INHIBIT_IRQ_MASK (1 << HF_INHIBIT_IRQ_SHIFT) @@ -195,6 +196,7 @@ typedef enum X86Seg { #define HF_MPX_EN_MASK (1 << HF_MPX_EN_SHIFT) #define HF_MPX_IU_MASK (1 << HF_MPX_IU_SHIFT) #define HF_UMIP_MASK (1 << HF_UMIP_SHIFT) +#define HF_AVX_EN_MASK (1 << HF_AVX_EN_SHIFT) /* hflags2 */ @@ -2035,6 +2037,7 @@ void host_cpuid(uint32_t function, uint32_t count, /* helper.c */ void x86_cpu_set_a20(X86CPU *cpu, int a20_state); +void cpu_sync_avx_hflag(CPUX86State *env); #ifndef CONFIG_USER_ONLY static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs) diff --git a/target/i386/helper.c b/target/i386/helper.c index fa409e9c44..30083c9cff 100644 --- a/target/i386/helper.c +++ b/target/i386/helper.c @@ -29,6 +29,17 @@ #endif #include "qemu/log.h" +void cpu_sync_avx_hflag(CPUX86State *env) +{ +if ((env->cr[4] & CR4_OSXSAVE_MASK) +&& (env->xcr0 & (XSTATE_SSE_MASK | XSTATE_YMM_MASK)) +== (XSTATE_SSE_MASK | XSTATE_YMM_MASK)) { +env->hflags |= HF_AVX_EN_MASK; +} else{ +env->hflags &= ~HF_AVX_EN_MASK; +} +} + void cpu_sync_bndcs_hflags(CPUX86State *env) { uint32_t hflags = env->hflags; @@ -209,6 +220,7 @@ void cpu_x86_update_cr4(CPUX86State *env, uint32_t new_cr4) env->hflags = hflags; cpu_sync_bndcs_hflags(env); +cpu_sync_avx_hflag(env); } #if !defined(CONFIG_USER_ONLY) diff --git a/target/i386/tcg/fpu_helper.c 
b/target/i386/tcg/fpu_helper.c index ebf5e73df9..b391b69635 100644 --- a/target/i386/tcg/fpu_helper.c +++ b/target/i386/tcg/fpu_helper.c @@ -2943,6 +2943,7 @@ void helper_xsetbv(CPUX86State *env, uint32_t ecx, uint64_t mask) env->xcr0 = mask; cpu_sync_bndcs_hflags(env); +cpu_sync_avx_hflag(env); return; do_gpf: -- 2.36.0
[PATCH v2 06/42] i386: Add CHECK_NO_VEX
Reject invalid VEX encodings on MMX instructions. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index 5335b86c01..66ba690b7d 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -3179,6 +3179,12 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] = { #undef BLENDV_OP #undef SPECIAL_OP +/* VEX prefix not allowed */ +#define CHECK_NO_VEX(s) do { \ +if (s->prefix & PREFIX_VEX) \ +goto illegal_op; \ +} while (0) + static void gen_sse(CPUX86State *env, DisasContext *s, int b, target_ulong pc_start) { @@ -3262,6 +3268,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, b |= (b1 << 8); switch(b) { case 0x0e7: /* movntq */ +CHECK_NO_VEX(s); if (mod == 3) { goto illegal_op; } @@ -3297,6 +3304,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; case 0x6e: /* movd mm, ea */ +CHECK_NO_VEX(s); #ifdef TARGET_X86_64 if (s->dflag == MO_64) { gen_ldst_modrm(env, s, modrm, MO_64, OR_TMP0, 0); @@ -3330,6 +3338,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; case 0x6f: /* movq mm, ea */ +CHECK_NO_VEX(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); gen_ldq_env_A0(s, offsetof(CPUX86State, fpregs[reg].mmx)); @@ -3464,6 +3473,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, break; case 0x178: case 0x378: +CHECK_NO_VEX(s); { int bit_index, field_length; @@ -3484,6 +3494,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; case 0x7e: /* movd ea, mm */ +CHECK_NO_VEX(s); #ifdef TARGET_X86_64 if (s->dflag == MO_64) { tcg_gen_ld_i64(s->T0, cpu_env, @@ -3524,6 +3535,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(1))); break; case 0x7f: /* movq ea, mm */ +CHECK_NO_VEX(s); if (mod != 3) { gen_lea_modrm(env, s, modrm); gen_stq_env_A0(s, 
offsetof(CPUX86State, fpregs[reg].mmx)); @@ -3607,6 +3619,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, offsetof(CPUX86State, xmm_t0.ZMM_L(1))); op1_offset = offsetof(CPUX86State,xmm_t0); } else { +CHECK_NO_VEX(s); tcg_gen_movi_tl(s->T0, val); tcg_gen_st32_tl(s->T0, cpu_env, offsetof(CPUX86State, mmx_t0.MMX_L(0))); @@ -3648,6 +3661,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, break; case 0x02a: /* cvtpi2ps */ case 0x12a: /* cvtpi2pd */ +CHECK_NO_VEX(s); gen_helper_enter_mmx(cpu_env); if (mod != 3) { gen_lea_modrm(env, s, modrm); @@ -3693,6 +3707,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, case 0x12c: /* cvttpd2pi */ case 0x02d: /* cvtps2pi */ case 0x12d: /* cvtpd2pi */ +CHECK_NO_VEX(s); gen_helper_enter_mmx(cpu_env); if (mod != 3) { gen_lea_modrm(env, s, modrm); @@ -3766,6 +3781,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, tcg_gen_st16_tl(s->T0, cpu_env, offsetof(CPUX86State,xmm_regs[reg].ZMM_W(val))); } else { +CHECK_NO_VEX(s); val &= 3; tcg_gen_st16_tl(s->T0, cpu_env, offsetof(CPUX86State,fpregs[reg].mmx.MMX_W(val))); @@ -3805,6 +3821,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, } break; case 0x2d6: /* movq2dq */ +CHECK_NO_VEX(s); gen_helper_enter_mmx(cpu_env); rm = (modrm & 7); gen_op_movq(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(0)), @@ -3812,6 +3829,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b, gen_op_movq_env_0(s, offsetof(CPUX86State, xmm_regs[reg].ZMM_Q(1))); break; case 0x3d6: /* movdq2q */ +CHECK_NO_VEX(s); gen_helper_enter_mmx(cpu_env); rm = (modrm & 7) | REX_B(s); gen_op_movq(s, offsetof(CPUX86State, fpregs[reg & 7].mmx), @@ -3827,6 +3845,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
[PATCH v2 04/42] i386: Rework sse_op_table1
Add a flags field to each row in sse_op_table1. Initially this is only used as a replacement for the magic SSE_SPECIAL and SSE_DUMMY pointers; the other flags will become relevant as the rest of the AVX implementation is built out. Signed-off-by: Paul Brook --- target/i386/tcg/translate.c | 316 +--- 1 file changed, 186 insertions(+), 130 deletions(-) diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c index b7972f0ff5..7fec582358 100644 --- a/target/i386/tcg/translate.c +++ b/target/i386/tcg/translate.c @@ -2788,146 +2788,196 @@ typedef void (*SSEFunc_0_ppi)(TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv_i32 val); typedef void (*SSEFunc_0_eppt)(TCGv_ptr env, TCGv_ptr reg_a, TCGv_ptr reg_b, TCGv val); -#define SSE_SPECIAL ((void *)1) -#define SSE_DUMMY ((void *)2) +#define SSE_OPF_V0(1 << 0) /* vex.v must be 1111b (only 2 operands) */ +#define SSE_OPF_CMP (1 << 1) /* does not write the first operand */ +#define SSE_OPF_BLENDV(1 << 2) /* blendv* instruction */ +#define SSE_OPF_SPECIAL (1 << 3) /* magic */ +#define SSE_OPF_3DNOW (1 << 4) /* 3DNow! 
instruction */ +#define SSE_OPF_MMX (1 << 5) /* MMX/integer/AVX2 instruction */ +#define SSE_OPF_SCALAR(1 << 6) /* Has SSE scalar variants */ +#define SSE_OPF_AVX2 (1 << 7) /* AVX2 instruction */ +#define SSE_OPF_SHUF (1 << 9) /* pshufx/shufpx */ + +#define OP(op, flags, a, b, c, d) \ +{flags, {a, b, c, d} } + +#define MMX_OP(x) OP(op2, SSE_OPF_MMX, \ +gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm, NULL, NULL) + +#define SSE_FOP(name) OP(op2, SSE_OPF_SCALAR, \ +gen_helper_##name##ps, gen_helper_##name##pd, \ +gen_helper_##name##ss, gen_helper_##name##sd) +#define SSE_OP(sname, dname, op, flags) OP(op, flags, \ +gen_helper_##sname##_xmm, gen_helper_##dname##_xmm, NULL, NULL) + +struct SSEOpHelper_table1 { +int flags; +SSEFunc_0_epp op[4]; +}; -#define MMX_OP2(x) { gen_helper_ ## x ## _mmx, gen_helper_ ## x ## _xmm } -#define SSE_FOP(x) { gen_helper_ ## x ## ps, gen_helper_ ## x ## pd, \ - gen_helper_ ## x ## ss, gen_helper_ ## x ## sd, } +#define SSE_3DNOW { SSE_OPF_3DNOW } +#define SSE_SPECIAL { SSE_OPF_SPECIAL } -static const SSEFunc_0_epp sse_op_table1[256][4] = { +static const struct SSEOpHelper_table1 sse_op_table1[256] = { /* 3DNow! extensions */ -[0x0e] = { SSE_DUMMY }, /* femms */ -[0x0f] = { SSE_DUMMY }, /* pf... */ +[0x0e] = SSE_SPECIAL, /* femms */ +[0x0f] = SSE_3DNOW, /* pf... 
(sse_op_table5) */ /* pure SSE operations */ -[0x10] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movups, movupd, movss, movsd */ -[0x11] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movups, movupd, movss, movsd */ -[0x12] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movlps, movlpd, movsldup, movddup */ -[0x13] = { SSE_SPECIAL, SSE_SPECIAL }, /* movlps, movlpd */ -[0x14] = { gen_helper_punpckldq_xmm, gen_helper_punpcklqdq_xmm }, -[0x15] = { gen_helper_punpckhdq_xmm, gen_helper_punpckhqdq_xmm }, -[0x16] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movhps, movhpd, movshdup */ -[0x17] = { SSE_SPECIAL, SSE_SPECIAL }, /* movhps, movhpd */ - -[0x28] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */ -[0x29] = { SSE_SPECIAL, SSE_SPECIAL }, /* movaps, movapd */ -[0x2a] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtpi2ps, cvtpi2pd, cvtsi2ss, cvtsi2sd */ -[0x2b] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* movntps, movntpd, movntss, movntsd */ -[0x2c] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvttps2pi, cvttpd2pi, cvttsd2si, cvttss2si */ -[0x2d] = { SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL, SSE_SPECIAL }, /* cvtps2pi, cvtpd2pi, cvtsd2si, cvtss2si */ -[0x2e] = { gen_helper_ucomiss, gen_helper_ucomisd }, -[0x2f] = { gen_helper_comiss, gen_helper_comisd }, -[0x50] = { SSE_SPECIAL, SSE_SPECIAL }, /* movmskps, movmskpd */ -[0x51] = SSE_FOP(sqrt), -[0x52] = { gen_helper_rsqrtps, NULL, gen_helper_rsqrtss, NULL }, -[0x53] = { gen_helper_rcpps, NULL, gen_helper_rcpss, NULL }, -[0x54] = { gen_helper_pand_xmm, gen_helper_pand_xmm }, /* andps, andpd */ -[0x55] = { gen_helper_pandn_xmm, gen_helper_pandn_xmm }, /* andnps, andnpd */ -[0x56] = { gen_helper_por_xmm, gen_helper_por_xmm }, /* orps, orpd */ -[0x57] = { gen_helper_pxor_xmm, gen_helper_pxor_xmm }, /* xorps, xorpd */ +[0x10] = SSE_SPECIAL, /* movups, movupd, movss, movsd */ +[0x11] = SSE_SPECIAL, /* movups, movupd, movss, movsd 
*/ +[0x12] = SSE_SPECIAL, /* movlps, movlpd, movsldup, movddup */ +[0x13] = SSE_SPECIAL, /* movlps, movlpd */ +[0x14] = SSE_OP(punpckldq, punpcklqdq, op2, 0), /* unpcklps, unpcklpd */ +[0x15] = SSE_OP(punpckhdq, punpckhqdq, op2, 0), /* unpckhps, unpckhpd */ +
[PATCH v2 02/42] i386: DPPS rounding fix
The DPPS (Dot Product) instruction is defined to first sum pairs of intermediate results, then sum those values to get the final result, i.e. (A+B)+(C+D). We incrementally sum the results, i.e. ((A+B)+C)+D, which can result in incorrect rounding. For consistency, also remove the redundant (but harmless) add operation from DPPD. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 47 +++ 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index 535440f882..a5a48a20f6 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -1934,32 +1934,36 @@ SSE_HELPER_I(helper_pblendw, W, 8, FBLENDP) void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) { -float32 iresult = float32_zero; +float32 prod, iresult, iresult2; +/* + * We must evaluate (A+B)+(C+D), not ((A+B)+C)+D + * to correctly round the intermediate results + */ if (mask & (1 << 4)) { -iresult = float32_add(iresult, - float32_mul(d->ZMM_S(0), s->ZMM_S(0), - &env->sse_status), - &env->sse_status); +iresult = float32_mul(d->ZMM_S(0), s->ZMM_S(0), &env->sse_status); +} else { +iresult = float32_zero; } if (mask & (1 << 5)) { -iresult = float32_add(iresult, - float32_mul(d->ZMM_S(1), s->ZMM_S(1), - &env->sse_status), - &env->sse_status); +prod = float32_mul(d->ZMM_S(1), s->ZMM_S(1), &env->sse_status); +} else { +prod = float32_zero; } +iresult = float32_add(iresult, prod, &env->sse_status); if (mask & (1 << 6)) { -iresult = float32_add(iresult, - float32_mul(d->ZMM_S(2), s->ZMM_S(2), - &env->sse_status), - &env->sse_status); +iresult2 = float32_mul(d->ZMM_S(2), s->ZMM_S(2), &env->sse_status); +} else { +iresult2 = float32_zero; } if (mask & (1 << 7)) { -iresult = float32_add(iresult, - float32_mul(d->ZMM_S(3), s->ZMM_S(3), - &env->sse_status), - &env->sse_status); +prod = float32_mul(d->ZMM_S(3), s->ZMM_S(3), &env->sse_status); +} else { +prod = float32_zero; } +iresult2 = float32_add(iresult2, prod, &env->sse_status); +iresult = float32_add(iresult, iresult2, &env->sse_status); + d->ZMM_S(0) = (mask & (1 
<< 0)) ? iresult : float32_zero; d->ZMM_S(1) = (mask & (1 << 1)) ? iresult : float32_zero; d->ZMM_S(2) = (mask & (1 << 2)) ? iresult : float32_zero; @@ -1968,13 +1972,12 @@ void glue(helper_dpps, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) void glue(helper_dppd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, uint32_t mask) { -float64 iresult = float64_zero; +float64 iresult; if (mask & (1 << 4)) { -iresult = float64_add(iresult, - float64_mul(d->ZMM_D(0), s->ZMM_D(0), - &env->sse_status), - &env->sse_status); +iresult = float64_mul(d->ZMM_D(0), s->ZMM_D(0), &env->sse_status); +} else { +iresult = float64_zero; } if (mask & (1 << 5)) { iresult = float64_add(iresult, -- 2.36.0
[PATCH v2 01/42] i386: pcmpestr 64-bit sign extension bug
The abs1 function in ops_sse.h only works correctly when the result fits in a signed int. This is fine most of the time because we're only dealing with byte sized values. However, the pcmp_elen helper function uses abs1 to calculate the absolute value of a cpu register. This incorrectly truncates to 32 bits, and will give the wrong answer for the most negative value. Fix by open-coding the saturation check before taking the absolute value. Signed-off-by: Paul Brook --- target/i386/ops_sse.h | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index e4d74b814a..535440f882 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -2011,25 +2011,23 @@ SSE_HELPER_Q(helper_pcmpgtq, FCMPGTQ) static inline int pcmp_elen(CPUX86State *env, int reg, uint32_t ctrl) { -int val; +target_long val, limit; /* Presence of REX.W is indicated by a bit higher than 7 set */ if (ctrl >> 8) { -val = abs1((int64_t)env->regs[reg]); +val = (target_long)env->regs[reg]; } else { -val = abs1((int32_t)env->regs[reg]); +val = (int32_t)env->regs[reg]; } - if (ctrl & 1) { -if (val > 8) { -return 8; -} +limit = 8; } else { -if (val > 16) { -return 16; -} +limit = 16; } -return val; +if ((val > limit) || (val < -limit)) { +return limit; +} +return abs1(val); } static inline int pcmp_ilen(Reg *r, uint8_t ctrl) -- 2.36.0
[PATCH v2 09/42] i386: Helper macro for 256 bit AVX helpers
Once all the code is in place, 256 bit vector helpers will be generated by including ops_sse.h a third time with SHIFT=2. The first bit of support for this is to define a YMM_ONLY macro for code that only applies to 256 bit vectors. XMM_ONLY code will be executed for both 128 and 256 bit vectors. Signed-off-by: Paul Brook --- target/i386/ops_sse.h| 8 target/i386/ops_sse_header.h | 4 2 files changed, 12 insertions(+) diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h index a5a48a20f6..23daab6b50 100644 --- a/target/i386/ops_sse.h +++ b/target/i386/ops_sse.h @@ -24,6 +24,7 @@ #define Reg MMXReg #define SIZE 8 #define XMM_ONLY(...) +#define YMM_ONLY(...) #define B(n) MMX_B(n) #define W(n) MMX_W(n) #define L(n) MMX_L(n) @@ -37,7 +38,13 @@ #define W(n) ZMM_W(n) #define L(n) ZMM_L(n) #define Q(n) ZMM_Q(n) +#if SHIFT == 1 #define SUFFIX _xmm +#define YMM_ONLY(...) +#else +#define SUFFIX _ymm +#define YMM_ONLY(...) __VA_ARGS__ +#endif #endif /* @@ -2337,6 +2344,7 @@ void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s, #undef SHIFT #undef XMM_ONLY +#undef YMM_ONLY #undef Reg #undef B #undef W diff --git a/target/i386/ops_sse_header.h b/target/i386/ops_sse_header.h index cef28f2aae..7e7f2cee2a 100644 --- a/target/i386/ops_sse_header.h +++ b/target/i386/ops_sse_header.h @@ -21,7 +21,11 @@ #define SUFFIX _mmx #else #define Reg ZMMReg +#if SHIFT == 1 #define SUFFIX _xmm +#else +#define SUFFIX _ymm +#endif #endif #define dh_alias_Reg ptr -- 2.36.0
Re: [PATCH v22 0/8] support dirty restraint on vCPU
Hi, Yong, On Mon, Apr 25, 2022 at 12:52:45AM +0800, Hyman wrote: > Ping. > Hi, David and Peter, how do you think this patchset? > Is it suitable for queueing ? or is there still something need to be done ? It keeps looking good to me in general, let's see whether the maintainers have any comments. Thanks, -- Peter Xu
Re: [PATCH] hw/crypto: add Allwinner sun4i-ss crypto device
On Thu, Apr 21, 2022 at 01:38:00PM +0100, Peter Maydell wrote: > On Sun, 10 Apr 2022 at 20:12, Corentin Labbe wrote: > > > > From: Corentin LABBE > > > > The Allwinner A10 has a cryptographic offloader device which > > could be easily emulated. > > The emulated device is tested with Linux only as any of BSD does not > > support it. > > > > Signed-off-by: Corentin LABBE > > Hi; thanks for this patch, and sorry it's taken me a while to get > to reviewing it. > > (Daniel, I cc'd you since this device model is making use of crypto > related APIs.) > > Firstly, a note on patch structure. This is quite a large patch, > and I think it would be useful to split it at least into two parts: > (1) add the new device model > (2) change the allwinner SoC to create that new device Hello, I will do it for the next iteration. > > > diff --git a/docs/system/arm/cubieboard.rst b/docs/system/arm/cubieboard.rst > > index 344ff8cef9..7836643ba4 100644 > > --- a/docs/system/arm/cubieboard.rst > > +++ b/docs/system/arm/cubieboard.rst > > @@ -14,3 +14,4 @@ Emulated devices: > > - SDHCI > > - USB controller > > - SATA controller > > +- crypto > > diff --git a/docs/system/devices/allwinner-sun4i-ss.rst > > b/docs/system/devices/allwinner-sun4i-ss.rst > > new file mode 100644 > > index 00..6e7d2142b5 > > --- /dev/null > > +++ b/docs/system/devices/allwinner-sun4i-ss.rst > > @@ -0,0 +1,31 @@ > > +Allwinner sun4i-ss > > +== > > If you create a new rst file in docs, you need to put it into the > manual by adding it to some table of contents. Otherwise sphinx > will complain when you build the documentation, and users won't be > able to find it. (If you pass 'configure' the --enable-docs option > that will check that you have everything installed to be able to > build the docs.) 
> > There are two options here: you can have this document, and > add it to the toctree in docs/system/device-emulation.rst, and > make the "crypto" bullet point in cubieboard.rst be a hyperlink to > the device-emulation.rst file. Or you can compress the information > down and put it into orangepi.rst. > > > +The ``sun4i-ss`` emulates the Allwinner cryptographic offloader > > +present on early Allwinner SoCs (A10, A10s, A13, A20, A33) > > +In qemu only A10 via the cubieboard machine is supported. > > + > > +The emulated hardware is capable of handling the following algorithms: > > +- SHA1 and MD5 hash algorithms > > +- AES/DES/DES3 in CBC/ECB > > +- PRNG > > + > > +The emulated hardware does not handle yet: > > +- CTS for AES > > +- CTR for AES/DES/DES3 > > +- IRQ and DMA mode > > +Anyway the Linux driver also does not handle them yet. > > + > > +The emulation needs a real crypto backend, for the moment only > > gnutls/nettle is supported. > > +So the device emulation needs qemu to be compiled with optionnal gnutls. > > > diff --git a/hw/Kconfig b/hw/Kconfig > > index ad20cce0a9..43bd7fc14d 100644 > > --- a/hw/Kconfig > > +++ b/hw/Kconfig > > @@ -6,6 +6,7 @@ source audio/Kconfig > > source block/Kconfig > > source char/Kconfig > > source core/Kconfig > > +source crypto/Kconfig > > source display/Kconfig > > source dma/Kconfig > > source gpio/Kconfig > > I don't think we really need a new subdirectory of hw/ > for a single device. If you can find two other devices that > already exist in QEMU that would also belong in hw/crypto/ > then we can create it. Otherwise just put this device in > hw/misc. I plan to add at least one other hw/crypto device (allwinner H3 sun8i-ce). I have another one already ready (rockchip rk3288) but I delay it since there are no related SoC in qemu yet. 
> > > diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig > > index 97f3b38019..fd8232b1d4 100644 > > --- a/hw/arm/Kconfig > > +++ b/hw/arm/Kconfig > > @@ -317,6 +317,7 @@ config ALLWINNER_A10 > > select AHCI > > select ALLWINNER_A10_PIT > > select ALLWINNER_A10_PIC > > +select ALLWINNER_CRYPTO_SUN4I_SS > > select ALLWINNER_EMAC > > select SERIAL > > select UNIMP > > diff --git a/hw/arm/allwinner-a10.c b/hw/arm/allwinner-a10.c > > index 05e84728cb..e9104ee028 100644 > > --- a/hw/arm/allwinner-a10.c > > +++ b/hw/arm/allwinner-a10.c > > @@ -23,6 +23,7 @@ > > #include "hw/misc/unimp.h" > > #include "sysemu/sysemu.h" > > #include "hw/boards.h" > > +#include "hw/crypto/allwinner-sun4i-ss.h" > > #include "hw/usb/hcd-ohci.h" > > > > #define AW_A10_MMC0_BASE0x01c0f000 > > @@ -32,6 +33,7 @@ > > #define AW_A10_EMAC_BASE0x01c0b000 > > #define AW_A10_EHCI_BASE0x01c14000 > > #define AW_A10_OHCI_BASE0x01c14400 > > +#define AW_A10_CRYPTO_BASE 0x01c15000 > > #define AW_A10_SATA_BASE0x01c18000 > > #define AW_A10_RTC_BASE 0x01c20d00 > > > > @@ -48,6 +50,10 @@ static void aw_a10_init(Object *obj) > > > > object_initialize_child(obj, "emac", >emac, TYPE_AW_EMAC); > > > > +#if defined CONFIG_NETTLE > > +object_initialize_child(obj, "crypto", >crypto,
Re: [PATCH v2 2/5] 9pfs: fix qemu_mknodat(S_IFSOCK) on macOS
On Saturday, 23 April 2022 06:33:50 CEST Akihiko Odaki wrote: > On 2022/04/22 23:06, Christian Schoenebeck wrote: > > On Friday, 22 April 2022 04:43:40 CEST Akihiko Odaki wrote: > >> On 2022/04/22 0:07, Christian Schoenebeck wrote: > >>> mknod() on macOS does not support creating sockets, so divert to > >>> call sequence socket(), bind() and chmod() respectively if S_IFSOCK > >>> was passed with mode argument. > >>> > >>> Link: https://lore.kernel.org/qemu-devel/17933734.zYzKuhC07K@silver/ > >>> Signed-off-by: Christian Schoenebeck > >>> Reviewed-by: Will Cohen > >>> --- > >>> > >>>hw/9pfs/9p-util-darwin.c | 27 ++- > >>>1 file changed, 26 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c > >>> index e24d09763a..39308f2a45 100644 > >>> --- a/hw/9pfs/9p-util-darwin.c > >>> +++ b/hw/9pfs/9p-util-darwin.c > >>> @@ -74,6 +74,27 @@ int fsetxattrat_nofollow(int dirfd, const char > >>> *filename, const char *name,> > >>> > >>> */ > >>> > >>>#if defined CONFIG_PTHREAD_FCHDIR_NP > >>> > >>> +static int create_socket_file_at_cwd(const char *filename, mode_t mode) > >>> { > >>> +int fd, err; > >>> +struct sockaddr_un addr = { > >>> +.sun_family = AF_UNIX > >>> +}; > >>> + > >>> +fd = socket(PF_UNIX, SOCK_DGRAM, 0); > >>> +if (fd == -1) { > >>> +return fd; > >>> +} > >>> +snprintf(addr.sun_path, sizeof(addr.sun_path), "./%s", filename); > >> > >> It would result in an incorrect path if the path does not fit in > >> addr.sun_path. It should report an explicit error instead. > > > > Looking at its header file, 'sun_path' is indeed defined on macOS with an > > oddly small size of only 104 bytes. So yes, I should explicitly handle > > that > > error case. > > > > I'll post a v3. > > > >>> +err = bind(fd, (struct sockaddr *) &addr, sizeof(addr)); > >>> +if (err == -1) { > >>> +goto out; > >> > >> You may close(fd) as soon as bind() returns (before checking the > >> returned value) and eliminate goto. 
> >
> > Yeah, I thought about that alternative, but found it a bit ugly, and
> > probably also counter-productive in case this function might get
> > extended with more error paths in future. Not that I would insist on
> > the current solution though.
>
> I'm happy with the explanation. Thanks.
>
> >>> +    }
> >>> +    err = chmod(addr.sun_path, mode);
> >>
> >> I'm not sure if it is fine to have a time window between bind() and
> >> chmod(). Do you have some rationale?
> >
> > Good question. QEMU's 9p server is multi-threaded; all 9p requests come
> > in serialized and the 9p server controller portion (9p.c) is only
> > running on the QEMU main thread, but the actual filesystem driver calls
> > are then dispatched to QEMU worker threads and therefore running
> > concurrently at this point:
> >
> > https://wiki.qemu.org/Documentation/9p#Threads_and_Coroutines
> >
> > Similar situation on the Linux 9p client side: it handles access to a
> > mounted 9p filesystem concurrently; requests are then serialized by the
> > 9p driver on Linux and sent over the wire to the 9p server (host).
> >
> > So yes, there might be implications from that short time window. But
> > could that be exploited on macOS hosts in practice?
> >
> > The socket file would have mode srwxr-xr-x for a short moment.
> >
> > For security_model=mapped* this should not be a problem.
> >
> > For security_model=none|passthrough, in theory, maybe? But how likely
> > is that? If you are using a Linux client for instance, trying to
> > brute-force opening the socket file, the client would send several 9p
> > commands (Twalk, Tgetattr, Topen, probably more). The time window of
> > the two commands above should be much smaller than that, and I would
> > expect one of the 9p commands to error out in between.
> >
> > What would be a viable approach to avoid this issue on macOS?
>
> It is unlikely that a naive brute-force approach will succeed to
> exploit.
> The more concerning scenario is that the attacker uses the knowledge of
> the underlying implementation of macOS to cause resource contention to
> widen the window. Whether an exploitation is viable depends on how much
> time you spend digging into XNU.
>
> However, I'm also not sure if it really *has* a race condition. Looking
> at v9fs_co_mknod(), it sequentially calls s->ops->mknod() and
> s->ops->lstat(). It also results in an entity called "path name based
> fid" in the code, which inherently cannot identify a file when it is
> renamed or recreated.
>
> If there is some rationale it is safe, it may also be applied to the
> sequence of bind() and chmod(). Can anyone explain the sequence of
> s->ops->mknod() and s->ops->lstat() or path name based fid in general?

You are talking about the 9p server's controller level: I don't see
anything that would prevent a concurrent open() during this bind() ...
chmod() time window, unfortunately. Argument 'fidp' passed to
[PATCH v2 11/11] q800: add default vendor and product information for scsi-cd devices
The MacOS CDROM driver uses a SCSI INQUIRY command to check that any SCSI CDROMs detected match a whitelist of vendors and products before adding them to the list of available devices. Add known-good default vendor and product information using the existing compat_prop mechanism so the user doesn't have to use long command lines to set the qdev properties manually. Signed-off-by: Mark Cave-Ayland --- hw/m68k/q800.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index abb549f8d8..8b34776c8e 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -692,6 +692,9 @@ static GlobalProperty hw_compat_q800[] = { { "scsi-hd", "product", " ST225N" }, { "scsi-hd", "ver", "1.0 " }, { "scsi-cd", "quirk_mode_sense_rom_force_dbd", "on"}, +{ "scsi-cd", "vendor", "MATSHITA" }, +{ "scsi-cd", "product", "CD-ROM CR-8005" }, +{ "scsi-cd", "ver", "1.0k" }, }; static const size_t hw_compat_q800_len = G_N_ELEMENTS(hw_compat_q800); -- 2.20.1
Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
On Fri, Apr 22, 2022, at 3:56 AM, Chao Peng wrote:
> On Tue, Apr 05, 2022 at 06:03:21PM +, Sean Christopherson wrote:
>> On Tue, Apr 05, 2022, Quentin Perret wrote:
>> > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote:
> Only when the register succeeds, the fd is converted into a private fd;
> before that, the fd is just a normal (shared) one. During this
> conversion, the previous data is preserved so you can put some initial
> data in guest pages (whether the architecture allows this is
> architecture-specific and out of the scope of this patch).

I think this can be made to work, but it will be awkward. On TDX, for
example, what exactly are the semantics supposed to be? An error code if
the memory isn't all zero? An error code if it has ever been written?

Fundamentally, I think this is because your proposed lifecycle for these
memfiles results in a lightweight API but is awkward for the intended use
cases. You're proposing, roughly:

1. Create a memfile. Now it's in a shared state with an unknown virt
technology. It can be read and written. Let's call this state BRAND_NEW.

2. Bind to a VM. Now it's in a bound state. For TDX, for example, let's
call the new state BOUND_TDX. In this state, the TDX rules are followed
(private memory can't be converted, etc).

The problem here is that the BRAND_NEW state allows things that are
nonsensical in TDX, and the binding step needs to invent some kind of
semantics for what happens when binding a nonempty memfile. So I would
propose a somewhat different order:

1. Create a memfile. It's in the UNBOUND state and no operations
whatsoever are allowed except binding or closing.

2. Bind the memfile to a VM (or at least to a VM technology). Now it's in
the initial state appropriate for that VM.

For TDX, this completely bypasses the cases where the data is
prepopulated and TDX can't handle it cleanly.
For SEV, it bypasses a situation in which data might be written to the
memory before we find out whether that data will be unreclaimable or
unmovable.

--

Now I have a question, since I don't think anyone has really answered it:
how does this all work with SEV- or pKVM-like technologies in which
private and shared pages share the same address space?

It sounds like you're proposing to have a big memfile that contains
private and shared pages and to use that same memfile as pages are
converted back and forth. IO and even real physical DMA could be done on
that memfile. Am I understanding correctly?

If so, I think this makes sense, but I'm wondering if the actual memslot
setup should be different. For TDX, private memory lives in a logically
separate memslot space. For SEV and pKVM, it doesn't. I assume the API
can reflect this straightforwardly.

And the corresponding TDX question: is the intent still that shared pages
aren't allowed at all in a TDX memfile? If so, that would be the most
direct mapping to what the hardware actually does.

--Andy
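The difference between the two proposed orderings can be contrasted with a tiny state machine. Everything here is an illustrative sketch with invented names, not any KVM interface: a freshly created memfile admits nothing but bind/close, so the binding step never has to invent semantics for pre-existing content:

```c
#include <stdbool.h>

/* Hypothetical states/operations for the second (UNBOUND-first) proposal. */
enum memfile_state { MEMFILE_UNBOUND, MEMFILE_BOUND };
enum memfile_op { MEMFILE_OP_READ, MEMFILE_OP_WRITE,
                  MEMFILE_OP_BIND, MEMFILE_OP_CLOSE };

static bool memfile_op_allowed(enum memfile_state st, enum memfile_op op)
{
    if (st == MEMFILE_UNBOUND) {
        /* No operations whatsoever except binding or closing. */
        return op == MEMFILE_OP_BIND || op == MEMFILE_OP_CLOSE;
    }
    /* Once bound, rebinding is invalid; other ops follow the VM's rules. */
    return op != MEMFILE_OP_BIND;
}
```

In the first (BRAND_NEW-first) proposal, read/write would already be allowed before binding, which is exactly what creates the awkward "binding a nonempty memfile" case described above.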
[PATCH v2 09/11] scsi-disk: allow MODE SELECT block descriptor to set the ROM device block size
Whilst CDROM drives usually have a 2048 byte sector size, older drives have the ability to switch between 2048 byte and 512 byte sector sizes by specifying a block descriptor in the MODE SELECT command. If a MODE SELECT block descriptor is provided, update the scsi-cd device block size with the provided value accordingly. This allows CDROMs to be used with A/UX whose driver only works with a 512 byte sector size. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 7 +++ hw/scsi/trace-events | 1 + 2 files changed, 8 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 6991493cf4..41ebbe3045 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1583,6 +1583,13 @@ static void scsi_disk_emulate_mode_select(SCSIDiskReq *r, uint8_t *inbuf) goto invalid_param; } +/* Allow changing the block size of ROM devices */ +if (s->qdev.type == TYPE_ROM && bd_len && +p[6] != (s->qdev.blocksize >> 8)) { +s->qdev.blocksize = p[6] << 8; +trace_scsi_disk_mode_select_rom_set_blocksize(s->qdev.blocksize); +} + len -= bd_len; p += bd_len; diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events index 25eae9f307..1a021ddae9 100644 --- a/hw/scsi/trace-events +++ b/hw/scsi/trace-events @@ -340,6 +340,7 @@ scsi_disk_dma_command_WRITE(const char *cmd, uint64_t lba, int len) "Write %s(se scsi_disk_new_request(uint32_t lun, uint32_t tag, const char *line) "Command: lun=%d tag=0x%x data=%s" scsi_disk_aio_sgio_command(uint32_t tag, uint8_t cmd, uint64_t lba, int len, uint32_t timeout) "disk aio sgio: tag=0x%x cmd=0x%x (sector %" PRId64 ", count %d) timeout=%u" scsi_disk_mode_select_page_truncated(int page, int len, int page_len) "page %d expected length %d but received length %d" +scsi_disk_mode_select_rom_set_blocksize(int blocksize) "set ROM block size to %d" # scsi-generic.c scsi_generic_command_complete_noio(void *req, uint32_t tag, int statuc) "Command complete %p tag=0x%x status=%d" -- 2.20.1
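For context on the `p[6] << 8` in the patch above: in the 8-byte mode parameter block descriptor, the block length occupies bytes 5..7, big-endian, so the middle byte alone is enough to tell 512 from 2048. A small illustrative helper (not QEMU code):

```c
#include <stdint.h>

/*
 * Extract the block length from an 8-byte mode parameter block
 * descriptor: byte 0 is the density code, bytes 1..3 the number of
 * blocks, byte 4 reserved, bytes 5..7 the block length (big-endian).
 */
static uint32_t bd_block_length(const uint8_t *bd)
{
    return ((uint32_t)bd[5] << 16) | ((uint32_t)bd[6] << 8) | bd[7];
}
```

For the two sector sizes the patch cares about, 2048 is 0x000800 and 512 is 0x000200, so bd[6] is 0x08 or 0x02 respectively and `bd[6] << 8` recovers the full value.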
[PATCH v2 08/11] scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM drives
A/UX sends a MODE_PAGE_R_W_ERROR command with the AWRE bit set to 0 when enumerating CDROM drives. Since the bit is currently hardcoded to 1, indicate that the AWRE bit can be changed (even though we don't care about the value) so that the MODE_PAGE_R_W_ERROR page can be set successfully. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index c657e4f5da..6991493cf4 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1187,6 +1187,10 @@ static int mode_sense_page(SCSIDiskState *s, int page, uint8_t **p_outbuf, case MODE_PAGE_R_W_ERROR: length = 10; if (page_control == 1) { /* Changeable Values */ +if (s->qdev.type == TYPE_ROM) { +/* Automatic Write Reallocation Enabled */ +p[0] = 0x80; +} break; } p[0] = 0x80; /* Automatic Write Reallocation Enabled */ -- 2.20.1
Re: [PATCH v22 0/8] support dirty restraint on vCPU
Ping. Hi, David and Peter, what do you think of this patchset? Is it suitable for queueing, or is there still something that needs to be done?

Yong

On 2022/4/1 1:49, huang...@chinatelecom.cn wrote:

From: Hyman Huang (黄勇)

This is v22 of the dirtylimit series. The following is the history of the patchset; since v22 is quite different from the original version, I have condensed the changelog:

RFC and v1: https://lore.kernel.org/qemu-devel/cover.1637214721.git.huang...@chinatelecom.cn/
v2: https://lore.kernel.org/qemu-devel/cover.1637256224.git.huang...@chinatelecom.cn/

v1->v2 changelog:
- Rename some functions and variables; refactor the original dirtylimit algorithm. Thanks for the comments given by Juan Quintela.

v3: https://lore.kernel.org/qemu-devel/cover.1637403404.git.huang...@chinatelecom.cn/
v4: https://lore.kernel.org/qemu-devel/cover.1637653303.git.huang...@chinatelecom.cn/
v5: https://lore.kernel.org/qemu-devel/cover.1637759139.git.huang...@chinatelecom.cn/
v6: https://lore.kernel.org/qemu-devel/cover.1637856472.git.huang...@chinatelecom.cn/
v7: https://lore.kernel.org/qemu-devel/cover.1638202004.git.huang...@chinatelecom.cn/

v2->v7 changelog:
- Refactor the docs and annotations and fix bugs in the original dirtylimit algorithm. Thanks for the review given by Markus Armbruster.

v8: https://lore.kernel.org/qemu-devel/cover.1638463260.git.huang...@chinatelecom.cn/
v9: https://lore.kernel.org/qemu-devel/cover.1638495274.git.huang...@chinatelecom.cn/
v10: https://lore.kernel.org/qemu-devel/cover.1639479557.git.huang...@chinatelecom.cn/

v7->v10 changelog:
- Introduce a simpler but more efficient dirtylimit algorithm inspired by Peter Xu.
- Keep polishing the annotations as suggested by Markus Armbruster.
v11: https://lore.kernel.org/qemu-devel/cover.1641315745.git.huang...@chinatelecom.cn/
v12: https://lore.kernel.org/qemu-devel/cover.1642774952.git.huang...@chinatelecom.cn/
v13: https://lore.kernel.org/qemu-devel/cover.1644506963.git.huang...@chinatelecom.cn/

v10->v13 changelog:
- Handle the hotplug/unplug scenario.
- Refactor the new algorithm; split the commits and make the code cleaner.

v14: https://lore.kernel.org/qemu-devel/cover.1644509582.git.huang...@chinatelecom.cn/

v13->v14 changelog:
- Sent by accident.

v15: https://lore.kernel.org/qemu-devel/cover.1644976045.git.huang...@chinatelecom.cn/
v16: https://lore.kernel.org/qemu-devel/cover.1645067452.git.huang...@chinatelecom.cn/
v17: https://lore.kernel.org/qemu-devel/cover.1646243252.git.huang...@chinatelecom.cn/

v14->v17 changelog:
- Do some code cleanup and fix a test bug reported by Dr. David Alan Gilbert.

v18: https://lore.kernel.org/qemu-devel/cover.1646247968.git.huang...@chinatelecom.cn/
v19: https://lore.kernel.org/qemu-devel/cover.1647390160.git.huang...@chinatelecom.cn/
v20: https://lore.kernel.org/qemu-devel/cover.1647396907.git.huang...@chinatelecom.cn/
v21: https://lore.kernel.org/qemu-devel/cover.1647435820.git.huang...@chinatelecom.cn/

v17->v21 changelog:
- Add a qtest, fix bugs and do code cleanup.

v21->v22 changelog:
- Move the vCPU dirty limit test into migration-test and make some modifications suggested by Peter.

Please review. Yong.

Abstract

This patchset introduces a mechanism to impose a dirty restraint on vCPUs, aiming to keep each vCPU running at a certain dirty rate given by the user. Dirty restraint on vCPUs may be an alternative method for implementing the convergence logic for live migration, which could in theory improve guest memory performance during migration compared with the traditional method. In the current live migration implementation, the convergence logic throttles all vCPUs of the VM, which has some side effects.
- "Read processes" on a vCPU will be unnecessarily penalized.
- The throttle increases the percentage step by step, and it seems to struggle to find the optimal throttle percentage when the dirty rate is high.
- It is hard to predict the remaining migration time if the throttling percentage reaches 99%.

To a certain extent, the dirty restraint mechanism can fix these effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we calculate the vCPU dirty rate via the Dirty Ring mechanism periodically, as commit 0e21bf246 "implement dirty-ring dirtyrate calculation" does. For a vCPU that has been specified to have a dirty restraint imposed, we throttle it periodically as auto-converge does; after each round of throttling, we compare the quota dirty rate with the current dirty rate, and if the current dirty rate is not under the quota, we increase the throttling percentage until it is.

This patchset is the basis for implementing a new auto-converge method for live migration. We introduce two QMP commands to impose/cancel the dirty restraint on a specified vCPU, so it can also serve as an independent API for upper applications such as libvirt, which can use it to implement the convergence logic during live migration, supplemented with the QMP 'calc-dirty-rate' command or whatever. we post this
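The quota feedback described in the cover letter is essentially a small control loop. The following sketch illustrates the idea only; the 5% step, the 99% cap, and the relax-when-under-quota branch are invented for this example and are not the patchset's actual values or code:

```c
#include <stdint.h>

/*
 * Illustrative per-vCPU throttle adjustment: after each measurement
 * period, compare the measured dirty rate (MB/s) against the user quota
 * and nudge the throttle percentage accordingly.
 */
static int dirtylimit_adjust(int throttle_pct,
                             uint64_t current_rate, uint64_t quota_rate)
{
    if (current_rate > quota_rate) {
        /* Not under quota yet: throttle harder, capped at 99%. */
        throttle_pct = throttle_pct + 5 > 99 ? 99 : throttle_pct + 5;
    } else if (throttle_pct > 0) {
        /* Under quota: relax gradually to recover guest performance. */
        throttle_pct = throttle_pct < 5 ? 0 : throttle_pct - 5;
    }
    return throttle_pct;
}
```

Because only the vCPUs with a restraint imposed pass through such a loop, unrelated vCPUs (e.g. those running "read processes") are left unthrottled, which is the advantage over the all-vCPU auto-converge throttling described above.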
[PATCH v2 10/11] q800: add default vendor and product information for scsi-hd devices
The Apple HD SC Setup program uses a SCSI INQUIRY command to check that any SCSI hard disks detected match a whitelist of vendors and products before allowing the "Initialise" button to prepare an empty disk. Add known-good default vendor and product information using the existing compat_prop mechanism so the user doesn't have to use long command lines to set the qdev properties manually. Signed-off-by: Mark Cave-Ayland --- hw/m68k/q800.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index f27ed01785..abb549f8d8 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -688,6 +688,9 @@ static void q800_init(MachineState *machine) static GlobalProperty hw_compat_q800[] = { { "scsi-hd", "quirk_mode_page_apple_vendor", "on"}, +{ "scsi-hd", "vendor", " SEAGATE" }, +{ "scsi-hd", "product", " ST225N" }, +{ "scsi-hd", "ver", "1.0 " }, { "scsi-cd", "quirk_mode_sense_rom_force_dbd", "on"}, }; static const size_t hw_compat_q800_len = G_N_ELEMENTS(hw_compat_q800); -- 2.20.1
[PATCH v2 04/11] q800: implement compat_props to enable quirk_mode_page_apple_vendor for scsi-hd devices
By default quirk_mode_page_apple_vendor should be enabled for all scsi-hd devices connected to the q800 machine to enable MacOS to detect and use them. Signed-off-by: Mark Cave-Ayland --- hw/m68k/q800.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index 099a758c6f..42bf7bb4f0 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -686,6 +686,11 @@ static void q800_init(MachineState *machine) } } +static GlobalProperty hw_compat_q800[] = { +{ "scsi-hd", "quirk_mode_page_apple_vendor", "on"}, +}; +static const size_t hw_compat_q800_len = G_N_ELEMENTS(hw_compat_q800); + static void q800_machine_class_init(ObjectClass *oc, void *data) { MachineClass *mc = MACHINE_CLASS(oc); @@ -695,6 +700,7 @@ static void q800_machine_class_init(ObjectClass *oc, void *data) mc->max_cpus = 1; mc->block_default_type = IF_SCSI; mc->default_ram_id = "m68k_mac.ram"; +compat_props_add(mc->compat_props, hw_compat_q800, hw_compat_q800_len); } static const TypeInfo q800_machine_typeinfo = { -- 2.20.1
[PATCH v2 03/11] scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh
One of the mechanisms MacOS uses to identify drives compatible with MacOS is to send a custom MODE SELECT command for page 0x30 to the drive. The response to this is a hard-coded manufacturer string which must match in order for the drive to be usable within MacOS. Add an implementation of the MODE SELECT page 0x30 response guarded by a newly defined SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR quirk bit so that drives attached to non-Apple machines function exactly as before. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 17 + include/hw/scsi/scsi.h | 3 +++ include/scsi/constants.h | 1 + 3 files changed, 21 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index d89cdd4e4a..5de4506b97 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1085,6 +1085,7 @@ static int mode_sense_page(SCSIDiskState *s, int page, uint8_t **p_outbuf, [MODE_PAGE_R_W_ERROR] = (1 << TYPE_DISK) | (1 << TYPE_ROM), [MODE_PAGE_AUDIO_CTL] = (1 << TYPE_ROM), [MODE_PAGE_CAPABILITIES] = (1 << TYPE_ROM), +[MODE_PAGE_APPLE_VENDOR] = (1 << TYPE_ROM), }; uint8_t *p = *p_outbuf + 2; @@ -1229,6 +1230,20 @@ static int mode_sense_page(SCSIDiskState *s, int page, uint8_t **p_outbuf, p[19] = (16 * 176) & 0xff; break; + case MODE_PAGE_APPLE_VENDOR: +if (s->quirks & (1 << SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR)) { +length = 0x24; +if (page_control == 1) { /* Changeable Values */ +break; +} + +memset(p, 0, length); +strcpy((char *)p + 8, "APPLE COMPUTER, INC "); +break; +} else { +return -1; +} + default: return -1; } @@ -3042,6 +3057,8 @@ static Property scsi_hd_properties[] = { DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0), DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version, 5), +DEFINE_PROP_BIT("quirk_mode_page_apple_vendor", SCSIDiskState, quirks, +SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR, 0), DEFINE_BLOCK_CHS_PROPERTIES(SCSIDiskState, qdev.conf), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/scsi/scsi.h 
b/include/hw/scsi/scsi.h index 1ffb367f94..975d462347 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -226,4 +226,7 @@ SCSIDevice *scsi_device_get(SCSIBus *bus, int channel, int target, int lun); /* scsi-generic.c. */ extern const SCSIReqOps scsi_generic_req_ops; +/* scsi-disk.c */ +#define SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR 0 + #endif diff --git a/include/scsi/constants.h b/include/scsi/constants.h index 2a32c08b5e..891aa0f45c 100644 --- a/include/scsi/constants.h +++ b/include/scsi/constants.h @@ -234,6 +234,7 @@ #define MODE_PAGE_FAULT_FAIL 0x1c #define MODE_PAGE_TO_PROTECT 0x1d #define MODE_PAGE_CAPABILITIES0x2a +#define MODE_PAGE_APPLE_VENDOR0x30 #define MODE_PAGE_ALLS0x3f /* Not in Mt. Fuji, but in ATAPI 2.6 -- deprecated now in favor * of MODE_PAGE_SENSE_POWER */ -- 2.20.1
[PATCH v2 05/11] scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk for Macintosh
During SCSI bus enumeration A/UX sends a MODE SENSE command to the CDROM and expects the response to include a block descriptor. As per the latest SCSI documentation, QEMU currently force-disables the block descriptor for CDROM devices but the A/UX driver expects the block descriptor to always be returned. If the block descriptor is not returned in the response then A/UX becomes confused, since the block descriptor returned in the MODE SENSE response is used to generate a subsequent MODE SELECT command which is then invalid. Add a new SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD to allow this behaviour to be enabled as required. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c| 18 +- include/hw/scsi/scsi.h | 1 + 2 files changed, 14 insertions(+), 5 deletions(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 5de4506b97..71fdf132c1 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1279,10 +1279,17 @@ static int scsi_disk_emulate_mode_sense(SCSIDiskReq *r, uint8_t *outbuf) dev_specific_param |= 0x80; /* Readonly. */ } } else { -/* MMC prescribes that CD/DVD drives have no block descriptors, - * and defines no device-specific parameter. */ -dev_specific_param = 0x00; -dbd = true; +if (s->quirks & (1 << SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD)) { +dev_specific_param = 0x00; +dbd = false; +} else { +/* + * MMC prescribes that CD/DVD drives have no block descriptors, + * and defines no device-specific parameter. + */ +dev_specific_param = 0x00; +dbd = true; +} } if (r->req.cmd.buf[0] == MODE_SENSE) { @@ -1578,7 +1585,6 @@ static void scsi_disk_emulate_mode_select(SCSIDiskReq *r, uint8_t *inbuf) /* Ensure no change is made if there is an error! 
*/ for (pass = 0; pass < 2; pass++) { if (mode_select_pages(r, p, len, pass == 1) < 0) { -assert(pass == 0); return; } } @@ -3107,6 +3113,8 @@ static Property scsi_cd_properties[] = { DEFAULT_MAX_IO_SIZE), DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version, 5), +DEFINE_PROP_BIT("quirk_mode_sense_rom_force_dbd", SCSIDiskState, quirks, +SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD, 0), DEFINE_PROP_END_OF_LIST(), }; diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index 975d462347..a9e657e03c 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -228,5 +228,6 @@ extern const SCSIReqOps scsi_generic_req_ops; /* scsi-disk.c */ #define SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR 0 +#define SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD 1 #endif -- 2.20.1
[PATCH v2 06/11] q800: implement compat_props to enable quirk_mode_sense_rom_force_dbd for scsi-cd devices
By default quirk_mode_sense_rom_force_dbd should be enabled for all scsi-cd devices connected to the q800 machine to correctly report the CDROM block descriptor back to A/UX. Signed-off-by: Mark Cave-Ayland --- hw/m68k/q800.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c index 42bf7bb4f0..f27ed01785 100644 --- a/hw/m68k/q800.c +++ b/hw/m68k/q800.c @@ -688,6 +688,7 @@ static void q800_init(MachineState *machine) static GlobalProperty hw_compat_q800[] = { { "scsi-hd", "quirk_mode_page_apple_vendor", "on"}, +{ "scsi-cd", "quirk_mode_sense_rom_force_dbd", "on"}, }; static const size_t hw_compat_q800_len = G_N_ELEMENTS(hw_compat_q800); -- 2.20.1
[PATCH v2 07/11] scsi-disk: allow truncated MODE SELECT requests
When A/UX configures the CDROM device it sends a truncated MODE SELECT request for page 1 (MODE_PAGE_R_W_ERROR) which is only 6 bytes in length rather than 10. This seems to be due to a bug in Apple's code which calculates the CDB message length incorrectly. According to [1] this truncated request is accepted on real hardware whereas in QEMU it generates an INVALID_PARAM_LEN sense code which causes A/UX to get stuck in a loop retrying the command in an attempt to succeed. Alter the mode page request length check so that truncated requests are allowed as per real hardware, adding a trace event to enable the condition to be detected. [1] https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444 Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 2 +- hw/scsi/trace-events | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 71fdf132c1..c657e4f5da 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -1525,7 +1525,7 @@ static int mode_select_pages(SCSIDiskReq *r, uint8_t *p, int len, bool change) goto invalid_param; } if (page_len > len) { -goto invalid_param_len; +trace_scsi_disk_mode_select_page_truncated(page, page_len, len); } if (!change) { diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events index e91b55a961..25eae9f307 100644 --- a/hw/scsi/trace-events +++ b/hw/scsi/trace-events @@ -339,6 +339,7 @@ scsi_disk_dma_command_READ(uint64_t lba, uint32_t len) "Read (sector %" PRId64 " scsi_disk_dma_command_WRITE(const char *cmd, uint64_t lba, int len) "Write %s(sector %" PRId64 ", count %u)" scsi_disk_new_request(uint32_t lun, uint32_t tag, const char *line) "Command: lun=%d tag=0x%x data=%s" scsi_disk_aio_sgio_command(uint32_t tag, uint8_t cmd, uint64_t lba, int len, uint32_t timeout) "disk aio sgio: tag=0x%x cmd=0x%x (sector %" PRId64 ", count %d) timeout=%u" +scsi_disk_mode_select_page_truncated(int page, int len, int page_len) "page %d expected length %d but received length %d" # scsi-generic.c scsi_generic_command_complete_noio(void *req, uint32_t tag, int statuc) "Command complete %p tag=0x%x status=%d" -- 2.20.1
[PATCH v2 01/11] scsi-disk: add FORMAT UNIT command
When initialising a drive ready to install MacOS, Apple HD SC Setup first attempts to format the drive. Add a minimal FORMAT UNIT command which simply returns success to allow the format to succeed. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 4 ++++ hw/scsi/trace-events | 1 + 2 files changed, 5 insertions(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 072686ed58..090679f3b5 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -2127,6 +2127,9 @@ static int32_t scsi_disk_emulate_command(SCSIRequest *req, uint8_t *buf) trace_scsi_disk_emulate_command_WRITE_SAME( req->cmd.buf[0] == WRITE_SAME_10 ? 10 : 16, r->req.cmd.xfer); break; +case FORMAT_UNIT: +trace_scsi_disk_emulate_command_FORMAT_UNIT(r->req.cmd.xfer); +break; default: trace_scsi_disk_emulate_command_UNKNOWN(buf[0], scsi_command_name(buf[0])); @@ -2533,6 +2536,7 @@ static const SCSIReqOps *const scsi_disk_reqops_dispatch[256] = { [VERIFY_10] = &scsi_disk_emulate_reqops, [VERIFY_12] = &scsi_disk_emulate_reqops, [VERIFY_16] = &scsi_disk_emulate_reqops, +[FORMAT_UNIT] = &scsi_disk_emulate_reqops, [READ_6] = &scsi_disk_dma_reqops, [READ_10] = &scsi_disk_dma_reqops, diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events index 20fb0dc162..e91b55a961 100644 --- a/hw/scsi/trace-events +++ b/hw/scsi/trace-events @@ -334,6 +334,7 @@ scsi_disk_emulate_command_UNMAP(size_t xfer) "Unmap (len %zd)" scsi_disk_emulate_command_VERIFY(int bytchk) "Verify (bytchk %d)" scsi_disk_emulate_command_WRITE_SAME(int cmd, size_t xfer) "WRITE SAME %d (len %zd)" scsi_disk_emulate_command_UNKNOWN(int cmd, const char *name) "Unknown SCSI command (0x%2.2x=%s)" +scsi_disk_emulate_command_FORMAT_UNIT(size_t xfer) "Format Unit (len %zd)" scsi_disk_dma_command_READ(uint64_t lba, uint32_t len) "Read (sector %" PRId64 ", count %u)" scsi_disk_dma_command_WRITE(const char *cmd, uint64_t lba, int len) "Write %s(sector %" PRId64 ", count %u)" scsi_disk_new_request(uint32_t lun, uint32_t tag, const char *line) "Command: lun=%d tag=0x%x data=%s" --
2.20.1
[PATCH v2 00/11] scsi: add quirks and features to support m68k Macs
Here is the next set of patches from my ongoing work to allow the q800 machine to boot MacOS, related to SCSI devices.

The first patch implements a dummy FORMAT UNIT command which is used by the Apple HD SC Setup program when preparing an empty disk to install MacOS. Patch 2 adds a new quirks bitmap to SCSIDiskState to allow buggy and/or legacy features to be enabled on an individual device basis. Once the quirks bitmap has been added, patch 3 uses the quirks feature to implement an Apple-specific mode page which is required to allow the disk to be recognised and used by Apple HD SC Setup. Patch 4 adds compat_props to the q800 machine which enable the new MODE_PAGE_APPLE_VENDOR quirk for all scsi-hd devices attached to the machine.

Patch 5 adds a new quirk to force SCSI CDROMs to always return the block descriptor for a MODE SENSE command which is expected by A/UX, whilst patch 6 enables the quirk for all scsi-cd devices on the q800 machine. Patch 7 adds support for truncated MODE SELECT requests which are sent by A/UX (and also MacOS in some circumstances) when enumerating a SCSI CDROM device, and which are shown to be accepted on real hardware as documented in [1]. Patch 8 allows the MODE_PAGE_R_W_ERROR AWRE bit to be changeable, since the A/UX MODE SELECT request sets this bit to 0 rather than the QEMU default which is 1. Patch 9 adds support for setting the CDROM block size via a MODE SELECT request, which is supported by older CDROMs to allow the block size to be changed from the default of 2048 bytes to 512 bytes for compatibility purposes. This is used by A/UX, which otherwise fails with SCSI errors if the block size is not set to 512 bytes when accessing CDROMs.

Finally, patches 10 and 11 augment the compat_props to set the default vendor, product and version information for all scsi-hd and scsi-cd devices attached to the q800 machine, taken from real drives.
This is because MacOS will only allow a known set of SCSI devices to be recognised during the installation process. Signed-off-by: Mark Cave-Ayland [1] https://68kmla.org/bb/index.php?threads/scsi2sd-project-anyone-interested.29040/page-7#post-316444 v2: - Change patchset title from "scsi: add support for FORMAT UNIT command and quirks" to "scsi: add quirks and features to support m68k Macs" - Fix missing shift in patch 2 as pointed out by Fam - Rename MODE_PAGE_APPLE to MODE_PAGE_APPLE_VENDOR - Add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk - Add support for truncated MODE SELECT requests - Allow MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM devices - Allow the MODE SELECT block descriptor to set the CDROM block size Mark Cave-Ayland (11): scsi-disk: add FORMAT UNIT command scsi-disk: add new quirks bitmap to SCSIDiskState scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh q800: implement compat_props to enable quirk_mode_page_apple_vendor for scsi-hd devices scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk for Macintosh q800: implement compat_props to enable quirk_mode_sense_rom_force_dbd for scsi-cd devices scsi-disk: allow truncated MODE SELECT requests scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM drives scsi-disk: allow MODE SELECT block descriptor to set the ROM device block size q800: add default vendor and product information for scsi-hd devices q800: add default vendor and product information for scsi-cd devices hw/m68k/q800.c | 13 ++ hw/scsi/scsi-disk.c | 53 +++- hw/scsi/trace-events | 3 +++ include/hw/scsi/scsi.h | 4 +++ include/scsi/constants.h | 1 + 5 files changed, 68 insertions(+), 6 deletions(-) -- 2.20.1
[PATCH v2 02/11] scsi-disk: add new quirks bitmap to SCSIDiskState
Since the MacOS SCSI implementation is quite old (and Apple added some firmware customisations to their drives for m68k Macs) there is need to add a mechanism to correctly handle Apple-specific quirks. Add a new quirks bitmap to SCSIDiskState that can be used to enable these features as required. Signed-off-by: Mark Cave-Ayland --- hw/scsi/scsi-disk.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index 090679f3b5..d89cdd4e4a 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -94,6 +94,7 @@ struct SCSIDiskState { uint16_t port_index; uint64_t max_unmap_size; uint64_t max_io_size; +uint32_t quirks; QEMUBH *bh; char *version; char *serial; -- 2.20.1
Re: [PATCH 0/6] scsi: add support for FORMAT UNIT command and quirks
On 21/04/2022 07:51, Mark Cave-Ayland wrote: Here are the next set of patches from my ongoing work to allow the q800 machine to boot MacOS related to SCSI devices. The first patch implements a dummy FORMAT UNIT command which is used by the Apple HD SC Setup program when preparing an empty disk to install MacOS. Patches 2 adds a new quirks bitmap to SCSIDiskState to allow buggy and/or legacy features to enabled on an individual device basis. Once the quirks bitmap has been added, patch 3 uses the quirks feature to implement an Apple-specific mode page which is required to allow the disk to be recognised and used by Apple HD SC Setup. Patch 4 adds compat_props to the q800 machine which enable the MODE_PAGE_APPLE quirk for all scsi-hd devices attached to the machine. Finally patches 5 and 6 augment the compat_props to set the default vendor, product and version information for all scsi-hd and scsi-cd devices attached to the q800 machine, taken from real drives. This is because MacOS will only allow a known set of SCSI devices to be recognised during the installation process. Signed-off-by: Mark Cave-Ayland Mark Cave-Ayland (6): scsi-disk: add FORMAT UNIT command scsi-disk: add new quirks bitmap to SCSIDiskState scsi-disk: add MODE_PAGE_APPLE quirk for Macintosh q800: implement compat_props to enable quirk_mode_page_apple for scsi-hd devices q800: add default vendor, product and version information for scsi-hd devices q800: add default vendor, product and version information for scsi-cd devices hw/m68k/q800.c | 12 hw/scsi/scsi-disk.c | 24 hw/scsi/trace-events | 1 + include/hw/scsi/scsi.h | 3 +++ include/scsi/constants.h | 1 + 5 files changed, 41 insertions(+) I was fortunate enough to find a really good reference to some work done over on 68mla.org reverse engineering Apple's HD SC Setup and SCSI device detection. This pointed me towards a couple of additional SCSI changes for QEMU that also fix CDROM access under A/UX which I shall include in an updated v2. ATB, Mark.
Re: [PATCH 3/6] scsi-disk: add MODE_PAGE_APPLE quirk for Macintosh
On 21/04/2022 23:00, BALATON Zoltan wrote: On Thu, 21 Apr 2022, Richard Henderson wrote: On 4/21/22 08:29, Mark Cave-Ayland wrote: You need (1 << SCSI_DISK_QUIRK_MODE_PAGE_APPLE) instead. Doh, you're absolutely right. I believe the current recommendation is to use the BIT() macro in these cases. I think it's not a recommendation (as in code style) but it often makes things simpler by reducing the number of parentheses, so using it is probably a good idea for readability. But if you never need the bit number, only the value, then you could define the quirks constants as that in the first place. (Otherwise if you want bit numbers maybe make it an enum.) We probably need to fix BIT() to use 1ULL. At present it's using 1UL, to match the other (unfortunate) uses of unsigned long within bitops.h. The use of BIT() for things unrelated to bitops.h just bit a recent risc-v pull request, in that it failed to build on all 32-bit hosts. There's already a BIT_ULL(nr) for when ULL is needed, but in this case quirks was declared uint32_t so it is probably OK with UL as well. (Was this bitops.h taken from Linux? Keeping it compatible then may be a good idea to avoid confusion.) It seems there is still a bit of discussion around using BIT() here, so for v2 I'll add the shift directly with (1 << x). Then if the BIT() macro becomes suitable for more general use it can easily be updated as a separate patch later. ATB, Mark.
Re: [PATCH v2 17/34] configure: move Windows flags detection to meson
On Sat, Apr 23, 2022 at 5:09 PM Paolo Bonzini wrote: > Signed-off-by: Paolo Bonzini > Reviewed-by: Marc-André Lureau > --- > v1->v2: fix get_option('optimization') comparison to use a string > > configure | 20 > meson.build | 8 > 2 files changed, 8 insertions(+), 20 deletions(-) > > diff --git a/configure b/configure > index 0b236fda59..a6ba59cf6f 100755 > --- a/configure > +++ b/configure > @@ -224,10 +224,6 @@ glob() { > eval test -z '"${1#'"$2"'}"' > } > > -ld_has() { > -$ld --help 2>/dev/null | grep ".$1" >/dev/null 2>&1 > -} > - > if printf %s\\n "$source_path" "$PWD" | grep -q "[[:space:]:]"; > then >error_exit "main directory cannot contain spaces nor colons" > @@ -2088,22 +2084,6 @@ if test "$solaris" = "no" && test "$tsan" = "no"; > then > fi > fi > > -# Use ASLR, no-SEH and DEP if available > -if test "$mingw32" = "yes" ; then > -flags="--no-seh --nxcompat" > - > -# Disable ASLR for debug builds to allow debugging with gdb > -if test "$debug" = "no" ; then > -flags="--dynamicbase $flags" > -fi > - > -for flag in $flags; do > -if ld_has $flag ; then > -QEMU_LDFLAGS="-Wl,$flag $QEMU_LDFLAGS" > -fi > -done > -fi > - > # Guest agent Windows MSI package > > if test "$QEMU_GA_MANUFACTURER" = ""; then > diff --git a/meson.build b/meson.build > index 1a9549d90c..d569c6e944 100644 > --- a/meson.build > +++ b/meson.build > @@ -182,6 +182,14 @@ qemu_cxxflags = config_host['QEMU_CXXFLAGS'].split() > qemu_objcflags = config_host['QEMU_OBJCFLAGS'].split() > qemu_ldflags = config_host['QEMU_LDFLAGS'].split() > > +if targetos == 'windows' > + qemu_ldflags += cc.get_supported_link_arguments('-Wl,--no-seh', > '-Wl,--nxcompat') > + # Disable ASLR for debug builds to allow debugging with gdb > + if get_option('optimization') == '0' > +qemu_ldflags += cc.get_supported_link_arguments('-Wl,--dynamicbase') > + endif > +endif > + > if get_option('gprof') >qemu_cflags += ['-p'] >qemu_cxxflags += ['-p'] > -- > 2.35.1 > > > > -- Marc-André Lureau
Re: [PATCH v2] error-report: fix g_date_time_format assertion
On Sun, Apr 24, 2022 at 3:27 PM Haiyue Wang wrote: > The 'g_get_real_time' returns the number of microseconds since January > 1, 1970 UTC, but 'g_date_time_new_from_unix_utc' needs the number of > seconds, so it will cause the invalid time input: > > (process:279642): GLib-CRITICAL (recursed) **: g_date_time_format: > assertion 'datetime != NULL' failed > > Call function 'g_date_time_new_now_utc' instead, it has the same result > as 'g_date_time_new_from_unix_utc(g_get_real_time() / G_USEC_PER_SEC)'; > > Fixes: 73dab893b569 ("error-report: replace deprecated > g_get_current_time() with glib >= 2.62") > Signed-off-by: Haiyue Wang > Thanks, my bad Reviewed-by: Marc-André Lureau > --- > v2: use 'g_date_time_new_now_utc' directly, which handles the time > zone reference correctly. > --- > util/error-report.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/util/error-report.c b/util/error-report.c > index dbadaf206d..5edb2e6040 100644 > --- a/util/error-report.c > +++ b/util/error-report.c > @@ -173,7 +173,7 @@ static char * > real_time_iso8601(void) > { > #if GLIB_CHECK_VERSION(2,62,0) > -g_autoptr(GDateTime) dt = > g_date_time_new_from_unix_utc(g_get_real_time()); > +g_autoptr(GDateTime) dt = g_date_time_new_now_utc(); > /* ignore deprecation warning, since GLIB_VERSION_MAX_ALLOWED is 2.56 > */ > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wdeprecated-declarations" > -- > 2.36.0 > > > -- Marc-André Lureau
Re: Possible bug when setting aarch64 watchpoints
Sorry, I need to correct my previous post: If I set DBGWVR0_EL1 = 1<<23 // ie. 0x00800000 and DBGWCR0_EL1 = 0x17<<24 | 0xFF<<5 | 0b11<<3 | 0b11<<1 | 0b1<<0 // ie. MASK = 23 = 0b10111 and then access memory [0x0080007F] I get a watchpoint exception. (ie. watchpoints ARE working/enabled) But if I access [0x00800080] I *don’t* get an exception. **If the MASK field gets set to 0b0111 instead of 0b10111 then only the bottom 7 bits of the address get masked (instead of 23) and the masked address isn’t 0x00800000, and the exception won’t be triggered.** (if I *attempt* to set the MASK to 0b11111, but it actually gets set to 0b01111, then I get the behaviour quoted below). > On 24. Apr 2022, at 13:40, Chris Howard wrote: > > Hi, I’m new to qemu (and even bug-reporting) so apologies in advance… > > The MASK field in DBGWCRx_EL1 is **5** bits wide [28:24]. > > In target/arm/kvm64.c I found the line: > > wp.wcr = deposit32(wp.wcr, 24, 4, bits); // ie **4** bits > instead of **5** > > > If it’s not copying (or calculating?) the number of bits correctly this would > explain the behaviour I’m seeing: > > If I set > > DBGWVR0_EL1 = 0x00800000 > > and > > DBGWCR0_EL1 = 0x1F<<24 | 0xFF<<5 | 0b11<<3 | 0b11<<1 | 0b1<<0 > > and then access memory [0x00807FFF] I get a watchpoint exception. (ie. > watchpoints ARE working/enabled) > > But if I access [0x00808000] I *don’t* get an exception. > > **If the MASK field gets set to 0b1111 instead of 0b11111 then only the > bottom 15 bits of the address get masked (instead of 31) and the masked > address isn’t 0x00800000, and the exception won’t be triggered.** > > > Unfortunately, changing the 4 to a 5 and recompiling had no effect :-( > > I may well have misunderstood something. :-/ > > —Chris
Possible bug when setting aarch64 watchpoints
Hi, I’m new to qemu (and even bug-reporting) so apologies in advance… The MASK field in DBGWCRx_EL1 is **5** bits wide [28:24]. In target/arm/kvm64.c I found the line: wp.wcr = deposit32(wp.wcr, 24, 4, bits); // ie **4** bits instead of **5** If it’s not copying (or calculating?) the number of bits correctly this would explain the behaviour I’m seeing: If I set DBGWVR0_EL1 = 0x00800000 and DBGWCR0_EL1 = 0x1F<<24 | 0xFF<<5 | 0b11<<3 | 0b11<<1 | 0b1<<0 and then access memory [0x00807FFF] I get a watchpoint exception. (ie. watchpoints ARE working/enabled) But if I access [0x00808000] I *don’t* get an exception. **If the MASK field gets set to 0b1111 instead of 0b11111 then only the bottom 15 bits of the address get masked (instead of 31) and the masked address isn’t 0x00800000, and the exception won’t be triggered.** Unfortunately, changing the 4 to a 5 and recompiling had no effect :-( I may well have misunderstood something. :-/ —Chris
[PATCH v2] error-report: fix g_date_time_format assertion
The 'g_get_real_time' returns the number of microseconds since January 1, 1970 UTC, but 'g_date_time_new_from_unix_utc' needs the number of seconds, so it will cause the invalid time input: (process:279642): GLib-CRITICAL (recursed) **: g_date_time_format: assertion 'datetime != NULL' failed Call function 'g_date_time_new_now_utc' instead, it has the same result as 'g_date_time_new_from_unix_utc(g_get_real_time() / G_USEC_PER_SEC)'; Fixes: 73dab893b569 ("error-report: replace deprecated g_get_current_time() with glib >= 2.62") Signed-off-by: Haiyue Wang --- v2: use 'g_date_time_new_now_utc' directly, which handles the time zone reference correctly. --- util/error-report.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/error-report.c b/util/error-report.c index dbadaf206d..5edb2e6040 100644 --- a/util/error-report.c +++ b/util/error-report.c @@ -173,7 +173,7 @@ static char * real_time_iso8601(void) { #if GLIB_CHECK_VERSION(2,62,0) -g_autoptr(GDateTime) dt = g_date_time_new_from_unix_utc(g_get_real_time()); +g_autoptr(GDateTime) dt = g_date_time_new_now_utc(); /* ignore deprecation warning, since GLIB_VERSION_MAX_ALLOWED is 2.56 */ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wdeprecated-declarations" -- 2.36.0
[PATCH v1] error-report: fix g_date_time_format assertion
The 'g_get_real_time' returns the number of microseconds since January 1, 1970 UTC, but 'g_date_time_new_from_unix_utc' needs the number of seconds, so it will cause the invalid time input: (process:279642): GLib-CRITICAL (recursed) **: g_date_time_format: assertion 'datetime != NULL' failed Call 'g_date_time_new_now' with UTC time zone, it has the same result as 'g_date_time_new_from_unix_utc(g_get_real_time()/1e6)'; Fixes: 73dab893b569 ("error-report: replace deprecated g_get_current_time() with glib >= 2.62") Signed-off-by: Haiyue Wang --- util/error-report.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/util/error-report.c b/util/error-report.c index dbadaf206d..4000fff14a 100644 --- a/util/error-report.c +++ b/util/error-report.c @@ -173,7 +173,7 @@ static char * real_time_iso8601(void) { #if GLIB_CHECK_VERSION(2,62,0) -g_autoptr(GDateTime) dt = g_date_time_new_from_unix_utc(g_get_real_time()); +g_autoptr(GDateTime) dt = g_date_time_new_now(g_time_zone_new_utc()); /* ignore deprecation warning, since GLIB_VERSION_MAX_ALLOWED is 2.56 */ #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wdeprecated-declarations" -- 2.36.0
Re: [PATCH v5 01/13] mm/memfd: Introduce MFD_INACCESSIBLE flag
On Fri, Apr 22, 2022 at 10:43:50PM -0700, Vishal Annapurve wrote: > On Thu, Mar 10, 2022 at 6:09 AM Chao Peng wrote: > > > > From: "Kirill A. Shutemov" > > > > Introduce a new memfd_create() flag indicating the content of the > > created memfd is inaccessible from userspace through ordinary MMU > > access (e.g., read/write/mmap). However, the file content can be > > accessed via a different mechanism (e.g. KVM MMU) indirectly. > > > > It provides semantics required for KVM guest private memory support > > that a file descriptor with this flag set is going to be used as the > > source of guest memory in confidential computing environments such > > as Intel TDX/AMD SEV but may not be accessible from host userspace. > > > > Since page migration/swapping is not yet supported for such usages > > so these pages are currently marked as UNMOVABLE and UNEVICTABLE > > which makes them behave like long-term pinned pages. > > > > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is > > also impossible for a memfd created with this flag. > > > > At this time only shmem implements this flag. > > > > Signed-off-by: Kirill A. Shutemov > > Signed-off-by: Chao Peng > > --- > > include/linux/shmem_fs.h | 7 + > > include/uapi/linux/memfd.h | 1 + > > mm/memfd.c | 26 +++-- > > mm/shmem.c | 57 ++ > > 4 files changed, 88 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h > > index e65b80ed09e7..2dde843f28ef 100644 > > --- a/include/linux/shmem_fs.h > > +++ b/include/linux/shmem_fs.h > > @@ -12,6 +12,9 @@ > > > > /* inode in-kernel data */ > > > > +/* shmem extended flags */ > > +#define SHM_F_INACCESSIBLE 0x0001 /* prevent ordinary MMU access > > (e.g. 
read/write/mmap) to file content */ > > + > > struct shmem_inode_info { > > spinlock_t lock; > > unsigned intseals; /* shmem seals */ > > @@ -24,6 +27,7 @@ struct shmem_inode_info { > > struct shared_policypolicy; /* NUMA memory alloc policy > > */ > > struct simple_xattrsxattrs; /* list of xattrs */ > > atomic_tstop_eviction; /* hold when working on > > inode */ > > + unsigned intxflags; /* shmem extended flags */ > > struct inodevfs_inode; > > }; > > > > @@ -61,6 +65,9 @@ extern struct file *shmem_file_setup(const char *name, > > loff_t size, unsigned long flags); > > extern struct file *shmem_kernel_file_setup(const char *name, loff_t size, > > unsigned long flags); > > +extern struct file *shmem_file_setup_xflags(const char *name, loff_t size, > > + unsigned long flags, > > + unsigned int xflags); > > extern struct file *shmem_file_setup_with_mnt(struct vfsmount *mnt, > > const char *name, loff_t size, unsigned long flags); > > extern int shmem_zero_setup(struct vm_area_struct *); > > diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h > > index 7a8a26751c23..48750474b904 100644 > > --- a/include/uapi/linux/memfd.h > > +++ b/include/uapi/linux/memfd.h > > @@ -8,6 +8,7 @@ > > #define MFD_CLOEXEC0x0001U > > #define MFD_ALLOW_SEALING 0x0002U > > #define MFD_HUGETLB0x0004U > > +#define MFD_INACCESSIBLE 0x0008U > > > > /* > > * Huge page size encoding when MFD_HUGETLB is specified, and a huge page > > diff --git a/mm/memfd.c b/mm/memfd.c > > index 9f80f162791a..74d45a26cf5d 100644 > > --- a/mm/memfd.c > > +++ b/mm/memfd.c > > @@ -245,16 +245,20 @@ long memfd_fcntl(struct file *file, unsigned int cmd, > > unsigned long arg) > > #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1) > > #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN) > > > > -#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB) > > +#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \ > > + MFD_INACCESSIBLE) > > > > 
SYSCALL_DEFINE2(memfd_create, > > const char __user *, uname, > > unsigned int, flags) > > { > > + struct address_space *mapping; > > unsigned int *file_seals; > > + unsigned int xflags; > > struct file *file; > > int fd, error; > > char *name; > > + gfp_t gfp; > > long len; > > > > if (!(flags & MFD_HUGETLB)) { > > @@ -267,6 +271,10 @@ SYSCALL_DEFINE2(memfd_create, > > return -EINVAL; > > } > > > > + /* Disallow sealing when MFD_INACCESSIBLE is set. */ > > + if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING) > > + return -EINVAL; > > + > > /* length includes terminating zero */ > >
Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
On Fri, Apr 22, 2022 at 01:06:25PM +0200, Paolo Bonzini wrote: > On 4/22/22 12:56, Chao Peng wrote: > > /* memfile notifier flags */ > > #define MFN_F_USER_INACCESSIBLE 0x0001 /* memory allocated in > > the file is inaccessible from userspace (e.g. read/write/mmap) */ > > #define MFN_F_UNMOVABLE 0x0002 /* memory allocated in > > the file is unmovable */ > > #define MFN_F_UNRECLAIMABLE 0x0003 /* memory allocated in > > the file is unreclaimable (e.g. via kswapd or any other paths) */ > > You probably mean BIT(0/1/2) here. Right, it's BIT(n), Thanks. Chao > > Paolo > > > When memfile_notifier is being registered, memfile_register_notifier > > will > > need to check these flags. E.g. for MFN_F_USER_INACCESSIBLE, it fails when > > a previous mmap-ed mapping exists on the fd (I'm still unclear on how to > > do > > this). When multiple consumers are supported it also needs to check all > > registered consumers to see if any conflict (e.g. all consumers should > > have > > MFN_F_USER_INACCESSIBLE set). Only when the register succeeds, the fd > > is > > converted into a private fd, before that, the fd is just a normal > > (shared) > > one. During this conversion, the previous data is preserved so you can > > put > > some initial data in guest pages (whether the architecture allows this > > is > > architecture-specific and out of the scope of this patch).
Re: [PATCH v2 1/1] hw/i386/amd_iommu: Fix IOMMU event log encoding errors
On Fri, Apr 22, 2022 at 1:52 PM Wei Huang wrote: > > Coverity issues several UNINIT warnings against amd_iommu.c [1]. This > patch fixes them by clearing evt before encoding. On top of it, this > patch changes the event log size to 16 bytes per IOMMU specification, > and fixes the event log entry format in amdvi_encode_event(). > > [1] CID 1487116/1487200/1487190/1487232/1487115/1487258 > > Reported-by: Peter Maydell > Signed-off-by: Wei Huang > --- Acked-by: Jason Wang > hw/i386/amd_iommu.c | 24 ++-- > 1 file changed, 14 insertions(+), 10 deletions(-) > > diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c > index ea8eaeb330b6..725f69095b9e 100644 > --- a/hw/i386/amd_iommu.c > +++ b/hw/i386/amd_iommu.c > @@ -201,15 +201,18 @@ static void amdvi_setevent_bits(uint64_t *buffer, > uint64_t value, int start, > /* > * AMDVi event structure > *0:15 -> DeviceID > - *55:63 -> event type + miscellaneous info > - *63:127 -> related address > + *48:63 -> event type + miscellaneous info > + *64:127 -> related address > */ > static void amdvi_encode_event(uint64_t *evt, uint16_t devid, uint64_t addr, > uint16_t info) > { > +evt[0] = 0; > +evt[1] = 0; > + > amdvi_setevent_bits(evt, devid, 0, 16); > -amdvi_setevent_bits(evt, info, 55, 8); > -amdvi_setevent_bits(evt, addr, 63, 64); > +amdvi_setevent_bits(evt, info, 48, 16); > +amdvi_setevent_bits(evt, addr, 64, 64); > } > /* log an error encountered during a page walk > * > @@ -218,7 +221,7 @@ static void amdvi_encode_event(uint64_t *evt, uint16_t > devid, uint64_t addr, > static void amdvi_page_fault(AMDVIState *s, uint16_t devid, > hwaddr addr, uint16_t info) > { > -uint64_t evt[4]; > +uint64_t evt[2]; > > info |= AMDVI_EVENT_IOPF_I | AMDVI_EVENT_IOPF; > amdvi_encode_event(evt, devid, addr, info); > @@ -234,7 +237,7 @@ static void amdvi_page_fault(AMDVIState *s, uint16_t > devid, > static void amdvi_log_devtab_error(AMDVIState *s, uint16_t devid, > hwaddr devtab, uint16_t info) > { > -uint64_t evt[4]; > +uint64_t evt[2]; > > 
info |= AMDVI_EVENT_DEV_TAB_HW_ERROR; > > @@ -248,7 +251,8 @@ static void amdvi_log_devtab_error(AMDVIState *s, > uint16_t devid, > */ > static void amdvi_log_command_error(AMDVIState *s, hwaddr addr) > { > -uint64_t evt[4], info = AMDVI_EVENT_COMMAND_HW_ERROR; > +uint64_t evt[2]; > +uint16_t info = AMDVI_EVENT_COMMAND_HW_ERROR; > > amdvi_encode_event(evt, 0, addr, info); > amdvi_log_event(s, evt); > @@ -261,7 +265,7 @@ static void amdvi_log_command_error(AMDVIState *s, hwaddr > addr) > static void amdvi_log_illegalcom_error(AMDVIState *s, uint16_t info, > hwaddr addr) > { > -uint64_t evt[4]; > +uint64_t evt[2]; > > info |= AMDVI_EVENT_ILLEGAL_COMMAND_ERROR; > amdvi_encode_event(evt, 0, addr, info); > @@ -276,7 +280,7 @@ static void amdvi_log_illegalcom_error(AMDVIState *s, > uint16_t info, > static void amdvi_log_illegaldevtab_error(AMDVIState *s, uint16_t devid, >hwaddr addr, uint16_t info) > { > -uint64_t evt[4]; > +uint64_t evt[2]; > > info |= AMDVI_EVENT_ILLEGAL_DEVTAB_ENTRY; > amdvi_encode_event(evt, devid, addr, info); > @@ -288,7 +292,7 @@ static void amdvi_log_illegaldevtab_error(AMDVIState *s, > uint16_t devid, > static void amdvi_log_pagetab_error(AMDVIState *s, uint16_t devid, > hwaddr addr, uint16_t info) > { > -uint64_t evt[4]; > +uint64_t evt[2]; > > info |= AMDVI_EVENT_PAGE_TAB_HW_ERROR; > amdvi_encode_event(evt, devid, addr, info); > -- > 2.35.1 >