date:20240528

Re: [PATCH v2 37/67] target/arm: Improve vector UQADD, UQSUB, SQADD, SQSUB

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:28, Richard Henderson
 wrote:
>
> No need for a full comparison; xor produces non-zero bits
> for QC just fine.
>
> Signed-off-by: Richard Henderson 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-28 Thread Steven Sistare via


On 5/27/2024 1:45 PM, Peter Xu wrote:

On Tue, May 21, 2024 at 07:46:12AM -0400, Steven Sistare wrote:

I understand, thanks.  If I can help with any of your todo list,
just ask - steve


Thanks for offering the help, Steve.  Started looking at this today, then I
found that I miss something high-level.  Let me ask here, and let me
apologize already for starting to throw multiple questions..

IIUC the whole idea of this patchset is to allow efficient QEMU upgrade, in
this case not host kernel but QEMU-only, and/or upper.

Is there any justification on why the complexity is needed here?  It looks
to me this one is more involved than cpr-reboot, so I'm thinking how much
we can get from the complexity, and whether it's worthwhile.  1000+ LOC is
the min support, and if we even expect more to come, that's really
important, IMHO.

For example, what's the major motivation of this whole work?  Is that more
on performance, or is it more for supporting the special devices like VFIO
which we used to not support, or something else?  I can't find them in
whatever cover letter I can find, including this one.

Firstly, regarding performance, IMHO it'll be always nice to share even
some very fundamental downtime measurement comparisons using the new exec
mode v.s. the old migration ways to upgrade QEMU binary.  Do you perhaps
have some number on hand when you started working on this feature years
ago?  Or maybe some old links on the list would help too, as I didn't
follow this work since the start.

On VFIO, IIUC you started out this project without VFIO migration being
there.  Now we have VFIO migration so not sure how much it would work for
the upgrade use case. Even with current VFIO migration, we may not want to
migrate device states for a local upgrade I suppose, as that can be a lot
depending on the type of device assigned.  However it'll be nice to discuss
this too if this is the major purpose of the series.

I think one other challenge on QEMU upgrade with VFIO devices is that the
dest QEMU won't be able to open the VFIO device when the src QEMU is still
using it as the owner.  IIUC this is a similar condition where QEMU wants
to have proper ownership transfer of a shared block device, and AFAIR right
now we resolved that issue using some form of file lock on the image file.
In this case it won't easily apply to a VFIO dev fd, but maybe we still
have other approaches, not sure whether you investigated any.  E.g. could
the VFIO handle be passed over using unix scm rights?  I think this might
remove one dependency of using exec which can cause quite some difference
v.s. a generic migration (from which regard, cpr-reboot is still a pretty
generic migration).

You also mentioned vhost/tap, is that also a major goal of this series in
the follow up patchsets?  Is this a problem only because this solution will
do exec?  Can it work if either the exec()ed qemu or dst qemu create the
vhost/tap fds when boot?

Meanwhile, could you elaborate a bit on the implication on chardevs?  From
what I read in the doc update it looks like a major part of work in the
future, but I don't yet understand the issue..  Is it also relevant to the
exec() approach?

In all cases, some of such discussion would be really appreciated.  And if
you used to consider other approaches to solve this problem it'll be great
to mention how you chose this way.  Considering this work contains too many
things, it'll be nice if such discussion can start with the fundamentals,
e.g. on why exec() is a must.


The main goal of cpr-exec is providing a fast and reliable way to update
qemu. cpr-reboot is not fast enough or general enough.  It requires the
guest to support suspend and resume for all devices, and that takes seconds.
If one actually reboots the host, that adds more seconds, depending on
system services.  cpr-exec takes 0.1 secs, and works every time, unlike
like migration which can fail to converge on a busy system.  Live migration
also consumes more system and network resources.  cpr-exec seamlessly
preserves client connections by preserving chardevs, and overall provides
a much nicer user experience.

chardev's are preserved by keeping their fd open across the exec, and
remembering the value of the fd in precreate vmstate so that new qemu
can associate the fd with the chardev rather than opening a new one.

The approach of preserving open file descriptors is very general and applicable
to all kinds of devices, regardless of whether they support live migration
in hardware.  Device fd's are preserved using the same mechanism as for
chardevs.

Devices that support live migration in hardware do not like to live migrate
in place to the same node.  It is not what they are designed for, and some
implementations will flat out fail because the source and target interfaces
are the same.

For vhost/tap, sometimes the management layer opens the dev and passes an
fd to qemu, and sometimes qemu opens the dev.  The upcoming vhost/tap support
allows both.  For the c

Re: [PATCH V1 07/26] migration: VMStateId

2024-05-28 Thread Steven Sistare via


On 5/27/2024 2:20 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:16AM -0700, Steve Sistare wrote:

Define a type for the 256 byte id string to guarantee the same length is
used and enforced everywhere.

Signed-off-by: Steve Sistare 
---
  include/exec/ramblock.h | 3 ++-
  include/migration/vmstate.h | 2 ++
  migration/savevm.c  | 8 
  migration/vmstate.c | 3 ++-
  4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 0babd10..61deefe 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -23,6 +23,7 @@
  #include "cpu-common.h"
  #include "qemu/rcu.h"
  #include "exec/ramlist.h"
+#include "migration/vmstate.h"
  
  struct RAMBlock {

  struct rcu_head rcu;
@@ -35,7 +36,7 @@ struct RAMBlock {
  void (*resized)(const char*, uint64_t length, void *host);
  uint32_t flags;
  /* Protected by the BQL.  */
-char idstr[256];
+VMStateId idstr;
  /* RCU-enabled, writes protected by the ramlist lock */
  QLIST_ENTRY(RAMBlock) next;
  QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;


Hmm.. Don't look like a good idea to include a migration header in
ramblock.h?  Is this ramblock change needed for this work?


Well, entities that are migrated include migration headers, and now that
includes RAMBlock.  There is precedent:

0 include/exec/ramblock.h   26 #include "migration/vmstate.h"
1 include/hw/acpi/ich9_tco. 14 #include "migration/vmstate.h"
2 include/hw/display/ramfb.  4 #include "migration/vmstate.h"
3 include/hw/hyperv/vmbus.h 16 #include "migration/vmstate.h"
4 include/hw/input/pl050.h  14 #include "migration/vmstate.h"
5 include/hw/pci/shpc.h  7 #include "migration/vmstate.h"
6 include/hw/virtio/virtio. 20 #include "migration/vmstate.h"
7 include/migration/cpu.h8 #include "migration/vmstate.h"

Granted, only some of the C files that include ramblock.h need all of vmstate.h.
I could define VMStateId in a smaller file such as migration/misc.h, or a
new file migration/vmstateid.h, and include that in ramblock.h.
Any preference?

- Steve

Re: [PATCH V1 08/26] migration: vmstate_info_void_ptr

2024-05-28 Thread Steven Sistare via


On 5/27/2024 2:31 PM, Peter Xu wrote:

On Mon, Apr 29, 2024 at 08:55:17AM -0700, Steve Sistare wrote:

Define VMSTATE_VOID_PTR so the value of a pointer (but not its target)
can be saved in the migration stream.  This will be needed for CPR.

Signed-off-by: Steve Sistare 


This is really tricky.

 From a first glance, I don't think migrating a VA is valid at all for
migration even if with exec.. and looks insane to me for a cross-process
migration, which seems to be allowed to use as a generic VMSD helper.. as
VA is the address space barrier for different processes and I think it
normally even apply to generic execve(), and we're trying to jailbreak for
some reason..

It definitely won't work for any generic migration as sizeof(void*) can be
different afaict between hosts, e.g. 32bit -> 64bit migrations.

Some description would be really helpful in this commit message,
e.g. explain the users and why.  Do we need to poison that for generic VMSD
use (perhaps with prefixed underscores)?  I think I'll need to read on the
rest to tell..


Short answer: we never dereference the void* in the new process.  And must not.

Longer answer:

During CPR for vfio, each mapped DMA region is re-registered in the new
process using the new VA.  The ioctl to re-register identifies the mapping
by IOVA and length.

The same requirement holds for CPR of iommufd devices.  However, in the
iommufd framework, IOVA does not uniquely identify a dma mapping, and we
need to use the old VA as the unique identifier.  The new process
re-registers each mapping, passing the old VA and new VA to the kernel.
The old VA is never dereferenced in the new process, we just need its value.

I suspected that the void* which must not be dereferenced might make people
uncomfortable.  I have an older version of my code which adds a uint64_t
field to RAMBlock for recording and migrating the old VA.  The saving and
loading code is slightly less elegant, but no big deal.  Would you prefer
that?

- Steve

Re: [PATCH 1/1] block: drop force_dup parameter of raw_reconfigure_getfd()

2024-05-28 Thread Kevin Wolf

Am 30.04.2024 um 19:02 hat Denis V. Lunev via geschrieben:
> This parameter is always passed as 'false' from the caller.
> 
> Signed-off-by: Denis V. Lunev 
> CC: Andrey Zhadchenko 
> CC: Kevin Wolf 
> CC: Hanna Reitz 

Let me add a "Since commit 72373e40fbc" to the commit message.

Thanks, applied to the block branch.

Kevin

Re: [PATCH] iotests/pylintrc: allow up to 10 similar lines

2024-05-28 Thread Kevin Wolf

Am 28.05.2024 um 14:49 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 30.04.24 12:13, Vladimir Sementsov-Ogievskiy wrote:
> > We want to have similar QMP objects in different tests. Reworking these
> > objects to make common parts by calling some helper functions doesn't
> > seem good. It's a lot more comfortable to see the whole QAPI request in
> > one place.
> > 
> > So, let's increase the limit, to unblock further commit
> > "iotests: add backup-discard-source"
> > 
> > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > ---
> > 
> > Hi all! That's a patch to unblock my PR
> > "[PULL 0/6] Block jobs patches for 2024-04-29"
> ><20240429115157.2260885-1-vsement...@yandex-team.ru>
> >
> > https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/
> > 
> > 
> >   tests/qemu-iotests/pylintrc | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
> > index de2e0c2781..05b75ee59b 100644
> > --- a/tests/qemu-iotests/pylintrc
> > +++ b/tests/qemu-iotests/pylintrc
> > @@ -55,4 +55,4 @@ max-line-length=79
> >   [SIMILARITIES]
> > -min-similarity-lines=6
> > +min-similarity-lines=10
> 
> 
> Hi! I hope it's OK, if I just apply this to my block branch, and
> resend "[PULL 0/6] Block jobs patches for 2024-04-29" which is blocked
> on this problem.

Sorry, I only just returned from PTO and now have to catch up with
things.

But yes, if John doesn't have any objections, it's fine with me, too.

Reviewed-by: Kevin Wolf

Re: [PATCH v3 3/4] hw/clock: Expose 'qtest-clock-period' QOM property for QTests

2024-05-28 Thread Philippe Mathieu-Daudé


On 23/5/24 21:41, Inès Varhol wrote:

Expose the clock period via the QOM 'qtest-clock-period' property so it
can be used in QTests. This property is only accessible in QTests (not
via HMP).

Signed-off-by: Philippe Mathieu-Daudé 


Addressing Luc and Peter comments, you can replace that line by:

Reviewed-by: Philippe Mathieu-Daudé 

Thanks!


Signed-off-by: Inès Varhol 
---
  docs/devel/clocks.rst |  3 +++
  hw/core/clock.c   | 16 
  2 files changed, 19 insertions(+)

[PATCH 6/6] accel/tcg: Move qemu_plugin_vcpu_init__async() to plugins/

2024-05-28 Thread Philippe Mathieu-Daudé

Calling qemu_plugin_vcpu_init__async() on the vCPU thread
is a detail of plugins, not relevant to TCG vCPU management.

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/cpu-exec-common.c | 11 ++-
 plugins/core.c  |  8 +++-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index 3c4a4c9f21..02499bfb1d 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -57,19 +57,12 @@ void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
 cpu_loop_exit_restore(cpu, pc);
 }
 
-#ifdef CONFIG_PLUGIN
-static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
-{
-qemu_plugin_vcpu_init_hook(cpu);
-}
-#endif
-
 bool tcg_exec_realize_assigned(CPUState *cpu, Error **errp)
 {
 #ifdef CONFIG_PLUGIN
 cpu->plugin_state = qemu_plugin_create_vcpu_state();
-/* Plugin initialization must wait until the cpu start executing code */
-async_run_on_cpu(cpu, qemu_plugin_vcpu_init__async, RUN_ON_CPU_NULL);
+
+qemu_plugin_vcpu_init_hook(cpu);
 #endif
 
 return true;
diff --git a/plugins/core.c b/plugins/core.c
index 0726bc7f25..0eda47ba33 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -245,7 +245,7 @@ static void plugin_grow_scoreboards__locked(CPUState *cpu)
 end_exclusive();
 }
 
-void qemu_plugin_vcpu_init_hook(CPUState *cpu)
+static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
 {
 bool success;
 
@@ -261,6 +261,12 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
 plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
 }
 
+void qemu_plugin_vcpu_init_hook(CPUState *cpu)
+{
+/* Plugin initialization must wait until the cpu start executing code */
+async_run_on_cpu(cpu, qemu_plugin_vcpu_init__async, RUN_ON_CPU_NULL);
+}
+
 void qemu_plugin_vcpu_exit_hook(CPUState *cpu)
 {
 bool success;
-- 
2.41.0

[PATCH 1/6] system/runstate: Remove unused 'qemu/plugin.h' header

2024-05-28 Thread Philippe Mathieu-Daudé

system/runstate.c never required "qemu/plugin.h".

Signed-off-by: Philippe Mathieu-Daudé 
---
 system/runstate.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/system/runstate.c b/system/runstate.c
index cb4905a40f..ec32e270cb 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -45,7 +45,6 @@
 #include "qemu/job.h"
 #include "qemu/log.h"
 #include "qemu/module.h"
-#include "qemu/plugin.h"
 #include "qemu/sockets.h"
 #include "qemu/timer.h"
 #include "qemu/thread.h"
-- 
2.41.0

[PATCH 5/6] accel: Restrict TCG plugin (un)registration to TCG accel

2024-05-28 Thread Philippe Mathieu-Daudé

Use the AccelClass::cpu_common_[un]realize_assigned() handlers
to [un]register the TCG plugin handlers, allowing to remove
accel specific code from the common hw/core/cpu-common.c file.

Remove the now unnecessary qemu_plugin_vcpu_init_hook() and
qemu_plugin_vcpu_exit_hook() stub.

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/internal-common.h |  2 ++
 include/qemu/plugin.h   |  6 --
 accel/tcg/cpu-exec-common.c | 27 +++
 accel/tcg/tcg-all.c |  2 ++
 hw/core/cpu-common.c| 25 -
 5 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/accel/tcg/internal-common.h b/accel/tcg/internal-common.h
index ec2c6317b7..d900897c6e 100644
--- a/accel/tcg/internal-common.h
+++ b/accel/tcg/internal-common.h
@@ -54,6 +54,8 @@ void cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
uintptr_t host_pc);
 
 bool tcg_exec_realize_unassigned(CPUState *cpu, Error **errp);
+bool tcg_exec_realize_assigned(CPUState *cpu, Error **errp);
+void tcg_exec_unrealize_assigned(CPUState *cpu);
 void tcg_exec_unrealize_unassigned(CPUState *cpu);
 
 #endif
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index bc5aef979e..d39d105795 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -221,12 +221,6 @@ static inline int qemu_plugin_load_list(QemuPluginList 
*head, Error **errp)
 return 0;
 }
 
-static inline void qemu_plugin_vcpu_init_hook(CPUState *cpu)
-{ }
-
-static inline void qemu_plugin_vcpu_exit_hook(CPUState *cpu)
-{ }
-
 static inline void qemu_plugin_tb_trans_cb(CPUState *cpu,
struct qemu_plugin_tb *tb)
 { }
diff --git a/accel/tcg/cpu-exec-common.c b/accel/tcg/cpu-exec-common.c
index bc9b1a260e..3c4a4c9f21 100644
--- a/accel/tcg/cpu-exec-common.c
+++ b/accel/tcg/cpu-exec-common.c
@@ -56,3 +56,30 @@ void cpu_loop_exit_atomic(CPUState *cpu, uintptr_t pc)
 cpu->exception_index = EXCP_ATOMIC;
 cpu_loop_exit_restore(cpu, pc);
 }
+
+#ifdef CONFIG_PLUGIN
+static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
+{
+qemu_plugin_vcpu_init_hook(cpu);
+}
+#endif
+
+bool tcg_exec_realize_assigned(CPUState *cpu, Error **errp)
+{
+#ifdef CONFIG_PLUGIN
+cpu->plugin_state = qemu_plugin_create_vcpu_state();
+/* Plugin initialization must wait until the cpu start executing code */
+async_run_on_cpu(cpu, qemu_plugin_vcpu_init__async, RUN_ON_CPU_NULL);
+#endif
+
+return true;
+}
+
+/* undo the initializations in reverse order */
+void tcg_exec_unrealize_assigned(CPUState *cpu)
+{
+#ifdef CONFIG_PLUGIN
+/* Call the plugin hook before clearing the cpu is fully unrealized */
+qemu_plugin_vcpu_exit_hook(cpu);
+#endif
+}
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index c08a6acc21..a32663f507 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -228,6 +228,8 @@ static void tcg_accel_class_init(ObjectClass *oc, void 
*data)
 ac->name = "tcg";
 ac->init_machine = tcg_init_machine;
 ac->cpu_common_realize_unassigned = tcg_exec_realize_unassigned;
+ac->cpu_common_realize_assigned = tcg_exec_realize_assigned;
+ac->cpu_common_unrealize_assigned = tcg_exec_unrealize_assigned;
 ac->cpu_common_unrealize_unassigned = tcg_exec_unrealize_unassigned;
 ac->allowed = &tcg_allowed;
 ac->gdbstub_supported_sstep_flags = tcg_gdbstub_supported_sstep_flags;
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 0f0a247f56..fda2c2c1d5 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -30,9 +30,6 @@
 #include "hw/boards.h"
 #include "hw/qdev-properties.h"
 #include "trace.h"
-#ifdef CONFIG_PLUGIN
-#include "qemu/plugin.h"
-#endif
 
 CPUState *cpu_by_arch_id(int64_t id)
 {
@@ -192,13 +189,6 @@ static void cpu_common_parse_features(const char 
*typename, char *features,
 }
 }
 
-#ifdef CONFIG_PLUGIN
-static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
-{
-qemu_plugin_vcpu_init_hook(cpu);
-}
-#endif
-
 static void cpu_common_realizefn(DeviceState *dev, Error **errp)
 {
 CPUState *cpu = CPU(dev);
@@ -222,14 +212,6 @@ static void cpu_common_realizefn(DeviceState *dev, Error 
**errp)
 cpu_resume(cpu);
 }
 
-/* Plugin initialization must wait until the cpu start executing code */
-#ifdef CONFIG_PLUGIN
-if (tcg_enabled()) {
-cpu->plugin_state = qemu_plugin_create_vcpu_state();
-async_run_on_cpu(cpu, qemu_plugin_vcpu_init__async, RUN_ON_CPU_NULL);
-}
-#endif
-
 /* NOTE: latest generic point where the cpu is fully realized */
 }
 
@@ -237,13 +219,6 @@ static void cpu_common_unrealizefn(DeviceState *dev)
 {
 CPUState *cpu = CPU(dev);
 
-/* Call the plugin hook before clearing the cpu is fully unrealized */
-#ifdef CONFIG_PLUGIN
-if (tcg_enabled()) {
-qemu_plugin_vcpu_exit_hook(cpu);
-}
-#endif
-
 /* NOTE: latest generic point before the cpu is fully

[PATCH 4/6] accel: Introduce accel_cpu_common_[un]realize_assigned() handlers

2024-05-28 Thread Philippe Mathieu-Daudé

Introduce handlers called while the vCPU has an assigned
index and is still in the global %cpus_queue.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/accel.h | 20 
 accel/accel-target.c | 23 +++
 cpu-target.c |  6 ++
 3 files changed, 49 insertions(+)

diff --git a/include/qemu/accel.h b/include/qemu/accel.h
index dd18c41dc0..f828d32204 100644
--- a/include/qemu/accel.h
+++ b/include/qemu/accel.h
@@ -44,6 +44,8 @@ typedef struct AccelClass {
hwaddr start_addr, hwaddr size);
 #endif
 bool (*cpu_common_realize_unassigned)(CPUState *cpu, Error **errp);
+bool (*cpu_common_realize_assigned)(CPUState *cpu, Error **errp);
+void (*cpu_common_unrealize_assigned)(CPUState *cpu);
 void (*cpu_common_unrealize_unassigned)(CPUState *cpu);
 
 /* gdbstub related hooks */
@@ -100,6 +102,24 @@ void accel_cpu_instance_init(CPUState *cpu);
  */
 bool accel_cpu_common_realize_unassigned(CPUState *cpu, Error **errp);
 
+/**
+ * accel_cpu_common_realize_assigned:
+ * @cpu: The CPU that needs to call accel-specific cpu realization.
+ * @errp: currently unused.
+ *
+ * The @cpu index is assigned, @cpu is added to the global #cpus_queue.
+ */
+bool accel_cpu_common_realize_assigned(CPUState *cpu, Error **errp);
+
+/**
+ * accel_cpu_common_unrealize_unassigned:
+ * @cpu: The CPU that needs to call accel-specific cpu unrealization.
+ *
+ * The @cpu index is still assigned, @cpu is still part of the global
+ * #cpus_queue.
+ */
+void accel_cpu_common_unrealize_assigned(CPUState *cpu);
+
 /**
  * accel_cpu_common_unrealize_unassigned:
  * @cpu: The CPU that needs to call accel-specific cpu unrealization.
diff --git a/accel/accel-target.c b/accel/accel-target.c
index e0a79c0fce..b2ba219a44 100644
--- a/accel/accel-target.c
+++ b/accel/accel-target.c
@@ -140,6 +140,29 @@ bool accel_cpu_common_realize_unassigned(CPUState *cpu, 
Error **errp)
 return true;
 }
 
+bool accel_cpu_common_realize_assigned(CPUState *cpu, Error **errp)
+{
+AccelState *accel = current_accel();
+AccelClass *acc = ACCEL_GET_CLASS(accel);
+
+if (acc->cpu_common_realize_assigned
+&& !acc->cpu_common_realize_assigned(cpu, errp)) {
+return false;
+}
+
+return true;
+}
+
+void accel_cpu_common_unrealize_assigned(CPUState *cpu)
+{
+AccelState *accel = current_accel();
+AccelClass *acc = ACCEL_GET_CLASS(accel);
+
+if (acc->cpu_common_unrealize_assigned) {
+acc->cpu_common_unrealize_assigned(cpu);
+}
+}
+
 void accel_cpu_common_unrealize_unassigned(CPUState *cpu)
 {
 AccelState *accel = current_accel();
diff --git a/cpu-target.c b/cpu-target.c
index 9ab5a28cb5..de903f30cb 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -143,6 +143,10 @@ bool cpu_exec_realizefn(CPUState *cpu, Error **errp)
 /* Wait until cpu initialization complete before exposing cpu. */
 cpu_list_add(cpu);
 
+if (!accel_cpu_common_realize_assigned(cpu, errp)) {
+return false;
+}
+
 #ifdef CONFIG_USER_ONLY
 assert(qdev_get_vmsd(DEVICE(cpu)) == NULL ||
qdev_get_vmsd(DEVICE(cpu))->unmigratable);
@@ -171,6 +175,8 @@ void cpu_exec_unrealizefn(CPUState *cpu)
 }
 #endif
 
+accel_cpu_common_unrealize_assigned(cpu);
+
 cpu_list_remove(cpu);
 /*
  * Now that the vCPU has been removed from the RCU list, we can call
-- 
2.41.0

[PATCH 3/6] accel: Clarify accel_cpu_common_[un]realize() use unassigned vCPU

2024-05-28 Thread Philippe Mathieu-Daudé

In preparation of introducing [un]realize handlers for
when vCPUs are assigned, rename current handlers using
the '_unassigned' suffix.

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/internal-common.h |  4 ++--
 include/qemu/accel.h| 17 +++--
 accel/accel-target.c| 11 ++-
 accel/tcg/cpu-exec.c|  4 ++--
 accel/tcg/tcg-all.c |  4 ++--
 cpu-target.c|  4 ++--
 6 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/accel/tcg/internal-common.h b/accel/tcg/internal-common.h
index a8fc3db774..ec2c6317b7 100644
--- a/accel/tcg/internal-common.h
+++ b/accel/tcg/internal-common.h
@@ -53,7 +53,7 @@ TranslationBlock *tb_link_page(TranslationBlock *tb);
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
uintptr_t host_pc);
 
-bool tcg_exec_realizefn(CPUState *cpu, Error **errp);
-void tcg_exec_unrealizefn(CPUState *cpu);
+bool tcg_exec_realize_unassigned(CPUState *cpu, Error **errp);
+void tcg_exec_unrealize_unassigned(CPUState *cpu);
 
 #endif
diff --git a/include/qemu/accel.h b/include/qemu/accel.h
index 972a849a2b..dd18c41dc0 100644
--- a/include/qemu/accel.h
+++ b/include/qemu/accel.h
@@ -43,8 +43,8 @@ typedef struct AccelClass {
 bool (*has_memory)(MachineState *ms, AddressSpace *as,
hwaddr start_addr, hwaddr size);
 #endif
-bool (*cpu_common_realize)(CPUState *cpu, Error **errp);
-void (*cpu_common_unrealize)(CPUState *cpu);
+bool (*cpu_common_realize_unassigned)(CPUState *cpu, Error **errp);
+void (*cpu_common_unrealize_unassigned)(CPUState *cpu);
 
 /* gdbstub related hooks */
 int (*gdbstub_supported_sstep_flags)(void);
@@ -92,17 +92,22 @@ void accel_setup_post(MachineState *ms);
 void accel_cpu_instance_init(CPUState *cpu);
 
 /**
- * accel_cpu_common_realize:
+ * accel_cpu_common_realize_unassigned:
  * @cpu: The CPU that needs to call accel-specific cpu realization.
  * @errp: currently unused.
+ *
+ * The @cpu index is not yet assigned.
  */
-bool accel_cpu_common_realize(CPUState *cpu, Error **errp);
+bool accel_cpu_common_realize_unassigned(CPUState *cpu, Error **errp);
 
 /**
- * accel_cpu_common_unrealize:
+ * accel_cpu_common_unrealize_unassigned:
  * @cpu: The CPU that needs to call accel-specific cpu unrealization.
+ *
+ * The @cpu index is no more assigned, @cpu has been removed from the global
+ * #cpus_queue.
  */
-void accel_cpu_common_unrealize(CPUState *cpu);
+void accel_cpu_common_unrealize_unassigned(CPUState *cpu);
 
 /**
  * accel_supported_gdbstub_sstep_flags:
diff --git a/accel/accel-target.c b/accel/accel-target.c
index 08626c00c2..e0a79c0fce 100644
--- a/accel/accel-target.c
+++ b/accel/accel-target.c
@@ -119,7 +119,7 @@ void accel_cpu_instance_init(CPUState *cpu)
 }
 }
 
-bool accel_cpu_common_realize(CPUState *cpu, Error **errp)
+bool accel_cpu_common_realize_unassigned(CPUState *cpu, Error **errp)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 AccelState *accel = current_accel();
@@ -132,21 +132,22 @@ bool accel_cpu_common_realize(CPUState *cpu, Error **errp)
 }
 
 /* generic realization */
-if (acc->cpu_common_realize && !acc->cpu_common_realize(cpu, errp)) {
+if (acc->cpu_common_realize_unassigned
+&& !acc->cpu_common_realize_unassigned(cpu, errp)) {
 return false;
 }
 
 return true;
 }
 
-void accel_cpu_common_unrealize(CPUState *cpu)
+void accel_cpu_common_unrealize_unassigned(CPUState *cpu)
 {
 AccelState *accel = current_accel();
 AccelClass *acc = ACCEL_GET_CLASS(accel);
 
 /* generic unrealization */
-if (acc->cpu_common_unrealize) {
-acc->cpu_common_unrealize(cpu);
+if (acc->cpu_common_unrealize_unassigned) {
+acc->cpu_common_unrealize_unassigned(cpu);
 }
 }
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2972f75b96..08769cf91e 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -1074,7 +1074,7 @@ int cpu_exec(CPUState *cpu)
 return ret;
 }
 
-bool tcg_exec_realizefn(CPUState *cpu, Error **errp)
+bool tcg_exec_realize_unassigned(CPUState *cpu, Error **errp)
 {
 static bool tcg_target_initialized;
 
@@ -1094,7 +1094,7 @@ bool tcg_exec_realizefn(CPUState *cpu, Error **errp)
 }
 
 /* undo the initializations in reverse order */
-void tcg_exec_unrealizefn(CPUState *cpu)
+void tcg_exec_unrealize_unassigned(CPUState *cpu)
 {
 #ifndef CONFIG_USER_ONLY
 tcg_iommu_free_notifier_list(cpu);
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index 2090907dba..c08a6acc21 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -227,8 +227,8 @@ static void tcg_accel_class_init(ObjectClass *oc, void 
*data)
 AccelClass *ac = ACCEL_CLASS(oc);
 ac->name = "tcg";
 ac->init_machine = tcg_init_machine;
-ac->cpu_common_realize = tcg_exec_realizefn;
-ac->cpu_common_unrealize = tcg_exec_unrealizefn;
+ac->cpu_common_realize_unassigned = tcg_exec_realize_

[PATCH 2/6] accel/tcg: Move common declarations to 'internal-common.h'

2024-05-28 Thread Philippe Mathieu-Daudé

'internal-target.h' is meant for target-specific declarations,
while 'internal-common.h' for common ones. Move common declarations
to it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 accel/tcg/internal-common.h | 15 +++
 accel/tcg/internal-target.h | 14 --
 accel/tcg/tcg-all.c |  2 +-
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/accel/tcg/internal-common.h b/accel/tcg/internal-common.h
index cff43d221b..a8fc3db774 100644
--- a/accel/tcg/internal-common.h
+++ b/accel/tcg/internal-common.h
@@ -15,6 +15,8 @@
 extern int64_t max_delay;
 extern int64_t max_advance;
 
+extern bool one_insn_per_tb;
+
 /*
  * Return true if CS is not running in parallel with other cpus, either
  * because there are no other cpus or we are within an exclusive context.
@@ -41,4 +43,17 @@ static inline bool cpu_plugin_mem_cbs_enabled(const CPUState 
*cpu)
 #endif
 }
 
+TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
+  uint64_t cs_base, uint32_t flags,
+  int cflags);
+void page_init(void);
+void tb_htable_init(void);
+void tb_reset_jump(TranslationBlock *tb, int n);
+TranslationBlock *tb_link_page(TranslationBlock *tb);
+void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
+   uintptr_t host_pc);
+
+bool tcg_exec_realizefn(CPUState *cpu, Error **errp);
+void tcg_exec_unrealizefn(CPUState *cpu);
+
 #endif
diff --git a/accel/tcg/internal-target.h b/accel/tcg/internal-target.h
index 4e36cf858e..fe109724c6 100644
--- a/accel/tcg/internal-target.h
+++ b/accel/tcg/internal-target.h
@@ -69,19 +69,7 @@ void tb_invalidate_phys_range_fast(ram_addr_t ram_addr,
 G_NORETURN void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 #endif /* CONFIG_SOFTMMU */
 
-TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
-  uint64_t cs_base, uint32_t flags,
-  int cflags);
-void page_init(void);
-void tb_htable_init(void);
-void tb_reset_jump(TranslationBlock *tb, int n);
-TranslationBlock *tb_link_page(TranslationBlock *tb);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
-void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
-   uintptr_t host_pc);
-
-bool tcg_exec_realizefn(CPUState *cpu, Error **errp);
-void tcg_exec_unrealizefn(CPUState *cpu);
 
 /* Return the current PC from CPU, which may be cached in TB. */
 static inline vaddr log_pc(CPUState *cpu, const TranslationBlock *tb)
@@ -93,8 +81,6 @@ static inline vaddr log_pc(CPUState *cpu, const 
TranslationBlock *tb)
 }
 }
 
-extern bool one_insn_per_tb;
-
 /**
  * tcg_req_mo:
  * @type: TCGBar
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index c6619f5b98..2090907dba 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -38,7 +38,7 @@
 #if !defined(CONFIG_USER_ONLY)
 #include "hw/boards.h"
 #endif
-#include "internal-target.h"
+#include "internal-common.h"
 
 struct TCGState {
 AccelState parent_obj;
-- 
2.41.0

[PATCH 0/6] accel: Restrict TCG plugin (un)registration to TCG accel

2024-05-28 Thread Philippe Mathieu-Daudé

Hi,

TL;DR; this series remove TCG plugin code from generic accel code.

Since the introduction of the scoreboard in plugins, the INIT
hook use the vCPU index, which is only available somewhere
during the vCPU REALIZE() step (see below for call tree).

In order to clarify that, we split accel_cpu_common_[un]realize
as *unassigned and *assigned steps. This allow to remove the
plugin [un]registration code from common accel code.

Another approach suggested by rth is to add a PostRealize()
handler in DeviceClass. This was already experimented here:
https://lore.kernel.org/qemu-devel/20240209123226.32576-1-phi...@linaro.org/
Since it is a change harder to sell, I took this simplified
path which just make the vCPU REALIZE a bit more complex,
but not really an concern since the current call tree is
https://etherpad.opendev.org/p/QEMU_vCPU_life.

Philippe Mathieu-Daudé (6):
  system/runstate: Remove unused 'qemu/plugin.h' header
  accel/tcg: Move common declarations to 'internal-common.h'
  accel: Clarify accel_cpu_common_[un]realize() use unassigned vCPU
  accel: Introduce accel_cpu_common_[un]realize_assigned() handlers
  accel: Restrict TCG plugin (un)registration to TCG accel
  accel/tcg: Move qemu_plugin_vcpu_init__async() to plugins/

 accel/tcg/internal-common.h | 17 
 accel/tcg/internal-target.h | 14 -
 include/qemu/accel.h| 39 ++---
 include/qemu/plugin.h   |  6 --
 accel/accel-target.c| 34 +++-
 accel/tcg/cpu-exec-common.c | 20 +++
 accel/tcg/cpu-exec.c|  4 ++--
 accel/tcg/tcg-all.c |  8 +---
 cpu-target.c| 10 --
 hw/core/cpu-common.c| 25 
 plugins/core.c  |  8 +++-
 system/runstate.c   |  1 -
 12 files changed, 120 insertions(+), 66 deletions(-)

-- 
2.41.0

Re: [PATCH v3 3/4] hw/clock: Expose 'qtest-clock-period' QOM property for QTests

2024-05-28 Thread Peter Maydell

On Thu, 23 May 2024 at 20:44, Inès Varhol  wrote:
>
> Expose the clock period via the QOM 'qtest-clock-period' property so it
> can be used in QTests. This property is only accessible in QTests (not
> via HMP).
>
> Signed-off-by: Philippe Mathieu-Daudé 
> Signed-off-by: Inès Varhol 
> ---
>  docs/devel/clocks.rst |  3 +++
>  hw/core/clock.c   | 16 
>  2 files changed, 19 insertions(+)
>
> diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
> index 177ee1c90d..19e67601ec 100644
> --- a/docs/devel/clocks.rst
> +++ b/docs/devel/clocks.rst
> @@ -358,6 +358,9 @@ humans (for instance in debugging), use 
> ``clock_display_freq()``,
>  which returns a prettified string-representation, e.g. "33.3 MHz".
>  The caller must free the string with g_free() after use.
>
> +It's also possible to retrieve the clock period from a QTest by
> +accessing QOM property ``qtest-clock-period`` using a QMP command.

We should add:

  This property is only present when the device is being run under
  the ``qtest`` accelerator; it is not available when QEMU is
  being run normally.

thanks
-- PMM

Re: [PATCH] hw/s390x: Remove unused macro VMSTATE_ADAPTER_ROUTES

2024-05-28 Thread Eric Farman

On Mon, 2024-05-27 at 14:13 +0200, Thomas Huth wrote:
> It's not used anywhere, so let's simply remove it.
> 
> Signed-off-by: Thomas Huth 
> ---
>  include/hw/s390x/s390_flic.h | 3 ---
>  1 file changed, 3 deletions(-)

Reviewed-by: Eric Farman

Re: [PATCH v3 4/4] tests/qtest: Check STM32L4x5 clock connections

2024-05-28 Thread Peter Maydell

On Thu, 23 May 2024 at 20:44, Inès Varhol  wrote:
>
> For USART, GPIO and SYSCFG devices, check that clock frequency before
> and after enabling the peripheral clock in RCC is correct.
>
> Signed-off-by: Inès Varhol 
> ---
>  tests/qtest/stm32l4x5.h | 43 +
>  tests/qtest/stm32l4x5_gpio-test.c   | 23 +++
>  tests/qtest/stm32l4x5_syscfg-test.c | 20 --
>  tests/qtest/stm32l4x5_usart-test.c  | 26 +
>  4 files changed, 110 insertions(+), 2 deletions(-)
>  create mode 100644 tests/qtest/stm32l4x5.h
>
> diff --git a/tests/qtest/stm32l4x5.h b/tests/qtest/stm32l4x5.h
> new file mode 100644
> index 00..cf59aeb019
> --- /dev/null
> +++ b/tests/qtest/stm32l4x5.h
> @@ -0,0 +1,43 @@
> +/*
> + * QTest testcase header for STM32L4X5 :
> + * used for consolidating common objects in stm32l4x5_*-test.c
> + *
> + * Copyright (c) 2024 Arnaud Minier 
> + * Copyright (c) 2024 Inès Varhol 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"

Header files must never include osdep.h. The rules for
osdep.h are:
 * never included in a .h file
 * always included as the first include in every .c file

> +#include "libqtest.h"
> +
> +/*
> + * MSI (4 MHz) is used as system clock source after startup
> + * from Reset.
> + * AHB, APB1 and APB2 prescalers are set to 1 at reset.
> + *
> + * A clock period is stored in units of 2^-32 ns :
> + * 10^9 * 2^32 / 400 = 1073741824000
> + */
> +#define SYSCLK_PERIOD 1073741824000UL

Rather than doing the calculation by hand, it would be
clearer to use the CLOCK_PERIOD_FROM_HZ() macro from hw/clock.h.
(If #including clock.h from the test C file doesn't work for
some reason, you can copy the macro definition; it's a one-liner).

#define SYSCLK_PERIOD CLOCK_PERIOD_FROM_HZ(400)

Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v3 1/4] hw/misc: Create STM32L4x5 SYSCFG clock

2024-05-28 Thread Peter Maydell

On Thu, 23 May 2024 at 20:44, Inès Varhol  wrote:
>
> This commit creates a clock in STM32L4x5 SYSCFG and wires it up to the
> corresponding clock from STM32L4x5 RCC.
>
> Signed-off-by: Inès Varhol 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v3 2/4] hw/char: Use v2 VMStateDescription for STM32L4x5 USART

2024-05-28 Thread Peter Maydell

On Thu, 23 May 2024 at 20:44, Inès Varhol  wrote:
>
> `vmstate_stm32l4x5_usart_base` namely uses `VMSTATE_CLOCK` so
> version needs to be 2.
>
> Signed-off-by: Inès Varhol 
> ---
>  hw/char/stm32l4x5_usart.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
> index 02f666308c..0f16f0917a 100644
> --- a/hw/char/stm32l4x5_usart.c
> +++ b/hw/char/stm32l4x5_usart.c
> @@ -546,8 +546,8 @@ static int stm32l4x5_usart_base_post_load(void *opaque, 
> int version_id)
>
>  static const VMStateDescription vmstate_stm32l4x5_usart_base = {
>  .name = TYPE_STM32L4X5_USART_BASE,
> -.version_id = 1,
> -.minimum_version_id = 1,
> +.version_id = 2,
> +.minimum_version_id = 2,

I don't understand why we are bumping the version number here;
can you explain? Usually we bump the version number when we
add a new vmstate field to the vmstate, but this commit doesn't
do that.

thanks
-- PMM

Re: [PATCH v2 2/3] hw/misc: In STM32L4x5 EXTI, handle direct and configurable interrupts

2024-05-28 Thread Peter Maydell

On Wed, 22 May 2024 at 21:40, Inès Varhol  wrote:
>
> The previous implementation for EXTI interrupts only handled
> "configurable" interrupts, like those originating from STM32L4x5 SYSCFG
> (the only device currently connected to the EXTI up until now).
>
> In order to connect STM32L4x5 USART to the EXTI, this commit adds
> handling for direct interrupts (interrupts without configurable edge).
>
> The implementation of configurable interrupts (interrupts supporting
> edge selection) was incorrectly expecting alternating input levels :
> this commits adds a new status field `irq_levels` to actually detect
> edges.

This patch is now doing two different things:
 * correcting the handling of detecting of rising/falling edges
 * adding the direct-interrupt support

These should be two separate patches, I think.

> Signed-off-by: Inès Varhol 
> ---
>  include/hw/misc/stm32l4x5_exti.h |  2 ++
>  hw/misc/stm32l4x5_exti.c | 25 -
>  2 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/include/hw/misc/stm32l4x5_exti.h 
> b/include/hw/misc/stm32l4x5_exti.h
> index 82f75a2417..62f79362f2 100644
> --- a/include/hw/misc/stm32l4x5_exti.h
> +++ b/include/hw/misc/stm32l4x5_exti.h
> @@ -45,6 +45,8 @@ struct Stm32l4x5ExtiState {
>  uint32_t swier[EXTI_NUM_REGISTER];
>  uint32_t pr[EXTI_NUM_REGISTER];
>
> +/* used for edge detection */
> +uint32_t irq_levels[EXTI_NUM_REGISTER];
>  qemu_irq irq[EXTI_NUM_LINES];
>  };
>
> diff --git a/hw/misc/stm32l4x5_exti.c b/hw/misc/stm32l4x5_exti.c
> index eebefc6cd3..bdc3dc10d6 100644
> --- a/hw/misc/stm32l4x5_exti.c
> +++ b/hw/misc/stm32l4x5_exti.c
> @@ -87,6 +87,7 @@ static void stm32l4x5_exti_reset_hold(Object *obj, 
> ResetType type)
>  s->ftsr[bank] = 0x;
>  s->swier[bank] = 0x;
>  s->pr[bank] = 0x;
> +s->irq_levels[bank] = 0x;
>  }
>  }
>
> @@ -106,17 +107,30 @@ static void stm32l4x5_exti_set_irq(void *opaque, int 
> irq, int level)
>  return;
>  }
>
> +/* In case of a direct line interrupt */
> +if (extract32(exti_romask[bank], irq, 1)) {
> +qemu_set_irq(s->irq[oirq], level);
> +return;
> +}
> +
> +/* In case of a configurable interrupt */
>  if (((1 << irq) & s->rtsr[bank]) && level) {
>  /* Rising Edge */
> -s->pr[bank] |= 1 << irq;
> -qemu_irq_pulse(s->irq[oirq]);
> +if (!extract32(s->irq_levels[bank], irq, 1)) {
> +s->pr[bank] |= 1 << irq;
> +qemu_irq_pulse(s->irq[oirq]);
> +}
> +s->irq_levels[bank] |= 1 << irq;
>  } else if (((1 << irq) & s->ftsr[bank]) && !level) {
>  /* Falling Edge */
> -s->pr[bank] |= 1 << irq;
> -qemu_irq_pulse(s->irq[oirq]);
> +if (extract32(s->irq_levels[bank], irq, 1)) {
> +s->pr[bank] |= 1 << irq;
> +qemu_irq_pulse(s->irq[oirq]);
> +}
> +s->irq_levels[bank] &= ~(1 << irq);
>  }

The irq_levels[] array should be updated based on 'level'
separately from determining whether it's a rising or falling
edge. With this code, if for instance the line is
configured for rising-edge detection then we never clear
the irq_levels[] bit when the level falls again, because
the only place we're clearing irq_levels bits is inside the
falling-edge-detected codepath.

We also don't get the case of "guest set the bit in both the
RTSR and FTSR right, which the datasheet specifically calls out
as permitted.

I think something like this will do the right thing:

if (level == extract32(s->irq_levels[bank], irq, 1)) {
/* No change in IRQ line state: do nothing */
return;
}
s->irq_levels[bank] = deposit32(s->irq_levels[bank], irq, level);

if ((level && extract32(s->rtsr[bank], irq, 1) ||
(!level && extract32(s->ftsr[bank], irq, 1)) {
/*
 * Trigger on this rising or falling edge. Note that the guest
 * can configure the same interrupt line to trigger on both
 * rising and falling edges if it wishes, or to not trigger
 * at all.
 */
s->pr[bank] |= 1 << irq;
qemu_irq_pulse(s->irq[oirq]);
}

(I would then consider the "In the following situations" comment
to be a bit redundant and deleteable, but that's a matter of taste.)

>  /*
> - * In the following situations :
> + * In the following situations (for configurable interrupts) :
>   * - falling edge but rising trigger selected
>   * - rising edge but falling trigger selected
>   * - no trigger selected
> @@ -262,6 +276,7 @@ static const VMStateDescription vmstate_stm32l4x5_exti = {
>  VMSTATE_UINT32_ARRAY(ftsr, Stm32l4x5ExtiState, EXTI_NUM_REGISTER),
>  VMSTATE_UINT32_ARRAY(swier, Stm32l4x5ExtiState, EXTI_NUM_REGISTER),
>  VMSTATE_UINT32_ARRAY(pr, Stm32l4x5ExtiState, EXTI_NUM_REGISTER),
> +VMSTATE_UINT32_ARRAY(irq_levels, Stm32l4x5ExtiState,

Re: [PATCH 1/1] block: drop force_dup parameter of raw_reconfigure_getfd()

2024-05-28 Thread Denis V. Lunev


On 4/30/24 19:02, Denis V. Lunev wrote:

This parameter is always passed as 'false' from the caller.

Signed-off-by: Denis V. Lunev 
CC: Andrey Zhadchenko 
CC: Kevin Wolf 
CC: Hanna Reitz 
---
  block/file-posix.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 35684f7e21..5c46938936 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1039,8 +1039,7 @@ static int fcntl_setfl(int fd, int flag)
  }
  
  static int raw_reconfigure_getfd(BlockDriverState *bs, int flags,

- int *open_flags, uint64_t perm, bool 
force_dup,
- Error **errp)
+ int *open_flags, uint64_t perm, Error **errp)
  {
  BDRVRawState *s = bs->opaque;
  int fd = -1;
@@ -1068,7 +1067,7 @@ static int raw_reconfigure_getfd(BlockDriverState *bs, 
int flags,
  assert((s->open_flags & O_ASYNC) == 0);
  #endif
  
-if (!force_dup && *open_flags == s->open_flags) {

+if (*open_flags == s->open_flags) {
  /* We're lucky, the existing fd is fine */
  return s->fd;
  }
@@ -3748,8 +3747,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t 
perm, uint64_t shared,
  int ret;
  
  /* We may need a new fd if auto-read-only switches the mode */

-ret = raw_reconfigure_getfd(bs, input_flags, &open_flags, perm,
-false, errp);
+ret = raw_reconfigure_getfd(bs, input_flags, &open_flags, perm, errp);
  if (ret < 0) {
  return ret;
  } else if (ret != s->fd) {

ping v2

Re: [PATCH 1/1] prealloc: add truncate mode for prealloc filter

2024-05-28 Thread Denis V. Lunev


On 4/30/24 19:05, Denis V. Lunev wrote:

Preallocate filter allows to implement really interesting setups.

Assume that we have
* shared block device, f.e. iSCSI LUN, implemented with some HW device
* clustered LVM on top of it
* QCOW2 image stored inside LVM volume

This allows very cheap clustered setups with all QCOW2 features intact.
Currently supported setups using QCOW2 with data_file option are not
so cool as snapshots are not allowed, QCOW2 should be placed into some
additional distributed storage and so on.

Though QCOW2 inside LVM volume has a drawback. The image is growing and
in order to accomodate that image LVM volume is to be resized. This
could be done externally using ENOSPACE event/condition but this is
cumbersome.

This patch introduces native implementation for such a setup. We should
just put prealloc filter in between QCOW2 format and file nodes. In that
case LVM will be resized at proper moment and that is done effectively
as resizing is done in chinks.

The patch adds allocation mode for this purpose in order to distinguish
'fallocate' for ordinary file system and 'truncate'.

Signed-off-by: Denis V. Lunev 
CC: Alexander Ivanov 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Vladimir Sementsov-Ogievskiy 
---
  block/preallocate.c | 50 +++--
  1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index 4d82125036..6d31627325 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -33,10 +33,24 @@
  #include "block/block-io.h"
  #include "block/block_int.h"
  
+typedef enum PreallocateMode {

+PREALLOCATE_MODE_FALLOCATE = 0,
+PREALLOCATE_MODE_TRUNCATE = 1,
+PREALLOCATE_MODE__MAX = 2,
+} PreallocateMode;
+
+static QEnumLookup prealloc_mode_lookup = {
+.array = (const char *const[]) {
+"falloc",
+"truncate",
+},
+.size = PREALLOCATE_MODE__MAX,
+};
  
  typedef struct PreallocateOpts {

  int64_t prealloc_size;
  int64_t prealloc_align;
+PreallocateMode prealloc_mode;
  } PreallocateOpts;
  
  typedef struct BDRVPreallocateState {

@@ -79,6 +93,7 @@ typedef struct BDRVPreallocateState {
  
  #define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align"

  #define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size"
+#define PREALLOCATE_OPT_MODE "mode"
  static QemuOptsList runtime_opts = {
  .name = "preallocate",
  .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
@@ -94,7 +109,14 @@ static QemuOptsList runtime_opts = {
  .type = QEMU_OPT_SIZE,
  .help = "how much to preallocate, default 128M",
  },
-{ /* end of list */ }
+{
+.name = PREALLOCATE_OPT_MODE,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode on image expansion "
+"(allowed values: falloc, truncate)",
+.def_value_str = "falloc",
+},
+{ /* end of list */ },
  },
  };
  
@@ -102,6 +124,8 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, QDict *options,

  BlockDriverState *child_bs, Error **errp)
  {
  QemuOpts *opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+Error *local_err = NULL;
+char *buf;
  
  if (!qemu_opts_absorb_qdict(opts, options, errp)) {

  return false;
@@ -112,6 +136,17 @@ static bool preallocate_absorb_opts(PreallocateOpts *dest, 
QDict *options,
  dest->prealloc_size =
  qemu_opt_get_size(opts, PREALLOCATE_OPT_PREALLOC_SIZE, 128 * MiB);
  
+buf = qemu_opt_get_del(opts, PREALLOCATE_OPT_MODE);

+/* prealloc_mode can be downgraded later during allocate_clusters */
+dest->prealloc_mode = qapi_enum_parse(&prealloc_mode_lookup, buf,
+  PREALLOCATE_MODE_FALLOCATE,
+  &local_err);
+g_free(buf);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return false;
+}
+
  qemu_opts_del(opts);
  
  if (!QEMU_IS_ALIGNED(dest->prealloc_align, BDRV_SECTOR_SIZE)) {

@@ -335,9 +370,20 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t 
bytes,
  
  want_merge_zero = want_merge_zero && (prealloc_start <= offset);
  
-ret = bdrv_co_pwrite_zeroes(

+switch (s->opts.prealloc_mode) {
+case PREALLOCATE_MODE_FALLOCATE:
+ret = bdrv_co_pwrite_zeroes(
  bs->file, prealloc_start, prealloc_end - prealloc_start,
  BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
+break;
+case PREALLOCATE_MODE_TRUNCATE:
+ret = bdrv_co_truncate(bs->file, prealloc_end, false,
+   PREALLOC_MODE_OFF, 0, NULL);
+break;
+default:
+return false;
+}
+
  if (ret < 0) {
  s->file_end = ret;
  return false;

ping v2

Re: [PATCH 1/2] ppc/pnv: Fix loss of LPC SERIRQ interrupts

2024-05-28 Thread Miles Glenn

Reviewed-by: Glenn Miles 

Thanks,

Glenn

On Tue, 2024-05-28 at 16:20 +1000, Nicholas Piggin wrote:
> From: Glenn Miles 
> 
> The LPC HC irq status register bits are set when an LPC IRQSER input
> is
> asserted. These irq status bits drive the PSI irq to the CPU
> interrupt
> controller. The LPC HC irq status bits are cleared by software
> writing
> to the register with 1's for the bits to clear.
> 
> Existing register write was clearing the irq status bits even when
> the
> input was asserted, this results in interrupts being lost.
> 
> This fix changes the behavior to keep track of the device IRQ status
> in internal state that is separate from the irq status register, and
> only allowing the irq status bits to be cleared if the associated
> input is not asserted.
> 
> [np: rebased before P9 PSI SERIRQ patch, adjust changelog/comments]
> Signed-off-by: Glenn Miles 
> Signed-off-by: Nicholas Piggin 
> ---
>  include/hw/ppc/pnv_lpc.h |  3 +++
>  hw/ppc/pnv_lpc.c | 22 +++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/pnv_lpc.h b/include/hw/ppc/pnv_lpc.h
> index 5d22c45570..97c6872c3f 100644
> --- a/include/hw/ppc/pnv_lpc.h
> +++ b/include/hw/ppc/pnv_lpc.h
> @@ -73,6 +73,9 @@ struct PnvLpcController {
>  uint32_t opb_irq_pol;
>  uint32_t opb_irq_input;
>  
> +/* LPC device IRQ state */
> +uint32_t lpc_hc_irq_inputs;
> +
>  /* LPC HC registers */
>  uint32_t lpc_hc_fw_seg_idsel;
>  uint32_t lpc_hc_fw_rd_acc_size;
> diff --git a/hw/ppc/pnv_lpc.c b/hw/ppc/pnv_lpc.c
> index d692858bee..252690dcaa 100644
> --- a/hw/ppc/pnv_lpc.c
> +++ b/hw/ppc/pnv_lpc.c
> @@ -505,7 +505,14 @@ static void lpc_hc_write(void *opaque, hwaddr
> addr, uint64_t val,
>  pnv_lpc_eval_irqs(lpc);
>  break;
>  case LPC_HC_IRQSTAT:
> -lpc->lpc_hc_irqstat &= ~val;
> +/*
> + * This register is write-to-clear for the IRQSER (LPC
> device IRQ)
> + * status. However if the device has not de-asserted its
> interrupt
> + * that will just raise this IRQ status bit again. Model
> this by
> + * keeping track of the inputs and only clearing if the
> inputs are
> + * deasserted.
> + */
> +lpc->lpc_hc_irqstat &= ~(val & ~lpc->lpc_hc_irq_inputs);
>  pnv_lpc_eval_irqs(lpc);
>  break;
>  case LPC_HC_ERROR_ADDRESS:
> @@ -803,11 +810,20 @@ static void pnv_lpc_isa_irq_handler_cpld(void
> *opaque, int n, int level)
>  static void pnv_lpc_isa_irq_handler(void *opaque, int n, int level)
>  {
>  PnvLpcController *lpc = PNV_LPC(opaque);
> +uint32_t irq_bit = LPC_HC_IRQ_SERIRQ0 >> n;
>  
> -/* The Naples HW latches the 1 levels, clearing is done by SW */
>  if (level) {
> -lpc->lpc_hc_irqstat |= LPC_HC_IRQ_SERIRQ0 >> n;
> +lpc->lpc_hc_irq_inputs |= irq_bit;
> +
> +/*
> +  * The LPC HC in Naples and later latches LPC IRQ into a bit
> field in
> +  * the IRQSTAT register, and that drives the PSI IRQ to the IC.
> +  * Software clears this bit manually (see LPC_HC_IRQSTAT
> handler).
> + */
> +lpc->lpc_hc_irqstat |= irq_bit;
>  pnv_lpc_eval_irqs(lpc);
> +} else {
> +lpc->lpc_hc_irq_inputs &= ~irq_bit;
>  }
>  }
>

Re: [RFC 1/6] scripts/simpletrace-rust: Add the basic cargo framework

2024-05-28 Thread Stefan Hajnoczi

On Tue, May 28, 2024 at 03:53:55PM +0800, Zhao Liu wrote:
> Hi Stefan,
> 
> [snip]
> 
> > > diff --git a/scripts/simpletrace-rust/.rustfmt.toml 
> > > b/scripts/simpletrace-rust/.rustfmt.toml
> > > new file mode 100644
> > > index ..97a97c24ebfb
> > > --- /dev/null
> > > +++ b/scripts/simpletrace-rust/.rustfmt.toml
> > > @@ -0,0 +1,9 @@
> > > +brace_style = "AlwaysNextLine"
> > > +comment_width = 80
> > > +edition = "2021"
> > > +group_imports = "StdExternalCrate"
> > > +imports_granularity = "item"
> > > +max_width = 80
> > > +use_field_init_shorthand = true
> > > +use_try_shorthand = true
> > > +wrap_comments = true
> > 
> > There should be QEMU-wide policy. That said, why is it necessary to 
> > customize rustfmt?
> 
> Indeed, but QEMU's style for Rust is currently undefined, so I'm trying
> to add this to make it easier to check the style...I will separate it
> out as a style policy proposal.

Why is a config file necessary? QEMU should use the default Rust style.

Stefan


signature.asc
Description: PGP signature

[PULL 13/42] target/arm: Split out gengvec.c

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-8-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate.h |5 +
 target/arm/tcg/gengvec.c   | 1612 
 target/arm/tcg/translate.c | 1588 ---
 target/arm/tcg/meson.build |1 +
 4 files changed, 1618 insertions(+), 1588 deletions(-)
 create mode 100644 target/arm/tcg/gengvec.c

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index dc66ff21908..80e85096a83 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -445,6 +445,11 @@ void gen_gvec_ssra(unsigned vece, uint32_t rd_ofs, 
uint32_t rm_ofs,
 void gen_gvec_usra(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
int64_t shift, uint32_t opr_sz, uint32_t max_sz);
 
+void gen_srshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh);
+void gen_srshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh);
+void gen_urshr32_i32(TCGv_i32 d, TCGv_i32 a, int32_t sh);
+void gen_urshr64_i64(TCGv_i64 d, TCGv_i64 a, int64_t sh);
+
 void gen_gvec_srshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
 int64_t shift, uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_urshr(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
new file mode 100644
index 000..7a1856253ff
--- /dev/null
+++ b/target/arm/tcg/gengvec.c
@@ -0,0 +1,1612 @@
+/*
+ *  ARM generic vector expansion
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *  Copyright (c) 2005-2007 CodeSourcery
+ *  Copyright (c) 2007 OpenedHand, Ltd.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "translate.h"
+
+
+static void gen_gvec_fn3_qc(uint32_t rd_ofs, uint32_t rn_ofs, uint32_t rm_ofs,
+uint32_t opr_sz, uint32_t max_sz,
+gen_helper_gvec_3_ptr *fn)
+{
+TCGv_ptr qc_ptr = tcg_temp_new_ptr();
+
+tcg_gen_addi_ptr(qc_ptr, tcg_env, offsetof(CPUARMState, vfp.qc));
+tcg_gen_gvec_3_ptr(rd_ofs, rn_ofs, rm_ofs, qc_ptr,
+   opr_sz, max_sz, 0, fn);
+}
+
+void gen_gvec_sqrdmlah_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+  uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[2] = {
+gen_helper_gvec_qrdmlah_s16, gen_helper_gvec_qrdmlah_s32
+};
+tcg_debug_assert(vece >= 1 && vece <= 2);
+gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
+void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+  uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3_ptr * const fns[2] = {
+gen_helper_gvec_qrdmlsh_s16, gen_helper_gvec_qrdmlsh_s32
+};
+tcg_debug_assert(vece >= 1 && vece <= 2);
+gen_gvec_fn3_qc(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, fns[vece - 1]);
+}
+
+#define GEN_CMP0(NAME, COND)  \
+void NAME(unsigned vece, uint32_t d, uint32_t m,  \
+  uint32_t opr_sz, uint32_t max_sz)   \
+{ tcg_gen_gvec_cmpi(COND, vece, d, m, 0, opr_sz, max_sz); }
+
+GEN_CMP0(gen_gvec_ceq0, TCG_COND_EQ)
+GEN_CMP0(gen_gvec_cle0, TCG_COND_LE)
+GEN_CMP0(gen_gvec_cge0, TCG_COND_GE)
+GEN_CMP0(gen_gvec_clt0, TCG_COND_LT)
+GEN_CMP0(gen_gvec_cgt0, TCG_COND_GT)
+
+#undef GEN_CMP0
+
+static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
+{
+tcg_gen_vec_sar8i_i64(a, a, shift);
+tcg_gen_vec_add8_i64(d, d, a);
+}
+
+static void gen_ssra16_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
+{
+tcg_gen_vec_sar16i_i64(a, a, shift);
+tcg_gen_vec_add16_i64(d, d, a);
+}
+
+static void gen_ssra32_i32(TCGv_i32 d, TCGv_i32 a, int32_t shift)
+{
+tcg_gen_sari_i32(a, a, shift);
+tcg_gen_add_i32(d, d, a);
+}
+
+static void gen_ssra64_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
+{
+tcg_gen_sari_i64(a, a, shift);
+tcg_gen_add_i64(d, d, a);
+}
+
+static void gen_ssra_vec(unsigned vece, TCGv_vec d, TCGv_vec a, int64_t sh)
+{
+tcg_gen_sari_vec(vece, a, a, sh);
+tcg_gen_add_vec(vece, d, d, a);
+}
+
+void gen_gvec_ssra(unsigned vece, uint32_t rd_of

[PULL 02/42] hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers

2024-05-28 Thread Peter Maydell

From: Zenghui Yu 

We wrongly encoded ID_AA64PFR1_EL1 using {3,0,0,4,2} in hvf_sreg_match[] so
we fail to get the expected ARMCPRegInfo from cp_regs hash table with the
wrong key.

Fix it with the correct encoding {3,0,0,4,1}. With that fixed, the Linux
guest can properly detect FEAT_SSBS2 on my M1 HW.

All DBG{B,W}{V,C}R_EL1 registers are also wrongly encoded with op0 == 14.
It happens to work because HVF_SYSREG(CRn, CRm, 14, op1, op2) equals to
HVF_SYSREG(CRn, CRm, 2, op1, op2), by definition. But we shouldn't rely on
it.

Cc: qemu-sta...@nongnu.org
Fixes: a1477da3ddeb ("hvf: Add Apple Silicon support")
Signed-off-by: Zenghui Yu 
Reviewed-by: Alexander Graf 
Message-id: 20240503153453.54389-1-zenghui...@linux.dev
Signed-off-by: Peter Maydell 
---
 target/arm/hvf/hvf.c | 130 +--
 1 file changed, 65 insertions(+), 65 deletions(-)

diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 08d0757438c..45e2218be58 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -396,85 +396,85 @@ struct hvf_sreg_match {
 };
 
 static struct hvf_sreg_match hvf_sreg_match[] = {
-{ HV_SYS_REG_DBGBVR0_EL1, HVF_SYSREG(0, 0, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR0_EL1, HVF_SYSREG(0, 0, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR0_EL1, HVF_SYSREG(0, 0, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR0_EL1, HVF_SYSREG(0, 0, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR0_EL1, HVF_SYSREG(0, 0, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR0_EL1, HVF_SYSREG(0, 0, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR0_EL1, HVF_SYSREG(0, 0, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR0_EL1, HVF_SYSREG(0, 0, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR1_EL1, HVF_SYSREG(0, 1, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR1_EL1, HVF_SYSREG(0, 1, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR1_EL1, HVF_SYSREG(0, 1, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR1_EL1, HVF_SYSREG(0, 1, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR1_EL1, HVF_SYSREG(0, 1, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR1_EL1, HVF_SYSREG(0, 1, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR1_EL1, HVF_SYSREG(0, 1, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR1_EL1, HVF_SYSREG(0, 1, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR2_EL1, HVF_SYSREG(0, 2, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR2_EL1, HVF_SYSREG(0, 2, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR2_EL1, HVF_SYSREG(0, 2, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR2_EL1, HVF_SYSREG(0, 2, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR2_EL1, HVF_SYSREG(0, 2, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR2_EL1, HVF_SYSREG(0, 2, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR2_EL1, HVF_SYSREG(0, 2, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR2_EL1, HVF_SYSREG(0, 2, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR3_EL1, HVF_SYSREG(0, 3, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR3_EL1, HVF_SYSREG(0, 3, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR3_EL1, HVF_SYSREG(0, 3, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR3_EL1, HVF_SYSREG(0, 3, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR3_EL1, HVF_SYSREG(0, 3, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR3_EL1, HVF_SYSREG(0, 3, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR3_EL1, HVF_SYSREG(0, 3, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR3_EL1, HVF_SYSREG(0, 3, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR4_EL1, HVF_SYSREG(0, 4, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR4_EL1, HVF_SYSREG(0, 4, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR4_EL1, HVF_SYSREG(0, 4, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR4_EL1, HVF_SYSREG(0, 4, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR4_EL1, HVF_SYSREG(0, 4, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR4_EL1, HVF_SYSREG(0, 4, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR4_EL1, HVF_SYSREG(0, 4, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR4_EL1, HVF_SYSREG(0, 4, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR5_EL1, HVF_SYSREG(0, 5, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR5_EL1, HVF_SYSREG(0, 5, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR5_EL1, HVF_SYSREG(0, 5, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR5_EL1, HVF_SYSREG(0, 5, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR5_EL1, HVF_SYSREG(0, 5, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR5_EL1, HVF_SYSREG(0, 5, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR5_EL1, HVF_SYSREG(0, 5, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR5_EL1, HVF_SYSREG(0, 5, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR6_EL1, HVF_SYSREG(0, 6, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR6_EL1, HVF_SYSREG(0, 6, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR6_EL1, HVF_SYSREG(0, 6, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR6_EL1, HVF_SYSREG(0, 6, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR6_EL1, HVF_SYSREG(0, 6, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR6_EL1, HVF_SYSREG(0, 6, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR6_EL1, HVF_SYSREG(0, 6, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR6_EL1, HVF_SYSREG(0, 6, 2, 0, 7) },
 
-{ HV_SYS_REG_DBGBVR7_EL1, HVF_SYSREG(0, 7, 14, 0, 4) },
-{ HV_SYS_REG_DBGBCR7_EL1, HVF_SYSREG(0, 7, 14, 0, 5) },
-{ HV_SYS_REG_DBGWVR7_EL1, HVF_SYSREG(0, 7, 14, 0, 6) },
-{ HV_SYS_REG_DBGWCR7_EL1, HVF_SYSREG(0, 7, 14, 0, 7) },
+{ HV_SYS_REG_DBGBVR7_EL1, HVF_SYSREG(0, 7, 2, 0, 4) },
+{ HV_SYS_REG_DBGBCR7_EL1, HVF_SYSREG(0, 7, 2, 0, 5) },
+{ HV_SYS_REG_DBGWVR7_EL1, HVF_SYSREG(0, 7, 2, 0, 6) },
+{ HV_SYS_REG_DBGWCR7_EL1, HVF_SYS

[PULL 36/42] target/arm: Use gvec for neon faddp, fmaxp, fminp

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-31-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h |  7 -
 target/arm/tcg/translate-neon.c | 55 ++---
 target/arm/tcg/vec_helper.c | 45 ---
 3 files changed, 3 insertions(+), 104 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 32684773299..065460ea80e 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -650,13 +650,6 @@ DEF_HELPER_FLAGS_6(gvec_fcmlas_idx, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_fcmlad, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
 
-DEF_HELPER_FLAGS_5(neon_paddh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(neon_pmaxh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(neon_pminh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(neon_padds, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(neon_pmaxs, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-DEF_HELPER_FLAGS_5(neon_pmins, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
-
 DEF_HELPER_FLAGS_4(gvec_sstoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_sitos, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_ustoh, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 144f18ba22e..2326a05a0aa 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -1144,6 +1144,9 @@ DO_3S_FP_GVEC(VFMA, gen_helper_gvec_vfma_s, 
gen_helper_gvec_vfma_h)
 DO_3S_FP_GVEC(VFMS, gen_helper_gvec_vfms_s, gen_helper_gvec_vfms_h)
 DO_3S_FP_GVEC(VRECPS, gen_helper_gvec_recps_nf_s, gen_helper_gvec_recps_nf_h)
 DO_3S_FP_GVEC(VRSQRTS, gen_helper_gvec_rsqrts_nf_s, 
gen_helper_gvec_rsqrts_nf_h)
+DO_3S_FP_GVEC(VPADD, gen_helper_gvec_faddp_s, gen_helper_gvec_faddp_h)
+DO_3S_FP_GVEC(VPMAX, gen_helper_gvec_fmaxp_s, gen_helper_gvec_fmaxp_h)
+DO_3S_FP_GVEC(VPMIN, gen_helper_gvec_fminp_s, gen_helper_gvec_fminp_h)
 
 WRAP_FP_GVEC(gen_VMAXNM_fp32_3s, FPST_STD, gen_helper_gvec_fmaxnum_s)
 WRAP_FP_GVEC(gen_VMAXNM_fp16_3s, FPST_STD_F16, gen_helper_gvec_fmaxnum_h)
@@ -1180,58 +1183,6 @@ static bool trans_VMINNM_fp_3s(DisasContext *s, 
arg_3same *a)
 return do_3same(s, a, gen_VMINNM_fp32_3s);
 }
 
-static bool do_3same_fp_pair(DisasContext *s, arg_3same *a,
- gen_helper_gvec_3_ptr *fn)
-{
-/* FP pairwise operations */
-TCGv_ptr fpstatus;
-
-if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-return false;
-}
-
-/* UNDEF accesses to D16-D31 if they don't exist. */
-if (!dc_isar_feature(aa32_simd_r32, s) &&
-((a->vd | a->vn | a->vm) & 0x10)) {
-return false;
-}
-
-if (!vfp_access_check(s)) {
-return true;
-}
-
-assert(a->q == 0); /* enforced by decode patterns */
-
-
-fpstatus = fpstatus_ptr(a->size == MO_16 ? FPST_STD_F16 : FPST_STD);
-tcg_gen_gvec_3_ptr(vfp_reg_offset(1, a->vd),
-   vfp_reg_offset(1, a->vn),
-   vfp_reg_offset(1, a->vm),
-   fpstatus, 8, 8, 0, fn);
-
-return true;
-}
-
-/*
- * For all the functions using this macro, size == 1 means fp16,
- * which is an architecture extension we don't implement yet.
- */
-#define DO_3S_FP_PAIR(INSN,FUNC)\
-static bool trans_##INSN##_fp_3s(DisasContext *s, arg_3same *a) \
-{   \
-if (a->size == MO_16) { \
-if (!dc_isar_feature(aa32_fp16_arith, s)) { \
-return false;   \
-}   \
-return do_3same_fp_pair(s, a, FUNC##h); \
-}   \
-return do_3same_fp_pair(s, a, FUNC##s); \
-}
-
-DO_3S_FP_PAIR(VPADD, gen_helper_neon_padd)
-DO_3S_FP_PAIR(VPMAX, gen_helper_neon_pmax)
-DO_3S_FP_PAIR(VPMIN, gen_helper_neon_pmin)
-
 static bool do_vector_2sh(DisasContext *s, arg_2reg_shift *a, GVecGen2iFn *fn)
 {
 /* Handle a 2-reg-shift insn which can be vectorized. */
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 79e1fdcaa9f..26a9ca9c14a 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2192,51 +2192,6 @@ DO_ABA(gvec_uaba_d, uint64_t)
 
 #undef DO_ABA
 
-#define DO_NEON_PAIRWISE(NAME, OP)  \
-void HELPER(NAME##s)(void *vd, void *vn, void *vm,  \
- void *stat, uint32_t oprsz)\
-{

[PULL 15/42] target/arm: Convert Cryptographic AES to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-10-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  | 21 +++--
 target/arm/tcg/translate-a64.c | 86 +++---
 2 files changed, 54 insertions(+), 53 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 0e7656fd158..1de09903dc4 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -19,11 +19,17 @@
 # This file is processed by scripts/decodetree.py
 #
 
-&r   rn
-&ri  rd imm
-&rri_sf  rd rn imm sf
-&i   imm
+%rd 0:5
 
+&r  rn
+&ri rd imm
+&rri_sf rd rn imm sf
+&i  imm
+&qrr_e  q rd rn esz
+&qrrr_e q rd rn rm esz
+
+@rr_q1e0  .. rn:5 rd:5  &qrr_e q=1 esz=0
+@r2r_q1e0     .. rm:5 rd:5  &qrrr_e rn=%rd q=1 
esz=0
 
 ### Data Processing - Immediate
 
@@ -590,3 +596,10 @@ CPYFE   00 011 0 01100 .  01 . . 
@cpy
 CPYP00 011 1 01000 .  01 . . @cpy
 CPYM00 011 1 01010 .  01 . . @cpy
 CPYE00 011 1 01100 .  01 . . @cpy
+
+### Cryptographic AES
+
+AESE01001110 00 10100 00100 10 . .  @r2r_q1e0
+AESD01001110 00 10100 00101 10 . .  @r2r_q1e0
+AESMC   01001110 00 10100 00110 10 . .  @rr_q1e0
+AESIMC  01001110 00 10100 00111 10 . .  @rr_q1e0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 8842ff634d5..3894db4bee2 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1313,6 +1313,34 @@ bool sme_enabled_check_with_svcr(DisasContext *s, 
unsigned req)
 return true;
 }
 
+/*
+ * Expanders for AdvSIMD translation functions.
+ */
+
+static bool do_gvec_op2_ool(DisasContext *s, arg_qrr_e *a, int data,
+gen_helper_gvec_2 *fn)
+{
+if (!a->q && a->esz == MO_64) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_op2_ool(s, a->q, a->rd, a->rn, data, fn);
+}
+return true;
+}
+
+static bool do_gvec_op3_ool(DisasContext *s, arg_qrrr_e *a, int data,
+gen_helper_gvec_3 *fn)
+{
+if (!a->q && a->esz == MO_64) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_op3_ool(s, a->q, a->rd, a->rn, a->rm, data, fn);
+}
+return true;
+}
+
 /*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
@@ -4560,6 +4588,15 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a)
 return true;
 }
 
+/*
+ * Cryptographic AES
+ */
+
+TRANS_FEAT(AESE, aa64_aes, do_gvec_op3_ool, a, 0, gen_helper_crypto_aese)
+TRANS_FEAT(AESD, aa64_aes, do_gvec_op3_ool, a, 0, gen_helper_crypto_aesd)
+TRANS_FEAT(AESMC, aa64_aes, do_gvec_op2_ool, a, 0, gen_helper_crypto_aesmc)
+TRANS_FEAT(AESIMC, aa64_aes, do_gvec_op2_ool, a, 0, gen_helper_crypto_aesimc)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13460,54 +13497,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto AES
- *  31 24 23  22 21   17 1612 11 10 95 40
- * +-+--+---++-+--+--+
- * | 0 1 0 0 1 1 1 0 | size | 1 0 1 0 0 | opcode | 1 0 |  Rn  |  Rd  |
- * +-+--+---++-+--+--+
- */
-static void disas_crypto_aes(DisasContext *s, uint32_t insn)
-{
-int size = extract32(insn, 22, 2);
-int opcode = extract32(insn, 12, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-gen_helper_gvec_2 *genfn2 = NULL;
-gen_helper_gvec_3 *genfn3 = NULL;
-
-if (!dc_isar_feature(aa64_aes, s) || size != 0) {
-unallocated_encoding(s);
-return;
-}
-
-switch (opcode) {
-case 0x4: /* AESE */
-genfn3 = gen_helper_crypto_aese;
-break;
-case 0x6: /* AESMC */
-genfn2 = gen_helper_crypto_aesmc;
-break;
-case 0x5: /* AESD */
-genfn3 = gen_helper_crypto_aesd;
-break;
-case 0x7: /* AESIMC */
-genfn2 = gen_helper_crypto_aesimc;
-break;
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-if (genfn2) {
-gen_gvec_op2_ool(s, true, rd, rn, 0, genfn2);
-} else {
-gen_gvec_op3_ool(s, true, rd, rd, rn, 0, genfn3);
-}
-}
-
 /* Crypto three-reg SHA
  *  31 24 23  22  21 20  16  15 1412 11 10 95 40
  *

[PULL 22/42] target/arm: Convert XAR to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-17-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  4 
 target/arm/tcg/translate-a64.c | 43 +++---
 2 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 1292312a7f9..7f354af25d3 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -654,3 +654,7 @@ SM3TT1A 11001110 010 . 10 .. 00 . . 
@crypto3i
 SM3TT1B 11001110 010 . 10 .. 01 . . @crypto3i
 SM3TT2A 11001110 010 . 10 .. 10 . . @crypto3i
 SM3TT2B 11001110 010 . 10 .. 11 . . @crypto3i
+
+### Cryptographic XAR
+
+XAR 1100 1110 100 rm:5 imm:6 rn:5 rd:5
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index cf3a7dfa99f..75f1e6a7b90 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4688,6 +4688,20 @@ TRANS_FEAT(SM3TT1B, aa64_sm3, do_crypto3i, a, 
gen_helper_crypto_sm3tt1b)
 TRANS_FEAT(SM3TT2A, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt2a)
 TRANS_FEAT(SM3TT2B, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt2b)
 
+static bool trans_XAR(DisasContext *s, arg_XAR *a)
+{
+if (!dc_isar_feature(aa64_sha3, s)) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_xar(MO_64, vec_full_reg_offset(s, a->rd),
+ vec_full_reg_offset(s, a->rn),
+ vec_full_reg_offset(s, a->rm), a->imm, 16,
+ vec_full_reg_size(s));
+}
+return true;
+}
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13588,34 +13602,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto XAR
- *  31   21 20  16 1510 95 40
- * +---+--++--+--+
- * | 1 1 0 0 1 1 1 0 1 0 0 |  Rm  |  imm6  |  Rn  |  Rd  |
- * +---+--++--+--+
- */
-static void disas_crypto_xar(DisasContext *s, uint32_t insn)
-{
-int rm = extract32(insn, 16, 5);
-int imm6 = extract32(insn, 10, 6);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-
-if (!dc_isar_feature(aa64_sha3, s)) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-gen_gvec_xar(MO_64, vec_full_reg_offset(s, rd),
- vec_full_reg_offset(s, rn),
- vec_full_reg_offset(s, rm), imm6, 16,
- vec_full_reg_size(s));
-}
-
 /* C3.6 Data processing - SIMD, inc Crypto
  *
  * As the decode gets a little complex we are using a table based
@@ -13644,7 +13630,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 { 0x5e000400, 0xdfe08400, disas_simd_scalar_copy },
 { 0x5f00, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
-{ 0xce80, 0xffe0, disas_crypto_xar },
 { 0x0e400400, 0x9f60c400, disas_simd_three_reg_same_fp16 },
 { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 },
 { 0x5e400400, 0xdf60c400, disas_simd_scalar_three_reg_same_fp16 },
-- 
2.34.1

[PULL 41/42] target/arm: Convert FMLAL, FMLSL to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-36-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  10 +++
 target/arm/tcg/translate-a64.c | 144 ++---
 2 files changed, 51 insertions(+), 103 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 22dfe8568d6..7e993ed345f 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -797,6 +797,11 @@ FMLA_v  0.00 1110 0.1 . 11001 1 . . 
@qrrr_sd
 FMLS_v  0.00 1110 110 . 1 1 . . @qrrr_h
 FMLS_v  0.00 1110 1.1 . 11001 1 . . @qrrr_sd
 
+FMLAL_v 0.00 1110 001 . 11101 1 . . @qrrr_h
+FMLSL_v 0.00 1110 101 . 11101 1 . . @qrrr_h
+FMLAL2_v0.10 1110 001 . 11001 1 . . @qrrr_h
+FMLSL2_v0.10 1110 101 . 11001 1 . . @qrrr_h
+
 FCMEQ_v 0.00 1110 010 . 00100 1 . . @qrrr_h
 FCMEQ_v 0.00 1110 0.1 . 11100 1 . . @qrrr_sd
 
@@ -877,3 +882,8 @@ FMLS_vi 0.00  11 0 . 0101 . 0 . .   
@qrrx_d
 FMULX_vi0.10  00 ..  1001 . 0 . .   @qrrx_h
 FMULX_vi0.10  10 . . 1001 . 0 . .   @qrrx_s
 FMULX_vi0.10  11 0 . 1001 . 0 . .   @qrrx_d
+
+FMLAL_vi0.00  10 ..   . 0 . .   @qrrx_h
+FMLSL_vi0.00  10 ..  0100 . 0 . .   @qrrx_h
+FMLAL2_vi   0.10  10 ..  1000 . 0 . .   @qrrx_h
+FMLSL2_vi   0.10  10 ..  1100 . 0 . .   @qrrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 9fe70a939bc..a4ff1fd2027 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5256,6 +5256,24 @@ static gen_helper_gvec_3_ptr * const f_vector_fminnmp[3] 
= {
 };
 TRANS(FMINNMP_v, do_fp3_vector, a, f_vector_fminnmp)
 
+static bool do_fmlal(DisasContext *s, arg_qrrr_e *a, bool is_s, bool is_2)
+{
+if (fp_access_check(s)) {
+int data = (is_2 << 1) | is_s;
+tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+   vec_full_reg_offset(s, a->rn),
+   vec_full_reg_offset(s, a->rm), tcg_env,
+   a->q ? 16 : 8, vec_full_reg_size(s),
+   data, gen_helper_gvec_fmlal_a64);
+}
+return true;
+}
+
+TRANS_FEAT(FMLAL_v, aa64_fhm, do_fmlal, a, false, false)
+TRANS_FEAT(FMLSL_v, aa64_fhm, do_fmlal, a, true, false)
+TRANS_FEAT(FMLAL2_v, aa64_fhm, do_fmlal, a, false, true)
+TRANS_FEAT(FMLSL2_v, aa64_fhm, do_fmlal, a, true, true)
+
 TRANS(ADDP_v, do_gvec_fn3, a, gen_gvec_addp)
 TRANS(SMAXP_v, do_gvec_fn3_no64, a, gen_gvec_smaxp)
 TRANS(SMINP_v, do_gvec_fn3_no64, a, gen_gvec_sminp)
@@ -5447,6 +5465,24 @@ static bool do_fmla_vector_idx(DisasContext *s, 
arg_qrrx_e *a, bool neg)
 TRANS(FMLA_vi, do_fmla_vector_idx, a, false)
 TRANS(FMLS_vi, do_fmla_vector_idx, a, true)
 
+static bool do_fmlal_idx(DisasContext *s, arg_qrrx_e *a, bool is_s, bool is_2)
+{
+if (fp_access_check(s)) {
+int data = (a->idx << 2) | (is_2 << 1) | is_s;
+tcg_gen_gvec_3_ptr(vec_full_reg_offset(s, a->rd),
+   vec_full_reg_offset(s, a->rn),
+   vec_full_reg_offset(s, a->rm), tcg_env,
+   a->q ? 16 : 8, vec_full_reg_size(s),
+   data, gen_helper_gvec_fmlal_idx_a64);
+}
+return true;
+}
+
+TRANS_FEAT(FMLAL_vi, aa64_fhm, do_fmlal_idx, a, false, false)
+TRANS_FEAT(FMLSL_vi, aa64_fhm, do_fmlal_idx, a, true, false)
+TRANS_FEAT(FMLAL2_vi, aa64_fhm, do_fmlal_idx, a, false, true)
+TRANS_FEAT(FMLSL2_vi, aa64_fhm, do_fmlal_idx, a, true, true)
+
 /*
  * Advanced SIMD scalar pairwise
  */
@@ -10911,78 +10947,6 @@ static void disas_simd_3same_logic(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Floating point op subgroup of C3.6.16. */
-static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
-{
-/* For floating point ops, the U, size[1] and opcode bits
- * together indicate the operation. size[0] indicates single
- * or double.
- */
-int fpopcode = extract32(insn, 11, 5)
-| (extract32(insn, 23, 1) << 5)
-| (extract32(insn, 29, 1) << 6);
-int is_q = extract32(insn, 30, 1);
-int size = extract32(insn, 22, 1);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-
-if (size == 1 && !is_q) {
-unallocated_encoding(s);
-return;
-}
-
-switch (fpopcode) {
-case 0x1d: /* FMLAL  */
-case 0x3d: /* FMLSL  */
-case 0x59: /* FMLAL2 */
-case 0x79: /* FMLSL2 */
-if (size & 1 || !dc_isar_feature(aa64_fhm, s)) {
-unallocated_encoding(s);
-return;
-}
-if (

[PULL 39/42] target/arm: Convert SMAXP, SMINP, UMAXP, UMINP to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

These are the last instructions within handle_simd_3same_pair
so remove it.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-34-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|  16 +
 target/arm/tcg/translate.h |   8 +++
 target/arm/tcg/a64.decode  |   4 ++
 target/arm/tcg/gengvec.c   |  48 +
 target/arm/tcg/translate-a64.c | 119 +
 target/arm/tcg/vec_helper.c|  16 +
 6 files changed, 109 insertions(+), 102 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 51ed49aa50c..f830531dd3d 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1064,6 +1064,22 @@ DEF_HELPER_FLAGS_4(gvec_addp_h, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_addp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_addp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(gvec_smaxp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smaxp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_smaxp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_sminp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sminp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_umaxp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umaxp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_umaxp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(gvec_uminp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_uminp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_uminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "tcg/helper-a64.h"
 #include "tcg/helper-sve.h"
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 04771f483b6..3abdbedfe5c 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -516,6 +516,14 @@ void gen_gvec_uaba(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 
 void gen_gvec_addp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_smaxp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_sminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_umaxp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_uminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 84f5bcc0e08..22dfe8568d6 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -837,6 +837,10 @@ FMINNMP_v   0.10 1110 110 . 0 1 . . 
@qrrr_h
 FMINNMP_v   0.10 1110 1.1 . 11000 1 . . @qrrr_sd
 
 ADDP_v  0.00 1110 ..1 . 10111 1 . . @qrrr_e
+SMAXP_v 0.00 1110 ..1 . 10100 1 . . @qrrr_e
+SMINP_v 0.00 1110 ..1 . 10101 1 . . @qrrr_e
+UMAXP_v 0.10 1110 ..1 . 10100 1 . . @qrrr_e
+UMINP_v 0.10 1110 ..1 . 10101 1 . . @qrrr_e
 
 ### Advanced SIMD scalar x indexed element
 
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index f010dd5a0e8..22c9d17dce4 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1622,3 +1622,51 @@ void gen_gvec_addp(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 };
 tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
 }
+
+void gen_gvec_smaxp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3 * const fns[4] = {
+gen_helper_gvec_smaxp_b,
+gen_helper_gvec_smaxp_h,
+gen_helper_gvec_smaxp_s,
+};
+tcg_debug_assert(vece <= MO_32);
+tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_gvec_sminp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3 * const fns[4] = {
+gen_helper_gvec_sminp_b,
+gen_helper_gvec_sminp_h,
+gen_helper_gvec_sminp_s,
+};
+tcg_debug_assert(vece <= MO_32);
+tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
+
+void gen_gvec_umaxp(u

[PULL 18/42] target/arm: Convert Cryptographic 3-register SHA512 to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-13-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  | 11 
 target/arm/tcg/translate-a64.c | 97 --
 2 files changed, 32 insertions(+), 76 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 350afabc779..c342c276089 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -31,6 +31,7 @@
 @rr_q1e0  .. rn:5 rd:5  &qrr_e q=1 esz=0
 @r2r_q1e0     .. rm:5 rd:5  &qrrr_e rn=%rd q=1 
esz=0
 @rrr_q1e0    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=0
+@rrr_q1e3    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=3
 
 ### Data Processing - Immediate
 
@@ -620,3 +621,13 @@ SHA256SU1   0101 1110 000 . 011000 . .  
@rrr_q1e0
 SHA1H   0101 1110 0010 1000  10 . . @rr_q1e0
 SHA1SU1 0101 1110 0010 1000 0001 10 . . @rr_q1e0
 SHA256SU0   0101 1110 0010 1000 0010 10 . . @rr_q1e0
+
+### Cryptographic three-register SHA512
+
+SHA512H 1100 1110 011 . 10 . .  @rrr_q1e0
+SHA512H21100 1110 011 . 11 . .  @rrr_q1e0
+SHA512SU1   1100 1110 011 . 100010 . .  @rrr_q1e0
+RAX11100 1110 011 . 100011 . .  @rrr_q1e3
+SM3PARTW1   1100 1110 011 . 11 . .  @rrr_q1e0
+SM3PARTW2   1100 1110 011 . 110001 . .  @rrr_q1e0
+SM4EKEY 1100 1110 011 . 110010 . .  @rrr_q1e0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1d20bf0c35b..77b24cd52ed 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1341,6 +1341,17 @@ static bool do_gvec_op3_ool(DisasContext *s, arg_qrrr_e 
*a, int data,
 return true;
 }
 
+static bool do_gvec_fn3(DisasContext *s, arg_qrrr_e *a, GVecGen3Fn *fn)
+{
+if (!a->q && a->esz == MO_64) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_fn3(s, a->q, a->rd, a->rn, a->rm, fn, a->esz);
+}
+return true;
+}
+
 /*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
@@ -4589,7 +4600,7 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a)
 }
 
 /*
- * Cryptographic AES, SHA
+ * Cryptographic AES, SHA, SHA512
  */
 
 TRANS_FEAT(AESE, aa64_aes, do_gvec_op3_ool, a, 0, gen_helper_crypto_aese)
@@ -4610,6 +4621,15 @@ TRANS_FEAT(SHA1H, aa64_sha1, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha1h)
 TRANS_FEAT(SHA1SU1, aa64_sha1, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha1su1)
 TRANS_FEAT(SHA256SU0, aa64_sha256, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha256su0)
 
+TRANS_FEAT(SHA512H, aa64_sha512, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha512h)
+TRANS_FEAT(SHA512H2, aa64_sha512, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha512h2)
+TRANS_FEAT(SHA512SU1, aa64_sha512, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha512su1)
+TRANS_FEAT(RAX1, aa64_sha3, do_gvec_fn3, a, gen_gvec_rax1)
+TRANS_FEAT(SM3PARTW1, aa64_sm3, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sm3partw1)
+TRANS_FEAT(SM3PARTW2, aa64_sm3, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sm3partw2)
+TRANS_FEAT(SM4EKEY, aa64_sm4, do_gvec_op3_ool, a, 0, gen_helper_crypto_sm4ekey)
+
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13510,80 +13530,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto three-reg SHA512
- *  31   21 20  16 15  14  13 12  11  10  95 40
- * +---+--+---+---+-++--+--+
- * | 1 1 0 0 1 1 1 0 0 1 1 |  Rm  | 1 | O | 0 0 | opcode |  Rn  |  Rd  |
- * +---+--+---+---+-++--+--+
- */
-static void disas_crypto_three_reg_sha512(DisasContext *s, uint32_t insn)
-{
-int opcode = extract32(insn, 10, 2);
-int o =  extract32(insn, 14, 1);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-bool feature;
-gen_helper_gvec_3 *oolfn = NULL;
-GVecGen3Fn *gvecfn = NULL;
-
-if (o == 0) {
-switch (opcode) {
-case 0: /* SHA512H */
-feature = dc_isar_feature(aa64_sha512, s);
-oolfn = gen_helper_crypto_sha512h;
-break;
-case 1: /* SHA512H2 */
-feature = dc_isar_feature(aa64_sha512, s);
-oolfn = gen_helper_crypto_sha512h2;
-break;
-case 2: /* SHA512SU1 */
-feature = dc_isar_feature(aa64_sha512, s);
-oolfn = gen_helper_crypto_sha512su1;
-break;
-c

[PULL 23/42] target/arm: Convert Advanced SIMD copy to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-18-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  13 +
 target/arm/tcg/translate-a64.c | 426 +++--
 2 files changed, 152 insertions(+), 287 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7f354af25d3..d5bfeae7a82 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -658,3 +658,16 @@ SM3TT2B 11001110 010 . 10 .. 11 . . 
@crypto3i
 ### Cryptographic XAR
 
 XAR 1100 1110 100 rm:5 imm:6 rn:5 rd:5
+
+### Advanced SIMD scalar copy
+
+DUP_element_s   0101 1110 000 imm:5 0  1 rn:5 rd:5
+
+### Advanced SIMD copy
+
+DUP_element_v   0 q:1 00 1110 000 imm:5 0  1 rn:5 rd:5
+DUP_general 0 q:1 00 1110 000 imm:5 0 0001 1 rn:5 rd:5
+INS_general 0 1   00 1110 000 imm:5 0 0011 1 rn:5 rd:5
+SMOV0 q:1 00 1110 000 imm:5 0 0101 1 rn:5 rd:5
+UMOV0 q:1 00 1110 000 imm:5 0 0111 1 rn:5 rd:5
+INS_element 0 1   10 1110 000 di:5  0 si:4 1 rn:5 rd:5
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 75f1e6a7b90..1a12bf22fd8 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4702,6 +4702,145 @@ static bool trans_XAR(DisasContext *s, arg_XAR *a)
 return true;
 }
 
+/*
+ * Advanced SIMD copy
+ */
+
+static bool decode_esz_idx(int imm, MemOp *pesz, unsigned *pidx)
+{
+unsigned esz = ctz32(imm);
+if (esz <= MO_64) {
+*pesz = esz;
+*pidx = imm >> (esz + 1);
+return true;
+}
+return false;
+}
+
+static bool trans_DUP_element_s(DisasContext *s, arg_DUP_element_s *a)
+{
+MemOp esz;
+unsigned idx;
+
+if (!decode_esz_idx(a->imm, &esz, &idx)) {
+return false;
+}
+if (fp_access_check(s)) {
+/*
+ * This instruction just extracts the specified element and
+ * zero-extends it into the bottom of the destination register.
+ */
+TCGv_i64 tmp = tcg_temp_new_i64();
+read_vec_element(s, tmp, a->rn, idx, esz);
+write_fp_dreg(s, a->rd, tmp);
+}
+return true;
+}
+
+static bool trans_DUP_element_v(DisasContext *s, arg_DUP_element_v *a)
+{
+MemOp esz;
+unsigned idx;
+
+if (!decode_esz_idx(a->imm, &esz, &idx)) {
+return false;
+}
+if (esz == MO_64 && !a->q) {
+return false;
+}
+if (fp_access_check(s)) {
+tcg_gen_gvec_dup_mem(esz, vec_full_reg_offset(s, a->rd),
+ vec_reg_offset(s, a->rn, idx, esz),
+ a->q ? 16 : 8, vec_full_reg_size(s));
+}
+return true;
+}
+
+static bool trans_DUP_general(DisasContext *s, arg_DUP_general *a)
+{
+MemOp esz;
+unsigned idx;
+
+if (!decode_esz_idx(a->imm, &esz, &idx)) {
+return false;
+}
+if (esz == MO_64 && !a->q) {
+return false;
+}
+if (fp_access_check(s)) {
+tcg_gen_gvec_dup_i64(esz, vec_full_reg_offset(s, a->rd),
+ a->q ? 16 : 8, vec_full_reg_size(s),
+ cpu_reg(s, a->rn));
+}
+return true;
+}
+
+static bool do_smov_umov(DisasContext *s, arg_SMOV *a, MemOp is_signed)
+{
+MemOp esz;
+unsigned idx;
+
+if (!decode_esz_idx(a->imm, &esz, &idx)) {
+return false;
+}
+if (is_signed) {
+if (esz == MO_64 || (esz == MO_32 && !a->q)) {
+return false;
+}
+} else {
+if (esz == MO_64 ? !a->q : a->q) {
+return false;
+}
+}
+if (fp_access_check(s)) {
+TCGv_i64 tcg_rd = cpu_reg(s, a->rd);
+read_vec_element(s, tcg_rd, a->rn, idx, esz | is_signed);
+if (is_signed && !a->q) {
+tcg_gen_ext32u_i64(tcg_rd, tcg_rd);
+}
+}
+return true;
+}
+
+TRANS(SMOV, do_smov_umov, a, MO_SIGN)
+TRANS(UMOV, do_smov_umov, a, 0)
+
+static bool trans_INS_general(DisasContext *s, arg_INS_general *a)
+{
+MemOp esz;
+unsigned idx;
+
+if (!decode_esz_idx(a->imm, &esz, &idx)) {
+return false;
+}
+if (fp_access_check(s)) {
+write_vec_element(s, cpu_reg(s, a->rn), a->rd, idx, esz);
+clear_vec_high(s, true, a->rd);
+}
+return true;
+}
+
+static bool trans_INS_element(DisasContext *s, arg_INS_element *a)
+{
+MemOp esz;
+unsigned didx, sidx;
+
+if (!decode_esz_idx(a->di, &esz, &didx)) {
+return false;
+}
+sidx = a->si >> esz;
+if (fp_access_check(s)) {
+TCGv_i64 tmp = tcg_temp_new_i64();
+
+read_vec_element(s, tmp, a->rn, sidx, esz);
+write_vec_element(s, tmp, a->rd, didx, esz);
+
+/* INS is considered a 128-bit write for SVE. */
+clear_vec_high(s, true, a->rd);
+}
+return true;
+}
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note

[PULL 12/42] target/arm: Verify sz=0 for Advanced SIMD scalar pairwise (fp16)

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

All of these insns have "if sz == '1' then UNDEFINED" in their pseudocode.
Fixes a RISU miscompare for invalid insn 0x5ef0c87a.

Fixes: 5c36d89567c ("arm/translate-a64: add all FP16 ops in 
simd_scalar_pairwise")
Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-id: 20240524232121.284515-7-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-a64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5455ae36850..0bdddb8517a 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8006,7 +8006,7 @@ static void disas_simd_scalar_pairwise(DisasContext *s, 
uint32_t insn)
 case 0x2f: /* FMINP */
 /* FP op, size[0] is 32 or 64 bit*/
 if (!u) {
-if (!dc_isar_feature(aa64_fp16, s)) {
+if ((size & 1) || !dc_isar_feature(aa64_fp16, s)) {
 unallocated_encoding(s);
 return;
 } else {
-- 
2.34.1

[PULL 20/42] target/arm: Convert Cryptographic 4-register to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-15-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |   8 ++
 target/arm/tcg/translate-a64.c | 132 +++--
 2 files changed, 51 insertions(+), 89 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 5a46205751c..ef6902e86a5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -27,11 +27,13 @@
 &i  imm
 &qrr_e  q rd rn esz
 &qrrr_e q rd rn rm esz
+&q_eq rd rn rm ra esz
 
 @rr_q1e0  .. rn:5 rd:5  &qrr_e q=1 esz=0
 @r2r_q1e0     .. rm:5 rd:5  &qrrr_e rn=%rd q=1 
esz=0
 @rrr_q1e0    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=0
 @rrr_q1e3    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=3
+@_q1e3   ... rm:5 . ra:5 rn:5 rd:5  &q_e q=1 esz=3
 
 ### Data Processing - Immediate
 
@@ -636,3 +638,9 @@ SM4EKEY 1100 1110 011 . 110010 . .  
@rrr_q1e0
 
 SHA512SU0   1100 1110 110 0 10 . .  @rr_q1e0
 SM4E1100 1110 110 0 11 . .  @r2r_q1e0
+
+### Cryptographic four-register
+
+EOR31100 1110 000 . 0 . . . @_q1e3
+BCAX1100 1110 001 . 0 . . . @_q1e3
+SM3SS1  1100 1110 010 . 0 . . . @_q1e3
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index eed0abe9121..2951e7eb59e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -1352,6 +1352,17 @@ static bool do_gvec_fn3(DisasContext *s, arg_qrrr_e *a, 
GVecGen3Fn *fn)
 return true;
 }
 
+static bool do_gvec_fn4(DisasContext *s, arg_q_e *a, GVecGen4Fn *fn)
+{
+if (!a->q && a->esz == MO_64) {
+return false;
+}
+if (fp_access_check(s)) {
+gen_gvec_fn4(s, a->q, a->rd, a->rn, a->rm, a->ra, fn, a->esz);
+}
+return true;
+}
+
 /*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
@@ -4632,6 +4643,38 @@ TRANS_FEAT(SM4EKEY, aa64_sm4, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sm4ekey)
 TRANS_FEAT(SHA512SU0, aa64_sha512, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha512su0)
 TRANS_FEAT(SM4E, aa64_sm4, do_gvec_op3_ool, a, 0, gen_helper_crypto_sm4e)
 
+TRANS_FEAT(EOR3, aa64_sha3, do_gvec_fn4, a, gen_gvec_eor3)
+TRANS_FEAT(BCAX, aa64_sha3, do_gvec_fn4, a, gen_gvec_bcax)
+
+static bool trans_SM3SS1(DisasContext *s, arg_SM3SS1 *a)
+{
+if (!dc_isar_feature(aa64_sm3, s)) {
+return false;
+}
+if (fp_access_check(s)) {
+TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+TCGv_i32 tcg_op3 = tcg_temp_new_i32();
+TCGv_i32 tcg_res = tcg_temp_new_i32();
+unsigned vsz, dofs;
+
+read_vec_element_i32(s, tcg_op1, a->rn, 3, MO_32);
+read_vec_element_i32(s, tcg_op2, a->rm, 3, MO_32);
+read_vec_element_i32(s, tcg_op3, a->ra, 3, MO_32);
+
+tcg_gen_rotri_i32(tcg_res, tcg_op1, 20);
+tcg_gen_add_i32(tcg_res, tcg_res, tcg_op2);
+tcg_gen_add_i32(tcg_res, tcg_res, tcg_op3);
+tcg_gen_rotri_i32(tcg_res, tcg_res, 25);
+
+/* Clear the whole register first, then store bits [127:96]. */
+vsz = vec_full_reg_size(s);
+dofs = vec_full_reg_offset(s, a->rd);
+tcg_gen_gvec_dup_imm(MO_64, dofs, vsz, vsz, 0);
+write_vec_element_i32(s, tcg_res, a->rd, 3, MO_32);
+}
+return true;
+}
 
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
@@ -13533,94 +13576,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto four-register
- *  31   23 22 21 20  16 15  14  10 95 40
- * +---+-+--+---+--+--+--+
- * | 1 1 0 0 1 1 1 0 0 | Op0 |  Rm  | 0 |  Ra  |  Rn  |  Rd  |
- * +---+-+--+---+--+--+--+
- */
-static void disas_crypto_four_reg(DisasContext *s, uint32_t insn)
-{
-int op0 = extract32(insn, 21, 2);
-int rm = extract32(insn, 16, 5);
-int ra = extract32(insn, 10, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-bool feature;
-
-switch (op0) {
-case 0: /* EOR3 */
-case 1: /* BCAX */
-feature = dc_isar_feature(aa64_sha3, s);
-break;
-case 2: /* SM3SS1 */
-feature = dc_isar_feature(aa64_sm3, s);
-break;
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!feature) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-if (op0 < 2) {
-

[PULL 01/42] xlnx_dpdma: fix descriptor endianness bug

2024-05-28 Thread Peter Maydell

From: Alexandra Diupina 

Add xlnx_dpdma_read_descriptor() and
xlnx_dpdma_write_descriptor() functions.
xlnx_dpdma_read_descriptor() combines reading a
descriptor from desc_addr by calling dma_memory_read()
and swapping the desc fields from guest memory order
to host memory order. xlnx_dpdma_write_descriptor()
performs similar actions when writing a descriptor.

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: d3c6369a96 ("introduce xlnx-dpdma")
Signed-off-by: Alexandra Diupina 
[PMM: tweaked indent, dropped behaviour change for write-failure case]
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/dma/xlnx_dpdma.c | 68 ++---
 1 file changed, 64 insertions(+), 4 deletions(-)

diff --git a/hw/dma/xlnx_dpdma.c b/hw/dma/xlnx_dpdma.c
index 530717d1885..dde4aeca401 100644
--- a/hw/dma/xlnx_dpdma.c
+++ b/hw/dma/xlnx_dpdma.c
@@ -614,6 +614,65 @@ static void xlnx_dpdma_register_types(void)
 type_register_static(&xlnx_dpdma_info);
 }
 
+static MemTxResult xlnx_dpdma_read_descriptor(XlnxDPDMAState *s,
+  uint64_t desc_addr,
+  DPDMADescriptor *desc)
+{
+MemTxResult res = dma_memory_read(&address_space_memory, desc_addr,
+  &desc, sizeof(DPDMADescriptor),
+  MEMTXATTRS_UNSPECIFIED);
+if (res) {
+return res;
+}
+
+/* Convert from LE into host endianness.  */
+desc->control = le32_to_cpu(desc->control);
+desc->descriptor_id = le32_to_cpu(desc->descriptor_id);
+desc->xfer_size = le32_to_cpu(desc->xfer_size);
+desc->line_size_stride = le32_to_cpu(desc->line_size_stride);
+desc->timestamp_lsb = le32_to_cpu(desc->timestamp_lsb);
+desc->timestamp_msb = le32_to_cpu(desc->timestamp_msb);
+desc->address_extension = le32_to_cpu(desc->address_extension);
+desc->next_descriptor = le32_to_cpu(desc->next_descriptor);
+desc->source_address = le32_to_cpu(desc->source_address);
+desc->address_extension_23 = le32_to_cpu(desc->address_extension_23);
+desc->address_extension_45 = le32_to_cpu(desc->address_extension_45);
+desc->source_address2 = le32_to_cpu(desc->source_address2);
+desc->source_address3 = le32_to_cpu(desc->source_address3);
+desc->source_address4 = le32_to_cpu(desc->source_address4);
+desc->source_address5 = le32_to_cpu(desc->source_address5);
+desc->crc = le32_to_cpu(desc->crc);
+
+return res;
+}
+
+static MemTxResult xlnx_dpdma_write_descriptor(uint64_t desc_addr,
+   DPDMADescriptor *desc)
+{
+DPDMADescriptor tmp_desc = *desc;
+
+/* Convert from host endianness into LE.  */
+tmp_desc.control = cpu_to_le32(tmp_desc.control);
+tmp_desc.descriptor_id = cpu_to_le32(tmp_desc.descriptor_id);
+tmp_desc.xfer_size = cpu_to_le32(tmp_desc.xfer_size);
+tmp_desc.line_size_stride = cpu_to_le32(tmp_desc.line_size_stride);
+tmp_desc.timestamp_lsb = cpu_to_le32(tmp_desc.timestamp_lsb);
+tmp_desc.timestamp_msb = cpu_to_le32(tmp_desc.timestamp_msb);
+tmp_desc.address_extension = cpu_to_le32(tmp_desc.address_extension);
+tmp_desc.next_descriptor = cpu_to_le32(tmp_desc.next_descriptor);
+tmp_desc.source_address = cpu_to_le32(tmp_desc.source_address);
+tmp_desc.address_extension_23 = cpu_to_le32(tmp_desc.address_extension_23);
+tmp_desc.address_extension_45 = cpu_to_le32(tmp_desc.address_extension_45);
+tmp_desc.source_address2 = cpu_to_le32(tmp_desc.source_address2);
+tmp_desc.source_address3 = cpu_to_le32(tmp_desc.source_address3);
+tmp_desc.source_address4 = cpu_to_le32(tmp_desc.source_address4);
+tmp_desc.source_address5 = cpu_to_le32(tmp_desc.source_address5);
+tmp_desc.crc = cpu_to_le32(tmp_desc.crc);
+
+return dma_memory_write(&address_space_memory, desc_addr, &tmp_desc,
+sizeof(DPDMADescriptor), MEMTXATTRS_UNSPECIFIED);
+}
+
 size_t xlnx_dpdma_start_operation(XlnxDPDMAState *s, uint8_t channel,
 bool one_desc)
 {
@@ -651,8 +710,7 @@ size_t xlnx_dpdma_start_operation(XlnxDPDMAState *s, 
uint8_t channel,
 desc_addr = xlnx_dpdma_descriptor_next_address(s, channel);
 }
 
-if (dma_memory_read(&address_space_memory, desc_addr, &desc,
-sizeof(DPDMADescriptor), MEMTXATTRS_UNSPECIFIED)) {
+if (xlnx_dpdma_read_descriptor(s, desc_addr, &desc)) {
 s->registers[DPDMA_EISR] |= ((1 << 1) << channel);
 xlnx_dpdma_update_irq(s);
 s->operation_finished[channel] = true;
@@ -755,8 +813,10 @@ size_t xlnx_dpdma_start_operation(XlnxDPDMAState *s, 
uint8_t channel,
 /* The descriptor need to be updated when it's completed. */
 DPRINTF("update the descriptor with the done flag set.\n");
 xlnx_dpdma_desc_set_done(&

[PULL 03/42] hw/arm/npcm7xx: remove setting of mp-affinity

2024-05-28 Thread Peter Maydell

From: Dorjoy Chowdhury 

The value of the mp-affinity property being set in npcm7xx_realize is
always the same as the default value it would have when arm_cpu_realizefn
is called if the property is not set here. So there is no need to set
the property value in npcm7xx_realize function.

Signed-off-by: Dorjoy Chowdhury 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240504141733.14813-1-dorjoychy...@gmail.com
Signed-off-by: Peter Maydell 
---
 hw/arm/npcm7xx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index 9f2d96c733a..cb7791301b4 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -487,9 +487,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 
 /* CPUs */
 for (i = 0; i < nc->num_cpus; i++) {
-object_property_set_int(OBJECT(&s->cpu[i]), "mp-affinity",
-arm_build_mp_affinity(i, NPCM7XX_MAX_NUM_CPUS),
-&error_abort);
 object_property_set_int(OBJECT(&s->cpu[i]), "reset-cbar",
 NPCM7XX_GIC_CPU_IF_ADDR, &error_abort);
 object_property_set_bool(OBJECT(&s->cpu[i]), "reset-hivecs", true,
-- 
2.34.1

[PULL 35/42] target/arm: Convert FMAXP, FMINP, FMAXNMP, FMINNMP to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

These are the last instructions within disas_simd_three_reg_same_fp16,
so remove it.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-30-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|  16 ++
 target/arm/tcg/a64.decode  |  24 +++
 target/arm/tcg/translate-a64.c | 296 ++---
 target/arm/tcg/vec_helper.c|  16 ++
 4 files changed, 107 insertions(+), 245 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 8441b49d1f0..32684773299 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1052,6 +1052,22 @@ DEF_HELPER_FLAGS_5(gvec_faddp_h, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_faddp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_faddp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
+DEF_HELPER_FLAGS_5(gvec_fmaxp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmaxp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmaxp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
+DEF_HELPER_FLAGS_5(gvec_fminp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fminp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fminp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
+DEF_HELPER_FLAGS_5(gvec_fmaxnump_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmaxnump_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmaxnump_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
+DEF_HELPER_FLAGS_5(gvec_fminnump_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fminnump_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fminnump_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
 #ifdef TARGET_AARCH64
 #include "tcg/helper-a64.h"
 #include "tcg/helper-sve.h"
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index d2a02365e15..43557fdccc6 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -746,6 +746,18 @@ FRSQRTS_s   0101 1110 1.1 . 1 1 . . 
@rrr_sd
 FADDP_s 0101 1110 0011  1101 10 . . @rr_h
 FADDP_s 0111 1110 0.11  1101 10 . . @rr_sd
 
+FMAXP_s 0101 1110 0011   10 . . @rr_h
+FMAXP_s 0111 1110 0.11   10 . . @rr_sd
+
+FMINP_s 0101 1110 1011   10 . . @rr_h
+FMINP_s 0111 1110 1.11   10 . . @rr_sd
+
+FMAXNMP_s   0101 1110 0011  1100 10 . . @rr_h
+FMAXNMP_s   0111 1110 0.11  1100 10 . . @rr_sd
+
+FMINNMP_s   0101 1110 1011  1100 10 . . @rr_h
+FMINNMP_s   0111 1110 1.11  1100 10 . . @rr_sd
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -808,6 +820,18 @@ FRSQRTS_v   0.00 1110 1.1 . 1 1 . . 
@qrrr_sd
 FADDP_v 0.10 1110 010 . 00010 1 . . @qrrr_h
 FADDP_v 0.10 1110 0.1 . 11010 1 . . @qrrr_sd
 
+FMAXP_v 0.10 1110 010 . 00110 1 . . @qrrr_h
+FMAXP_v 0.10 1110 0.1 . 0 1 . . @qrrr_sd
+
+FMINP_v 0.10 1110 110 . 00110 1 . . @qrrr_h
+FMINP_v 0.10 1110 1.1 . 0 1 . . @qrrr_sd
+
+FMAXNMP_v   0.10 1110 010 . 0 1 . . @qrrr_h
+FMAXNMP_v   0.10 1110 0.1 . 11000 1 . . @qrrr_sd
+
+FMINNMP_v   0.10 1110 110 . 0 1 . . @qrrr_h
+FMINNMP_v   0.10 1110 1.1 . 11000 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 78949ab34f0..07415bd2855 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5217,6 +5217,34 @@ static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = 
{
 };
 TRANS(FADDP_v, do_fp3_vector, a, f_vector_faddp)
 
+static gen_helper_gvec_3_ptr * const f_vector_fmaxp[3] = {
+gen_helper_gvec_fmaxp_h,
+gen_helper_gvec_fmaxp_s,
+gen_helper_gvec_fmaxp_d,
+};
+TRANS(FMAXP_v, do_fp3_vector, a, f_vector_fmaxp)
+
+static gen_helper_gvec_3_ptr * const f_vector_fminp[3] = {
+gen_helper_gvec_fminp_h,
+gen_helper_gvec_fminp_s,
+gen_helper_gvec_fminp_d,
+};
+TRANS(FMINP_v, do_fp3_vector, a, f_vector_fminp)
+
+static gen_helper_gvec_3_ptr * const f_vector_fmaxnmp[3] = {
+gen_helper_gvec_fmaxnump_h,
+gen_helper_gvec_fmaxnump_s,
+gen_helper_gvec_fmaxnump_d,
+};
+TRANS(FMAXNMP_v, do_fp3_vector, a, f_vector_fmaxnmp)
+
+static gen_helper_gvec_3_ptr * const f_vector_fminnmp[3] = {
+gen_helper_gvec_fminnump_h,
+gen_helper_gvec_fminnump_s,
+

[PULL 16/42] target/arm: Convert Cryptographic 3-register SHA to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-11-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  | 11 +
 target/arm/tcg/translate-a64.c | 78 +-
 2 files changed, 21 insertions(+), 68 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 1de09903dc4..7590659ee68 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -30,6 +30,7 @@
 
 @rr_q1e0  .. rn:5 rd:5  &qrr_e q=1 esz=0
 @r2r_q1e0     .. rm:5 rd:5  &qrrr_e rn=%rd q=1 
esz=0
+@rrr_q1e0    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=0
 
 ### Data Processing - Immediate
 
@@ -603,3 +604,13 @@ AESE01001110 00 10100 00100 10 . .  
@r2r_q1e0
 AESD01001110 00 10100 00101 10 . .  @r2r_q1e0
 AESMC   01001110 00 10100 00110 10 . .  @rr_q1e0
 AESIMC  01001110 00 10100 00111 10 . .  @rr_q1e0
+
+### Cryptographic three-register SHA
+
+SHA1C   0101 1110 000 . 00 . .  @rrr_q1e0
+SHA1P   0101 1110 000 . 000100 . .  @rrr_q1e0
+SHA1M   0101 1110 000 . 001000 . .  @rrr_q1e0
+SHA1SU0 0101 1110 000 . 001100 . .  @rrr_q1e0
+SHA256H 0101 1110 000 . 01 . .  @rrr_q1e0
+SHA256H20101 1110 000 . 010100 . .  @rrr_q1e0
+SHA256SU1   0101 1110 000 . 011000 . .  @rrr_q1e0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 3894db4bee2..5bef39d4e7d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4589,7 +4589,7 @@ static bool trans_EXTR(DisasContext *s, arg_extract *a)
 }
 
 /*
- * Cryptographic AES
+ * Cryptographic AES, SHA
  */
 
 TRANS_FEAT(AESE, aa64_aes, do_gvec_op3_ool, a, 0, gen_helper_crypto_aese)
@@ -4597,6 +4597,15 @@ TRANS_FEAT(AESD, aa64_aes, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_aesd)
 TRANS_FEAT(AESMC, aa64_aes, do_gvec_op2_ool, a, 0, gen_helper_crypto_aesmc)
 TRANS_FEAT(AESIMC, aa64_aes, do_gvec_op2_ool, a, 0, gen_helper_crypto_aesimc)
 
+TRANS_FEAT(SHA1C, aa64_sha1, do_gvec_op3_ool, a, 0, gen_helper_crypto_sha1c)
+TRANS_FEAT(SHA1P, aa64_sha1, do_gvec_op3_ool, a, 0, gen_helper_crypto_sha1p)
+TRANS_FEAT(SHA1M, aa64_sha1, do_gvec_op3_ool, a, 0, gen_helper_crypto_sha1m)
+TRANS_FEAT(SHA1SU0, aa64_sha1, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha1su0)
+
+TRANS_FEAT(SHA256H, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256h)
+TRANS_FEAT(SHA256H2, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256h2)
+TRANS_FEAT(SHA256SU1, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256su1)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13497,72 +13506,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto three-reg SHA
- *  31 24 23  22  21 20  16  15 1412 11 10 95 40
- * +-+--+---+--+---++-+--+--+
- * | 0 1 0 1 1 1 1 0 | size | 0 |  Rm  | 0 | opcode | 0 0 |  Rn  |  Rd  |
- * +-+--+---+--+---++-+--+--+
- */
-static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
-{
-int size = extract32(insn, 22, 2);
-int opcode = extract32(insn, 12, 3);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-gen_helper_gvec_3 *genfn;
-bool feature;
-
-if (size != 0) {
-unallocated_encoding(s);
-return;
-}
-
-switch (opcode) {
-case 0: /* SHA1C */
-genfn = gen_helper_crypto_sha1c;
-feature = dc_isar_feature(aa64_sha1, s);
-break;
-case 1: /* SHA1P */
-genfn = gen_helper_crypto_sha1p;
-feature = dc_isar_feature(aa64_sha1, s);
-break;
-case 2: /* SHA1M */
-genfn = gen_helper_crypto_sha1m;
-feature = dc_isar_feature(aa64_sha1, s);
-break;
-case 3: /* SHA1SU0 */
-genfn = gen_helper_crypto_sha1su0;
-feature = dc_isar_feature(aa64_sha1, s);
-break;
-case 4: /* SHA256H */
-genfn = gen_helper_crypto_sha256h;
-feature = dc_isar_feature(aa64_sha256, s);
-break;
-case 5: /* SHA256H2 */
-genfn = gen_helper_crypto_sha256h2;
-feature = dc_isar_feature(aa64_sha256, s);
-break;
-case 6: /* SHA256SU1 */
-genfn = gen_helper_crypto_sha256su1;
-feature = dc_isar_feature(aa64_sha256, s);
-break;
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!feature) {
-unallocated_encoding(s);

[PULL 26/42] target/arm: Convert FMAX, FMIN, FMAXNM, FMINNM to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-21-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|   4 +
 target/arm/tcg/a64.decode  |  17 
 target/arm/tcg/translate-a64.c | 168 +
 target/arm/tcg/vec_helper.c|   4 +
 4 files changed, 113 insertions(+), 80 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 2b027333053..7ee15b96512 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -748,15 +748,19 @@ DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fmax_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmax_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fmin_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmin_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fmaxnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_fmaxnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmaxnum_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
 DEF_HELPER_FLAGS_5(gvec_fminnum_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_fminnum_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fminnum_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
 DEF_HELPER_FLAGS_5(gvec_recps_nf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_recps_nf_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 82daafbef52..e2678d919e5 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -704,6 +704,11 @@ FSUB_s  0001 1110 ..1 . 0011 10 . . 
@rrr_hsd
 FDIV_s  0001 1110 ..1 . 0001 10 . . @rrr_hsd
 FMUL_s  0001 1110 ..1 .  10 . . @rrr_hsd
 
+FMAX_s  0001 1110 ..1 . 0100 10 . . @rrr_hsd
+FMIN_s  0001 1110 ..1 . 0101 10 . . @rrr_hsd
+FMAXNM_s0001 1110 ..1 . 0110 10 . . @rrr_hsd
+FMINNM_s0001 1110 ..1 . 0111 10 . . @rrr_hsd
+
 FMULX_s 0101 1110 010 . 00011 1 . . @rrr_h
 FMULX_s 0101 1110 0.1 . 11011 1 . . @rrr_sd
 
@@ -721,6 +726,18 @@ FDIV_v  0.10 1110 0.1 . 1 1 . . 
@qrrr_sd
 FMUL_v  0.10 1110 010 . 00011 1 . . @qrrr_h
 FMUL_v  0.10 1110 0.1 . 11011 1 . . @qrrr_sd
 
+FMAX_v  0.00 1110 010 . 00110 1 . . @qrrr_h
+FMAX_v  0.00 1110 0.1 . 0 1 . . @qrrr_sd
+
+FMIN_v  0.00 1110 110 . 00110 1 . . @qrrr_h
+FMIN_v  0.00 1110 1.1 . 0 1 . . @qrrr_sd
+
+FMAXNM_v0.00 1110 010 . 0 1 . . @qrrr_h
+FMAXNM_v0.00 1110 0.1 . 11000 1 . . @qrrr_sd
+
+FMINNM_v0.00 1110 110 . 0 1 . . @qrrr_h
+FMINNM_v0.00 1110 1.1 . 11000 1 . . @qrrr_sd
+
 FMULX_v 0.00 1110 010 . 00011 1 . . @qrrr_h
 FMULX_v 0.00 1110 0.1 . 11011 1 . . @qrrr_sd
 
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 97c3d758d62..6f8207d842b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4915,6 +4915,34 @@ static const FPScalar f_scalar_fmul = {
 };
 TRANS(FMUL_s, do_fp3_scalar, a, &f_scalar_fmul)
 
+static const FPScalar f_scalar_fmax = {
+gen_helper_advsimd_maxh,
+gen_helper_vfp_maxs,
+gen_helper_vfp_maxd,
+};
+TRANS(FMAX_s, do_fp3_scalar, a, &f_scalar_fmax)
+
+static const FPScalar f_scalar_fmin = {
+gen_helper_advsimd_minh,
+gen_helper_vfp_mins,
+gen_helper_vfp_mind,
+};
+TRANS(FMIN_s, do_fp3_scalar, a, &f_scalar_fmin)
+
+static const FPScalar f_scalar_fmaxnm = {
+gen_helper_advsimd_maxnumh,
+gen_helper_vfp_maxnums,
+gen_helper_vfp_maxnumd,
+};
+TRANS(FMAXNM_s, do_fp3_scalar, a, &f_scalar_fmaxnm)
+
+static const FPScalar f_scalar_fminnm = {
+gen_helper_advsimd_minnumh,
+gen_helper_vfp_minnums,
+gen_helper_vfp_minnumd,
+};
+TRANS(FMINNM_s, do_fp3_scalar, a, &f_scalar_fminnm)
+
 static const FPScalar f_scalar_fmulx = {
 gen_helper_advsimd_mulxh,
 gen_helper_vfp_mulxs,
@@ -4978,6 +5006,34 @@ static gen_helper_gvec_3_ptr * const f_vector_fmul[3] = {
 };
 TRANS(FMUL_v, do_fp3_vector, a, f_vector_fmul)
 
+static gen_helper_gvec_3_ptr * const f_vector_fmax[3] = {
+gen_helper_gvec_fmax_h,
+gen_helper_gvec_fmax_s,
+gen_helper_gvec_fmax_d,
+};
+TRANS(FMAX_v, do_fp3_vector,

[PULL 34/42] target/arm: Convert FADDP to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-29-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|  4 ++
 target/arm/tcg/a64.decode  | 12 +
 target/arm/tcg/translate-a64.c | 87 ++
 target/arm/tcg/vec_helper.c| 23 +
 4 files changed, 105 insertions(+), 21 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index ff6e3094f41..8441b49d1f0 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1048,6 +1048,10 @@ DEF_HELPER_FLAGS_5(gvec_uclamp_s, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_5(gvec_uclamp_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_faddp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_faddp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_faddp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
 #ifdef TARGET_AARCH64
 #include "tcg/helper-a64.h"
 #include "tcg/helper-sve.h"
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 84cb38f1dd0..d2a02365e15 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -29,6 +29,7 @@
 &ri rd imm
 &rri_sf rd rn imm sf
 &i  imm
+&rr_e   rd rn esz
 &rrr_e  rd rn rm esz
 &rrx_e  rd rn rm idx esz
 &qrr_e  q rd rn esz
@@ -36,6 +37,9 @@
 &qrrx_e q rd rn rm idx esz
 &q_eq rd rn rm ra esz
 
+@rr_h    ... . .. rn:5 rd:5 &rr_e esz=1
+@rr_sd   ... . .. rn:5 rd:5 &rr_e esz=%esz_sd
+
 @rrr_h   ... rm:5 .. rn:5 rd:5  &rrr_e esz=1
 @rrr_sd  ... rm:5 .. rn:5 rd:5  &rrr_e esz=%esz_sd
 @rrr_hsd ... rm:5 .. rn:5 rd:5  &rrr_e esz=%esz_hsd
@@ -737,6 +741,11 @@ FRECPS_s0101 1110 0.1 . 1 1 . . 
@rrr_sd
 FRSQRTS_s   0101 1110 110 . 00111 1 . . @rrr_h
 FRSQRTS_s   0101 1110 1.1 . 1 1 . . @rrr_sd
 
+### Advanced SIMD scalar pairwise
+
+FADDP_s 0101 1110 0011  1101 10 . . @rr_h
+FADDP_s 0111 1110 0.11  1101 10 . . @rr_sd
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -796,6 +805,9 @@ FRECPS_v0.00 1110 0.1 . 1 1 . . 
@qrrr_sd
 FRSQRTS_v   0.00 1110 110 . 00111 1 . . @qrrr_h
 FRSQRTS_v   0.00 1110 1.1 . 1 1 . . @qrrr_sd
 
+FADDP_v 0.10 1110 010 . 00010 1 . . @qrrr_h
+FADDP_v 0.10 1110 0.1 . 11010 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a7537a5104f..78949ab34f0 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5210,6 +5210,13 @@ static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] 
= {
 };
 TRANS(FRSQRTS_v, do_fp3_vector, a, f_vector_frsqrts)
 
+static gen_helper_gvec_3_ptr * const f_vector_faddp[3] = {
+gen_helper_gvec_faddp_h,
+gen_helper_gvec_faddp_s,
+gen_helper_gvec_faddp_d,
+};
+TRANS(FADDP_v, do_fp3_vector, a, f_vector_faddp)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -5395,6 +5402,56 @@ static bool do_fmla_vector_idx(DisasContext *s, 
arg_qrrx_e *a, bool neg)
 TRANS(FMLA_vi, do_fmla_vector_idx, a, false)
 TRANS(FMLS_vi, do_fmla_vector_idx, a, true)
 
+/*
+ * Advanced SIMD scalar pairwise
+ */
+
+static bool do_fp3_scalar_pair(DisasContext *s, arg_rr_e *a, const FPScalar *f)
+{
+switch (a->esz) {
+case MO_64:
+if (fp_access_check(s)) {
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+
+read_vec_element(s, t0, a->rn, 0, MO_64);
+read_vec_element(s, t1, a->rn, 1, MO_64);
+f->gen_d(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+write_fp_dreg(s, a->rd, t0);
+}
+break;
+case MO_32:
+if (fp_access_check(s)) {
+TCGv_i32 t0 = tcg_temp_new_i32();
+TCGv_i32 t1 = tcg_temp_new_i32();
+
+read_vec_element_i32(s, t0, a->rn, 0, MO_32);
+read_vec_element_i32(s, t1, a->rn, 1, MO_32);
+f->gen_s(t0, t0, t1, fpstatus_ptr(FPST_FPCR));
+write_fp_sreg(s, a->rd, t0);
+}
+break;
+case MO_16:
+if (!dc_isar_feature(aa64_fp16, s)) {
+return false;
+}
+if (fp_access_check(s)) {
+TCGv_i32 t0 = tcg_temp_new_i32();
+TCGv_i32 t1 = tcg_temp_new_i32();
+
+read_vec_element_i32(s, t0, a->rn, 0, MO_16);
+read_vec_element_i32(s, t1, a->rn, 1, MO_16);
+f->gen_h(t0, t0, t1, fps

[PULL 11/42] target/arm: Fix decode of FMOV (hp) vs MOVI

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

The decode of FMOV (vector, immediate, half-precision) vs
invalid cases of MOVI are incorrect.

Fixes RISU mismatch for invalid insn 0x2f01fd31.

Fixes: 70b4e6a4457 ("arm/translate-a64: add FP16 FMOV to simd_mod_imm")
Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-id: 20240524232121.284515-6-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-a64.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d97acdbaf9a..5455ae36850 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -7904,27 +7904,31 @@ static void disas_simd_mod_imm(DisasContext *s, 
uint32_t insn)
 bool is_q = extract32(insn, 30, 1);
 uint64_t imm = 0;
 
-if (o2 != 0 || ((cmode == 0xf) && is_neg && !is_q)) {
-/* Check for FMOV (vector, immediate) - half-precision */
-if (!(dc_isar_feature(aa64_fp16, s) && o2 && cmode == 0xf)) {
+if (o2) {
+if (cmode != 0xf || is_neg) {
 unallocated_encoding(s);
 return;
 }
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-if (cmode == 15 && o2 && !is_neg) {
 /* FMOV (vector, immediate) - half-precision */
+if (!dc_isar_feature(aa64_fp16, s)) {
+unallocated_encoding(s);
+return;
+}
 imm = vfp_expand_imm(MO_16, abcdefgh);
 /* now duplicate across the lanes */
 imm = dup_const(MO_16, imm);
 } else {
+if (cmode == 0xf && is_neg && !is_q) {
+unallocated_encoding(s);
+return;
+}
 imm = asimd_imm_const(abcdefgh, cmode, is_neg);
 }
 
+if (!fp_access_check(s)) {
+return;
+}
+
 if (!((cmode & 0x9) == 0x1 || (cmode & 0xd) == 0x9)) {
 /* MOVI or MVNI, with MVNI negation handled above.  */
 tcg_gen_gvec_dup_imm(MO_64, vec_full_reg_offset(s, rd), is_q ? 16 : 8,
-- 
2.34.1

[PULL 06/42] hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()

2024-05-28 Thread Peter Maydell

From: Philippe Mathieu-Daudé 

Check the function index is in range and use an unsigned
variable to avoid the following warning with GCC 13.2.0:

  [666/5358] Compiling C object libcommon.fa.p/hw_input_tsc2005.c.o
  hw/input/tsc2005.c: In function 'tsc2005_timer_tick':
  hw/input/tsc2005.c:416:26: warning: array subscript has type 'char' 
[-Wchar-subscripts]
416 | s->dav |= mode_regs[s->function];
| ~^~

Signed-off-by: Philippe Mathieu-Daudé 
Message-id: 20240508143513.44996-1-phi...@linaro.org
Reviewed-by: Peter Maydell 
[PMM: fixed missing ')']
Signed-off-by: Peter Maydell 
---
 hw/input/tsc2005.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
index 941f163d364..ac7f54eeafb 100644
--- a/hw/input/tsc2005.c
+++ b/hw/input/tsc2005.c
@@ -406,6 +406,9 @@ uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len)
 static void tsc2005_timer_tick(void *opaque)
 {
 TSC2005State *s = opaque;
+unsigned int function = s->function;
+
+assert(function < ARRAY_SIZE(mode_regs));
 
 /* Timer ticked -- a set of conversions has been finished.  */
 
@@ -413,7 +416,7 @@ static void tsc2005_timer_tick(void *opaque)
 return;
 
 s->busy = false;
-s->dav |= mode_regs[s->function];
+s->dav |= mode_regs[function];
 s->function = -1;
 tsc2005_pin_update(s);
 }
-- 
2.34.1

[PULL 27/42] target/arm: Introduce vfp_load_reg16

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Load and zero-extend float16 into a TCGv_i32 before
all scalar operations.

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-id: 20240524232121.284515-22-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-vfp.c | 39 +++---
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/target/arm/tcg/translate-vfp.c b/target/arm/tcg/translate-vfp.c
index b9af03b7c35..8e755fcde8a 100644
--- a/target/arm/tcg/translate-vfp.c
+++ b/target/arm/tcg/translate-vfp.c
@@ -48,6 +48,12 @@ static inline void vfp_store_reg32(TCGv_i32 var, int reg)
 tcg_gen_st_i32(var, tcg_env, vfp_reg_offset(false, reg));
 }
 
+static inline void vfp_load_reg16(TCGv_i32 var, int reg)
+{
+tcg_gen_ld16u_i32(var, tcg_env,
+  vfp_reg_offset(false, reg) + HOST_BIG_ENDIAN * 2);
+}
+
 /*
  * The imm8 encodes the sign bit, enough bits to represent an exponent in
  * the range 011xx to 100xx, and the most significant 4 bits of
@@ -902,8 +908,7 @@ static bool trans_VMOV_half(DisasContext *s, 
arg_VMOV_single *a)
 if (a->l) {
 /* VFP to general purpose register */
 tmp = tcg_temp_new_i32();
-vfp_load_reg32(tmp, a->vn);
-tcg_gen_andi_i32(tmp, tmp, 0x);
+vfp_load_reg16(tmp, a->vn);
 store_reg(s, a->rt, tmp);
 } else {
 /* general purpose register to VFP */
@@ -1453,11 +1458,11 @@ static bool do_vfp_3op_hp(DisasContext *s, 
VFPGen3OpSPFn *fn,
 fd = tcg_temp_new_i32();
 fpst = fpstatus_ptr(FPST_FPCR_F16);
 
-vfp_load_reg32(f0, vn);
-vfp_load_reg32(f1, vm);
+vfp_load_reg16(f0, vn);
+vfp_load_reg16(f1, vm);
 
 if (reads_vd) {
-vfp_load_reg32(fd, vd);
+vfp_load_reg16(fd, vd);
 }
 fn(fd, f0, f1, fpst);
 vfp_store_reg32(fd, vd);
@@ -1633,7 +1638,7 @@ static bool do_vfp_2op_hp(DisasContext *s, VFPGen2OpSPFn 
*fn, int vd, int vm)
 }
 
 f0 = tcg_temp_new_i32();
-vfp_load_reg32(f0, vm);
+vfp_load_reg16(f0, vm);
 fn(f0, f0);
 vfp_store_reg32(f0, vd);
 
@@ -2106,13 +2111,13 @@ static bool do_vfm_hp(DisasContext *s, arg_VFMA_sp *a, 
bool neg_n, bool neg_d)
 vm = tcg_temp_new_i32();
 vd = tcg_temp_new_i32();
 
-vfp_load_reg32(vn, a->vn);
-vfp_load_reg32(vm, a->vm);
+vfp_load_reg16(vn, a->vn);
+vfp_load_reg16(vm, a->vm);
 if (neg_n) {
 /* VFNMS, VFMS */
 gen_helper_vfp_negh(vn, vn);
 }
-vfp_load_reg32(vd, a->vd);
+vfp_load_reg16(vd, a->vd);
 if (neg_d) {
 /* VFNMA, VFNMS */
 gen_helper_vfp_negh(vd, vd);
@@ -2456,11 +2461,11 @@ static bool trans_VCMP_hp(DisasContext *s, arg_VCMP_sp 
*a)
 vd = tcg_temp_new_i32();
 vm = tcg_temp_new_i32();
 
-vfp_load_reg32(vd, a->vd);
+vfp_load_reg16(vd, a->vd);
 if (a->z) {
 tcg_gen_movi_i32(vm, 0);
 } else {
-vfp_load_reg32(vm, a->vm);
+vfp_load_reg16(vm, a->vm);
 }
 
 if (a->e) {
@@ -2700,7 +2705,7 @@ static bool trans_VRINTR_hp(DisasContext *s, 
arg_VRINTR_sp *a)
 }
 
 tmp = tcg_temp_new_i32();
-vfp_load_reg32(tmp, a->vm);
+vfp_load_reg16(tmp, a->vm);
 fpst = fpstatus_ptr(FPST_FPCR_F16);
 gen_helper_rinth(tmp, tmp, fpst);
 vfp_store_reg32(tmp, a->vd);
@@ -2773,7 +2778,7 @@ static bool trans_VRINTZ_hp(DisasContext *s, 
arg_VRINTZ_sp *a)
 }
 
 tmp = tcg_temp_new_i32();
-vfp_load_reg32(tmp, a->vm);
+vfp_load_reg16(tmp, a->vm);
 fpst = fpstatus_ptr(FPST_FPCR_F16);
 tcg_rmode = gen_set_rmode(FPROUNDING_ZERO, fpst);
 gen_helper_rinth(tmp, tmp, fpst);
@@ -2853,7 +2858,7 @@ static bool trans_VRINTX_hp(DisasContext *s, 
arg_VRINTX_sp *a)
 }
 
 tmp = tcg_temp_new_i32();
-vfp_load_reg32(tmp, a->vm);
+vfp_load_reg16(tmp, a->vm);
 fpst = fpstatus_ptr(FPST_FPCR_F16);
 gen_helper_rinth_exact(tmp, tmp, fpst);
 vfp_store_reg32(tmp, a->vd);
@@ -3270,7 +3275,7 @@ static bool trans_VCVT_hp_int(DisasContext *s, 
arg_VCVT_sp_int *a)
 
 fpst = fpstatus_ptr(FPST_FPCR_F16);
 vm = tcg_temp_new_i32();
-vfp_load_reg32(vm, a->vm);
+vfp_load_reg16(vm, a->vm);
 
 if (a->s) {
 if (a->rz) {
@@ -3383,8 +3388,8 @@ static bool trans_VINS(DisasContext *s, arg_VINS *a)
 /* Insert low half of Vm into high half of Vd */
 rm = tcg_temp_new_i32();
 rd = tcg_temp_new_i32();
-vfp_load_reg32(rm, a->vm);
-vfp_load_reg32(rd, a->vd);
+vfp_load_reg16(rm, a->vm);
+vfp_load_reg16(rd, a->vd);
 tcg_gen_deposit_i32(rd, rd, rm, 16, 16);
 vfp_store_reg32(rd, a->vd);
 return true;
-- 
2.34.1

[PULL 04/42] hw/char: Correct STM32L4x5 usart register CR2 field ADD_0 size

2024-05-28 Thread Peter Maydell

From: Inès Varhol 

Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
Message-id: 20240505141613.387508-1-ines.var...@telecom-paris.fr
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/char/stm32l4x5_usart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/char/stm32l4x5_usart.c b/hw/char/stm32l4x5_usart.c
index 02f666308c0..fc5dcac0c45 100644
--- a/hw/char/stm32l4x5_usart.c
+++ b/hw/char/stm32l4x5_usart.c
@@ -56,7 +56,7 @@ REG32(CR1, 0x00)
 FIELD(CR1, UE, 0, 1) /* USART enable */
 REG32(CR2, 0x04)
 FIELD(CR2, ADD_1, 28, 4)/* ADD[7:4] */
-FIELD(CR2, ADD_0, 24, 1)/* ADD[3:0] */
+FIELD(CR2, ADD_0, 24, 4)/* ADD[3:0] */
 FIELD(CR2, RTOEN, 23, 1)/* Receiver timeout enable */
 FIELD(CR2, ABRMOD, 21, 2)   /* Auto baud rate mode */
 FIELD(CR2, ABREN, 20, 1)/* Auto baud rate enable */
-- 
2.34.1

[PULL 14/42] target/arm: Split out gengvec64.c

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Split some routines out of translate-a64.c and translate-sve.c
that are used by both.

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-9-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-a64.h |   4 +
 target/arm/tcg/gengvec64.c | 190 +
 target/arm/tcg/translate-a64.c |  26 -
 target/arm/tcg/translate-sve.c | 145 +
 target/arm/tcg/meson.build |   1 +
 5 files changed, 197 insertions(+), 169 deletions(-)
 create mode 100644 target/arm/tcg/gengvec64.c

diff --git a/target/arm/tcg/translate-a64.h b/target/arm/tcg/translate-a64.h
index 7b811b8ac51..91750f0ca91 100644
--- a/target/arm/tcg/translate-a64.h
+++ b/target/arm/tcg/translate-a64.h
@@ -193,6 +193,10 @@ void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
   uint32_t rm_ofs, int64_t shift,
   uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_eor3(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
+   uint32_t a, uint32_t oprsz, uint32_t maxsz);
+void gen_gvec_bcax(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
+   uint32_t a, uint32_t oprsz, uint32_t maxsz);
 
 void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
 void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
diff --git a/target/arm/tcg/gengvec64.c b/target/arm/tcg/gengvec64.c
new file mode 100644
index 000..093b498b13d
--- /dev/null
+++ b/target/arm/tcg/gengvec64.c
@@ -0,0 +1,190 @@
+/*
+ *  AArch64 generic vector expansion
+ *
+ *  Copyright (c) 2013 Alexander Graf 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "translate.h"
+#include "translate-a64.h"
+
+
+static void gen_rax1_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m)
+{
+tcg_gen_rotli_i64(d, m, 1);
+tcg_gen_xor_i64(d, d, n);
+}
+
+static void gen_rax1_vec(unsigned vece, TCGv_vec d, TCGv_vec n, TCGv_vec m)
+{
+tcg_gen_rotli_vec(vece, d, m, 1);
+tcg_gen_xor_vec(vece, d, d, n);
+}
+
+void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+   uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop_list[] = { INDEX_op_rotli_vec, 0 };
+static const GVecGen3 op = {
+.fni8 = gen_rax1_i64,
+.fniv = gen_rax1_vec,
+.opt_opc = vecop_list,
+.fno = gen_helper_crypto_rax1,
+.vece = MO_64,
+};
+tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &op);
+}
+
+static void gen_xar8_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+uint64_t mask = dup_const(MO_8, 0xff >> sh);
+
+tcg_gen_xor_i64(t, n, m);
+tcg_gen_shri_i64(d, t, sh);
+tcg_gen_shli_i64(t, t, 8 - sh);
+tcg_gen_andi_i64(d, d, mask);
+tcg_gen_andi_i64(t, t, ~mask);
+tcg_gen_or_i64(d, d, t);
+}
+
+static void gen_xar16_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+TCGv_i64 t = tcg_temp_new_i64();
+uint64_t mask = dup_const(MO_16, 0x >> sh);
+
+tcg_gen_xor_i64(t, n, m);
+tcg_gen_shri_i64(d, t, sh);
+tcg_gen_shli_i64(t, t, 16 - sh);
+tcg_gen_andi_i64(d, d, mask);
+tcg_gen_andi_i64(t, t, ~mask);
+tcg_gen_or_i64(d, d, t);
+}
+
+static void gen_xar_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, int32_t sh)
+{
+tcg_gen_xor_i32(d, n, m);
+tcg_gen_rotri_i32(d, d, sh);
+}
+
+static void gen_xar_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, int64_t sh)
+{
+tcg_gen_xor_i64(d, n, m);
+tcg_gen_rotri_i64(d, d, sh);
+}
+
+static void gen_xar_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
+TCGv_vec m, int64_t sh)
+{
+tcg_gen_xor_vec(vece, d, n, m);
+tcg_gen_rotri_vec(vece, d, d, sh);
+}
+
+void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+  uint32_t rm_ofs, int64_t shift,
+  uint32_t opr_sz, uint32_t max_sz)
+{
+static const TCGOpcode vecop[] = { INDEX_op_rotli_vec, 0 };
+static const GVecGen3i ops[4] = {
+{ .fni8 = gen_xar8_i64,
+  .fniv = gen_xar_vec,
+  .fno = gen_he

[PULL 25/42] target/arm: Convert FADD, FSUB, FDIV, FMUL to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-20-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/helper-a64.h|   4 +
 target/arm/tcg/translate.h |   5 +
 target/arm/tcg/a64.decode  |  27 +
 target/arm/tcg/translate-a64.c | 205 +
 target/arm/tcg/vec_helper.c|   4 +
 5 files changed, 143 insertions(+), 102 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index b79751a7170..371388f61b5 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -133,6 +133,10 @@ DEF_HELPER_4(cpyfp, void, env, i32, i32, i32)
 DEF_HELPER_4(cpyfm, void, env, i32, i32, i32)
 DEF_HELPER_4(cpyfe, void, env, i32, i32, i32)
 
+DEF_HELPER_FLAGS_5(gvec_fdiv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fdiv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fdiv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(gvec_fmulx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_fmulx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_fmulx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 80e85096a83..ecfa242eef3 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -252,6 +252,11 @@ static inline int shl_12(DisasContext *s, int x)
 return x << 12;
 }
 
+static inline int xor_2(DisasContext *s, int x)
+{
+return x ^ 2;
+}
+
 static inline int neon_3same_fp_size(DisasContext *s, int x)
 {
 /* Convert 0==fp32, 1==fp16 into a MO_* value */
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 2e0e01be017..82daafbef52 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -21,6 +21,7 @@
 
 %rd 0:5
 %esz_sd 22:1 !function=plus_2
+%esz_hsd22:2 !function=xor_2
 %hl 11:1 21:1
 %hlm11:1 20:2
 
@@ -37,6 +38,7 @@
 
 @rrr_h   ... rm:5 .. rn:5 rd:5  &rrr_e esz=1
 @rrr_sd  ... rm:5 .. rn:5 rd:5  &rrr_e esz=%esz_sd
+@rrr_hsd ... rm:5 .. rn:5 rd:5  &rrr_e esz=%esz_hsd
 
 @rrx_h   .. .. rm:4  . . rn:5 rd:5  &rrx_e esz=1 idx=%hlm
 @rrx_s   .. . rm:5   . . rn:5 rd:5  &rrx_e esz=2 idx=%hl
@@ -697,22 +699,47 @@ INS_element 0 1   10 1110 000 di:5  0 si:4 1 rn:5 rd:5
 
 ### Advanced SIMD scalar three same
 
+FADD_s  0001 1110 ..1 . 0010 10 . . @rrr_hsd
+FSUB_s  0001 1110 ..1 . 0011 10 . . @rrr_hsd
+FDIV_s  0001 1110 ..1 . 0001 10 . . @rrr_hsd
+FMUL_s  0001 1110 ..1 .  10 . . @rrr_hsd
+
 FMULX_s 0101 1110 010 . 00011 1 . . @rrr_h
 FMULX_s 0101 1110 0.1 . 11011 1 . . @rrr_sd
 
 ### Advanced SIMD three same
 
+FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
+FADD_v  0.00 1110 0.1 . 11010 1 . . @qrrr_sd
+
+FSUB_v  0.00 1110 110 . 00010 1 . . @qrrr_h
+FSUB_v  0.00 1110 1.1 . 11010 1 . . @qrrr_sd
+
+FDIV_v  0.10 1110 010 . 00111 1 . . @qrrr_h
+FDIV_v  0.10 1110 0.1 . 1 1 . . @qrrr_sd
+
+FMUL_v  0.10 1110 010 . 00011 1 . . @qrrr_h
+FMUL_v  0.10 1110 0.1 . 11011 1 . . @qrrr_sd
+
 FMULX_v 0.00 1110 010 . 00011 1 . . @qrrr_h
 FMULX_v 0.00 1110 0.1 . 11011 1 . . @qrrr_sd
 
 ### Advanced SIMD scalar x indexed element
 
+FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
+FMUL_si 0101  10 . . 1001 . 0 . .   @rrx_s
+FMUL_si 0101  11 0 . 1001 . 0 . .   @rrx_d
+
 FMULX_si0111  00 ..  1001 . 0 . .   @rrx_h
 FMULX_si0111  10 . . 1001 . 0 . .   @rrx_s
 FMULX_si0111  11 0 . 1001 . 0 . .   @rrx_d
 
 ### Advanced SIMD vector x indexed element
 
+FMUL_vi 0.00  00 ..  1001 . 0 . .   @qrrx_h
+FMUL_vi 0.00  10 . . 1001 . 0 . .   @qrrx_s
+FMUL_vi 0.00  11 0 . 1001 . 0 . .   @qrrx_d
+
 FMULX_vi0.10  00 ..  1001 . 0 . .   @qrrx_h
 FMULX_vi0.10  10 . . 1001 . 0 . .   @qrrx_s
 FMULX_vi0.10  11 0 . 1001 . 0 . .   @qrrx_d
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 8cbe6cd70f2..97c3d758d62 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4887,6 +4887,34 @@ static bool do_fp3_scalar(DisasContext *s, arg_rrr_e *a, 
const FPScalar *f)
 return true;
 }
 
+static const FPScalar f_scal

[PULL 40/42] target/arm: Use gvec for neon pmax, pmin

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-35-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-neon.c | 78 ++---
 1 file changed, 4 insertions(+), 74 deletions(-)

diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 6c5a7a98e1b..18b048611b3 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -831,6 +831,10 @@ DO_3SAME_NO_SZ_3(VABA_S, gen_gvec_saba)
 DO_3SAME_NO_SZ_3(VABD_U, gen_gvec_uabd)
 DO_3SAME_NO_SZ_3(VABA_U, gen_gvec_uaba)
 DO_3SAME_NO_SZ_3(VPADD, gen_gvec_addp)
+DO_3SAME_NO_SZ_3(VPMAX_S, gen_gvec_smaxp)
+DO_3SAME_NO_SZ_3(VPMIN_S, gen_gvec_sminp)
+DO_3SAME_NO_SZ_3(VPMAX_U, gen_gvec_umaxp)
+DO_3SAME_NO_SZ_3(VPMIN_U, gen_gvec_uminp)
 
 #define DO_3SAME_CMP(INSN, COND)\
 static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -1003,80 +1007,6 @@ DO_3SAME_32_ENV(VQSHL_U, qshl_u)
 DO_3SAME_32_ENV(VQRSHL_S, qrshl_s)
 DO_3SAME_32_ENV(VQRSHL_U, qrshl_u)
 
-static bool do_3same_pair(DisasContext *s, arg_3same *a, NeonGenTwoOpFn *fn)
-{
-/* Operations handled pairwise 32 bits at a time */
-TCGv_i32 tmp, tmp2, tmp3;
-
-if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-return false;
-}
-
-/* UNDEF accesses to D16-D31 if they don't exist. */
-if (!dc_isar_feature(aa32_simd_r32, s) &&
-((a->vd | a->vn | a->vm) & 0x10)) {
-return false;
-}
-
-if (a->size == 3) {
-return false;
-}
-
-if (!vfp_access_check(s)) {
-return true;
-}
-
-assert(a->q == 0); /* enforced by decode patterns */
-
-/*
- * Note that we have to be careful not to clobber the source operands
- * in the "vm == vd" case by storing the result of the first pass too
- * early. Since Q is 0 there are always just two passes, so instead
- * of a complicated loop over each pass we just unroll.
- */
-tmp = tcg_temp_new_i32();
-tmp2 = tcg_temp_new_i32();
-tmp3 = tcg_temp_new_i32();
-
-read_neon_element32(tmp, a->vn, 0, MO_32);
-read_neon_element32(tmp2, a->vn, 1, MO_32);
-fn(tmp, tmp, tmp2);
-
-read_neon_element32(tmp3, a->vm, 0, MO_32);
-read_neon_element32(tmp2, a->vm, 1, MO_32);
-fn(tmp3, tmp3, tmp2);
-
-write_neon_element32(tmp, a->vd, 0, MO_32);
-write_neon_element32(tmp3, a->vd, 1, MO_32);
-
-return true;
-}
-
-#define DO_3SAME_PAIR(INSN, func)   \
-static bool trans_##INSN##_3s(DisasContext *s, arg_3same *a)\
-{   \
-static NeonGenTwoOpFn * const fns[] = { \
-gen_helper_neon_##func##8,  \
-gen_helper_neon_##func##16, \
-gen_helper_neon_##func##32, \
-};  \
-if (a->size > 2) {  \
-return false;   \
-}   \
-return do_3same_pair(s, a, fns[a->size]);   \
-}
-
-/* 32-bit pairwise ops end up the same as the elementwise versions.  */
-#define gen_helper_neon_pmax_s32  tcg_gen_smax_i32
-#define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
-#define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
-#define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
-
-DO_3SAME_PAIR(VPMAX_S, pmax_s)
-DO_3SAME_PAIR(VPMIN_S, pmin_s)
-DO_3SAME_PAIR(VPMAX_U, pmax_u)
-DO_3SAME_PAIR(VPMIN_U, pmin_u)
-
 #define DO_3SAME_VQDMULH(INSN, FUNC)\
 WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16);\
 WRAP_ENV_FN(gen_##INSN##_tramp32, gen_helper_neon_##FUNC##_s32);\
-- 
2.34.1

[PULL 28/42] target/arm: Expand vfp neg and abs inline

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-23-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|  6 
 target/arm/tcg/translate.h | 30 +++
 target/arm/tcg/translate-a64.c | 44 +--
 target/arm/tcg/translate-vfp.c | 54 +-
 target/arm/vfp_helper.c| 30 ---
 5 files changed, 79 insertions(+), 85 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 7ee15b96512..0fd01c9c52d 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -132,12 +132,6 @@ DEF_HELPER_3(vfp_maxnumd, f64, f64, f64, ptr)
 DEF_HELPER_3(vfp_minnumh, f16, f16, f16, ptr)
 DEF_HELPER_3(vfp_minnums, f32, f32, f32, ptr)
 DEF_HELPER_3(vfp_minnumd, f64, f64, f64, ptr)
-DEF_HELPER_1(vfp_negh, f16, f16)
-DEF_HELPER_1(vfp_negs, f32, f32)
-DEF_HELPER_1(vfp_negd, f64, f64)
-DEF_HELPER_1(vfp_absh, f16, f16)
-DEF_HELPER_1(vfp_abss, f32, f32)
-DEF_HELPER_1(vfp_absd, f64, f64)
 DEF_HELPER_2(vfp_sqrth, f16, f16, env)
 DEF_HELPER_2(vfp_sqrts, f32, f32, env)
 DEF_HELPER_2(vfp_sqrtd, f64, f64, env)
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index ecfa242eef3..b05a9eb6685 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -406,6 +406,36 @@ static inline void gen_swstep_exception(DisasContext *s, 
int isv, int ex)
  */
 uint64_t vfp_expand_imm(int size, uint8_t imm8);
 
+static inline void gen_vfp_absh(TCGv_i32 d, TCGv_i32 s)
+{
+tcg_gen_andi_i32(d, s, INT16_MAX);
+}
+
+static inline void gen_vfp_abss(TCGv_i32 d, TCGv_i32 s)
+{
+tcg_gen_andi_i32(d, s, INT32_MAX);
+}
+
+static inline void gen_vfp_absd(TCGv_i64 d, TCGv_i64 s)
+{
+tcg_gen_andi_i64(d, s, INT64_MAX);
+}
+
+static inline void gen_vfp_negh(TCGv_i32 d, TCGv_i32 s)
+{
+tcg_gen_xori_i32(d, s, 1u << 15);
+}
+
+static inline void gen_vfp_negs(TCGv_i32 d, TCGv_i32 s)
+{
+tcg_gen_xori_i32(d, s, 1u << 31);
+}
+
+static inline void gen_vfp_negd(TCGv_i64 d, TCGv_i64 s)
+{
+tcg_gen_xori_i64(d, s, 1ull << 63);
+}
+
 /* Vector operations shared between ARM and AArch64.  */
 void gen_gvec_ceq0(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 6f8207d842b..878f83298f5 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -6591,10 +6591,10 @@ static void handle_fp_1src_half(DisasContext *s, int 
opcode, int rd, int rn)
 tcg_gen_mov_i32(tcg_res, tcg_op);
 break;
 case 0x1: /* FABS */
-tcg_gen_andi_i32(tcg_res, tcg_op, 0x7fff);
+gen_vfp_absh(tcg_res, tcg_op);
 break;
 case 0x2: /* FNEG */
-tcg_gen_xori_i32(tcg_res, tcg_op, 0x8000);
+gen_vfp_negh(tcg_res, tcg_op);
 break;
 case 0x3: /* FSQRT */
 fpst = fpstatus_ptr(FPST_FPCR_F16);
@@ -6645,10 +6645,10 @@ static void handle_fp_1src_single(DisasContext *s, int 
opcode, int rd, int rn)
 tcg_gen_mov_i32(tcg_res, tcg_op);
 goto done;
 case 0x1: /* FABS */
-gen_helper_vfp_abss(tcg_res, tcg_op);
+gen_vfp_abss(tcg_res, tcg_op);
 goto done;
 case 0x2: /* FNEG */
-gen_helper_vfp_negs(tcg_res, tcg_op);
+gen_vfp_negs(tcg_res, tcg_op);
 goto done;
 case 0x3: /* FSQRT */
 gen_helper_vfp_sqrts(tcg_res, tcg_op, tcg_env);
@@ -6720,10 +6720,10 @@ static void handle_fp_1src_double(DisasContext *s, int 
opcode, int rd, int rn)
 
 switch (opcode) {
 case 0x1: /* FABS */
-gen_helper_vfp_absd(tcg_res, tcg_op);
+gen_vfp_absd(tcg_res, tcg_op);
 goto done;
 case 0x2: /* FNEG */
-gen_helper_vfp_negd(tcg_res, tcg_op);
+gen_vfp_negd(tcg_res, tcg_op);
 goto done;
 case 0x3: /* FSQRT */
 gen_helper_vfp_sqrtd(tcg_res, tcg_op, tcg_env);
@@ -6949,7 +6949,7 @@ static void handle_fp_2src_single(DisasContext *s, int 
opcode,
 switch (opcode) {
 case 0x8: /* FNMUL */
 gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_helper_vfp_negs(tcg_res, tcg_res);
+gen_vfp_negs(tcg_res, tcg_res);
 break;
 default:
 case 0x0: /* FMUL */
@@ -6983,7 +6983,7 @@ static void handle_fp_2src_double(DisasContext *s, int 
opcode,
 switch (opcode) {
 case 0x8: /* FNMUL */
 gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_helper_vfp_negd(tcg_res, tcg_res);
+gen_vfp_negd(tcg_res, tcg_res);
 break;
 default:
 case 0x0: /* FMUL */
@@ -7017,7 +7017,7 @@ static void handle_fp_2src_half(DisasContext *s, int 
opcode,
 switch (opcode) {
 case 0x8: /* FNMUL */
 gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
-tcg_gen_xori_i32(tcg_res, tcg_res, 0x8000);
+gen_vfp_negh(t

[PULL 24/42] target/arm: Convert FMULX to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Convert all forms (scalar, vector, scalar indexed, vector indexed),
which allows us to remove switch table entries elsewhere.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-19-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/helper-a64.h|   8 ++
 target/arm/tcg/a64.decode  |  45 +++
 target/arm/tcg/translate-a64.c | 221 +++--
 target/arm/tcg/vec_helper.c|  39 +++---
 4 files changed, 259 insertions(+), 54 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 05181653999..b79751a7170 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -132,3 +132,11 @@ DEF_HELPER_4(cpye, void, env, i32, i32, i32)
 DEF_HELPER_4(cpyfp, void, env, i32, i32, i32)
 DEF_HELPER_4(cpyfm, void, env, i32, i32, i32)
 DEF_HELPER_4(cpyfe, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_5(gvec_fmulx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmulx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_fmulx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+
+DEF_HELPER_FLAGS_5(gvec_fmulx_idx_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, 
ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmulx_idx_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, 
ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fmulx_idx_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, 
ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index d5bfeae7a82..2e0e01be017 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -20,21 +20,44 @@
 #
 
 %rd 0:5
+%esz_sd 22:1 !function=plus_2
+%hl 11:1 21:1
+%hlm11:1 20:2
 
 &r  rn
 &ri rd imm
 &rri_sf rd rn imm sf
 &i  imm
+&rrr_e  rd rn rm esz
+&rrx_e  rd rn rm idx esz
 &qrr_e  q rd rn esz
 &qrrr_e q rd rn rm esz
+&qrrx_e q rd rn rm idx esz
 &q_eq rd rn rm ra esz
 
+@rrr_h   ... rm:5 .. rn:5 rd:5  &rrr_e esz=1
+@rrr_sd  ... rm:5 .. rn:5 rd:5  &rrr_e esz=%esz_sd
+
+@rrx_h   .. .. rm:4  . . rn:5 rd:5  &rrx_e esz=1 idx=%hlm
+@rrx_s   .. . rm:5   . . rn:5 rd:5  &rrx_e esz=2 idx=%hl
+@rrx_d   .. . rm:5   idx:1 . rn:5 rd:5  &rrx_e esz=3
+
 @rr_q1e0  .. rn:5 rd:5  &qrr_e q=1 esz=0
 @r2r_q1e0     .. rm:5 rd:5  &qrrr_e rn=%rd q=1 
esz=0
 @rrr_q1e0    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=0
 @rrr_q1e3    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=3
 @_q1e3   ... rm:5 . ra:5 rn:5 rd:5  &q_e q=1 esz=3
 
+@qrrr_h . q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=1
+@qrrr_sd. q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=%esz_sd
+
+@qrrx_h . q:1 ..  .. .. rm:4  . . rn:5 rd:5 \
+&qrrx_e esz=1 idx=%hlm
+@qrrx_s . q:1 ..  .. . rm:5   . . rn:5 rd:5 \
+&qrrx_e esz=2 idx=%hl
+@qrrx_d . q:1 ..  .. . rm:5   idx:1 . rn:5 rd:5 \
+&qrrx_e esz=3
+
 ### Data Processing - Immediate
 
 # PC-rel addressing
@@ -671,3 +694,25 @@ INS_general 0 1   00 1110 000 imm:5 0 0011 1 rn:5 rd:5
 SMOV0 q:1 00 1110 000 imm:5 0 0101 1 rn:5 rd:5
 UMOV0 q:1 00 1110 000 imm:5 0 0111 1 rn:5 rd:5
 INS_element 0 1   10 1110 000 di:5  0 si:4 1 rn:5 rd:5
+
+### Advanced SIMD scalar three same
+
+FMULX_s 0101 1110 010 . 00011 1 . . @rrr_h
+FMULX_s 0101 1110 0.1 . 11011 1 . . @rrr_sd
+
+### Advanced SIMD three same
+
+FMULX_v 0.00 1110 010 . 00011 1 . . @qrrr_h
+FMULX_v 0.00 1110 0.1 . 11011 1 . . @qrrr_sd
+
+### Advanced SIMD scalar x indexed element
+
+FMULX_si0111  00 ..  1001 . 0 . .   @rrx_h
+FMULX_si0111  10 . . 1001 . 0 . .   @rrx_s
+FMULX_si0111  11 0 . 1001 . 0 . .   @rrx_d
+
+### Advanced SIMD vector x indexed element
+
+FMULX_vi0.10  00 ..  1001 . 0 . .   @qrrx_h
+FMULX_vi0.10  10 . . 1001 . 0 . .   @qrrx_s
+FMULX_vi0.10  11 0 . 1001 . 0 . .   @qrrx_d
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 1a12bf22fd8..8cbe6cd70f2 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4841,6 +4841,178 @@ static bool trans_INS_element(DisasContext *s, 
arg_INS_element *a)
 return true;
 }
 
+/*
+ * Advanced SIMD three same
+ */
+
+typedef struct FPScalar {
+void (*gen_h)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+void (*gen_s)(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+void (*gen_d)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_p

[PULL 30/42] target/arm: Convert FMLA, FMLS to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-25-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|   2 +
 target/arm/tcg/a64.decode  |  22 +++
 target/arm/tcg/translate-a64.c | 241 +
 target/arm/tcg/vec_helper.c|  14 ++
 4 files changed, 163 insertions(+), 116 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 0fd01c9c52d..e021c185178 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -770,9 +770,11 @@ DEF_HELPER_FLAGS_5(gvec_fmls_s, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_vfma_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vfma_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vfma_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_vfms_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_vfms_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_vfms_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_ftsmul_h, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index cde4b86303d..11527bb5e5e 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -742,12 +742,26 @@ FMINNM_v0.00 1110 1.1 . 11000 1 . . 
@qrrr_sd
 FMULX_v 0.00 1110 010 . 00011 1 . . @qrrr_h
 FMULX_v 0.00 1110 0.1 . 11011 1 . . @qrrr_sd
 
+FMLA_v  0.00 1110 010 . 1 1 . . @qrrr_h
+FMLA_v  0.00 1110 0.1 . 11001 1 . . @qrrr_sd
+
+FMLS_v  0.00 1110 110 . 1 1 . . @qrrr_h
+FMLS_v  0.00 1110 1.1 . 11001 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
 FMUL_si 0101  10 . . 1001 . 0 . .   @rrx_s
 FMUL_si 0101  11 0 . 1001 . 0 . .   @rrx_d
 
+FMLA_si 0101  00 ..  0001 . 0 . .   @rrx_h
+FMLA_si 0101  10 ..  0001 . 0 . .   @rrx_s
+FMLA_si 0101  11 0.  0001 . 0 . .   @rrx_d
+
+FMLS_si 0101  00 ..  0101 . 0 . .   @rrx_h
+FMLS_si 0101  10 ..  0101 . 0 . .   @rrx_s
+FMLS_si 0101  11 0.  0101 . 0 . .   @rrx_d
+
 FMULX_si0111  00 ..  1001 . 0 . .   @rrx_h
 FMULX_si0111  10 . . 1001 . 0 . .   @rrx_s
 FMULX_si0111  11 0 . 1001 . 0 . .   @rrx_d
@@ -758,6 +772,14 @@ FMUL_vi 0.00  00 ..  1001 . 0 . .  
 @qrrx_h
 FMUL_vi 0.00  10 . . 1001 . 0 . .   @qrrx_s
 FMUL_vi 0.00  11 0 . 1001 . 0 . .   @qrrx_d
 
+FMLA_vi 0.00  00 ..  0001 . 0 . .   @qrrx_h
+FMLA_vi 0.00  10 . . 0001 . 0 . .   @qrrx_s
+FMLA_vi 0.00  11 0 . 0001 . 0 . .   @qrrx_d
+
+FMLS_vi 0.00  00 ..  0101 . 0 . .   @qrrx_h
+FMLS_vi 0.00  10 . . 0101 . 0 . .   @qrrx_s
+FMLS_vi 0.00  11 0 . 0101 . 0 . .   @qrrx_d
+
 FMULX_vi0.10  00 ..  1001 . 0 . .   @qrrx_h
 FMULX_vi0.10  10 . . 1001 . 0 . .   @qrrx_s
 FMULX_vi0.10  11 0 . 1001 . 0 . .   @qrrx_d
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5ba30ba7c86..f84c12378dc 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5066,6 +5066,20 @@ static gen_helper_gvec_3_ptr * const f_vector_fmulx[3] = 
{
 };
 TRANS(FMULX_v, do_fp3_vector, a, f_vector_fmulx)
 
+static gen_helper_gvec_3_ptr * const f_vector_fmla[3] = {
+gen_helper_gvec_vfma_h,
+gen_helper_gvec_vfma_s,
+gen_helper_gvec_vfma_d,
+};
+TRANS(FMLA_v, do_fp3_vector, a, f_vector_fmla)
+
+static gen_helper_gvec_3_ptr * const f_vector_fmls[3] = {
+gen_helper_gvec_vfms_h,
+gen_helper_gvec_vfms_s,
+gen_helper_gvec_vfms_d,
+};
+TRANS(FMLS_v, do_fp3_vector, a, f_vector_fmls)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -5115,6 +5129,64 @@ static bool do_fp3_scalar_idx(DisasContext *s, arg_rrx_e 
*a, const FPScalar *f)
 TRANS(FMUL_si, do_fp3_scalar_idx, a, &f_scalar_fmul)
 TRANS(FMULX_si, do_fp3_scalar_idx, a, &f_scalar_fmulx)
 
+static bool do_fmla_scalar_idx(DisasContext *s, arg_rrx_e *a, bool neg)
+{
+switch (a->esz) {
+case MO_64:
+if (fp_access_check(s)) {
+TCGv_i64 t0 = read_fp_dreg(s, a->rd);
+TCGv_i64 t1 = read_fp_dreg(s, a->rn);
+TCGv_i64 t2 = tcg_temp_new_i64();
+
+read_vec_element(s, t2

[PULL 37/42] target/arm: Convert ADDP to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-32-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|   5 ++
 target/arm/tcg/translate.h |   3 +
 target/arm/tcg/a64.decode  |   6 ++
 target/arm/tcg/gengvec.c   |  12 
 target/arm/tcg/translate-a64.c | 128 ++---
 target/arm/tcg/vec_helper.c|  30 
 6 files changed, 77 insertions(+), 107 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 065460ea80e..d3579a101f4 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1061,6 +1061,11 @@ DEF_HELPER_FLAGS_5(gvec_fminnump_h, TCG_CALL_NO_RWG, 
void, ptr, ptr, ptr, ptr, i
 DEF_HELPER_FLAGS_5(gvec_fminnump_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_fminnump_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
+DEF_HELPER_FLAGS_4(gvec_addp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_addp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_addp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_addp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "tcg/helper-a64.h"
 #include "tcg/helper-sve.h"
diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index b05a9eb6685..04771f483b6 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -514,6 +514,9 @@ void gen_gvec_saba(unsigned vece, uint32_t rd_ofs, uint32_t 
rn_ofs,
 void gen_gvec_uaba(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
 
+void gen_gvec_addp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+   uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
+
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
  */
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 43557fdccc6..84f5bcc0e08 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -38,6 +38,7 @@
 &q_eq rd rn rm ra esz
 
 @rr_h    ... . .. rn:5 rd:5 &rr_e esz=1
+@rr_d    ... . .. rn:5 rd:5 &rr_e esz=3
 @rr_sd   ... . .. rn:5 rd:5 &rr_e esz=%esz_sd
 
 @rrr_h   ... rm:5 .. rn:5 rd:5  &rrr_e esz=1
@@ -56,6 +57,7 @@
 
 @qrrr_h . q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=1
 @qrrr_sd. q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=%esz_sd
+@qrrr_e . q:1 .. esz:2 . rm:5 .. rn:5 rd:5  &qrrr_e
 
 @qrrx_h . q:1 ..  .. .. rm:4  . . rn:5 rd:5 \
 &qrrx_e esz=1 idx=%hlm
@@ -758,6 +760,8 @@ FMAXNMP_s   0111 1110 0.11  1100 10 . . 
@rr_sd
 FMINNMP_s   0101 1110 1011  1100 10 . . @rr_h
 FMINNMP_s   0111 1110 1.11  1100 10 . . @rr_sd
 
+ADDP_s  0101 1110  0001 1011 10 . . @rr_d
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -832,6 +836,8 @@ FMAXNMP_v   0.10 1110 0.1 . 11000 1 . . 
@qrrr_sd
 FMINNMP_v   0.10 1110 110 . 0 1 . . @qrrr_h
 FMINNMP_v   0.10 1110 1.1 . 11000 1 . . @qrrr_sd
 
+ADDP_v  0.00 1110 ..1 . 10111 1 . . @qrrr_e
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 7a1856253ff..f010dd5a0e8 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -1610,3 +1610,15 @@ void gen_gvec_uaba(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 };
 tcg_gen_gvec_3(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, &ops[vece]);
 }
+
+void gen_gvec_addp(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+   uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz)
+{
+static gen_helper_gvec_3 * const fns[4] = {
+gen_helper_gvec_addp_b,
+gen_helper_gvec_addp_h,
+gen_helper_gvec_addp_s,
+gen_helper_gvec_addp_d,
+};
+tcg_gen_gvec_3_ool(rd_ofs, rn_ofs, rm_ofs, opr_sz, max_sz, 0, fns[vece]);
+}
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 07415bd2855..b8add91112d 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5245,6 +5245,8 @@ static gen_helper_gvec_3_ptr * const f_vector_fminnmp[3] 
= {
 };
 TRANS(FMINNMP_v, do_fp3_vector, a, f_vector_fminnmp)
 
+TRANS(ADDP_v, do_gvec_fn3, a, gen_gvec_addp)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -5485,6 +5487,20 @@ TRANS(FMINP_s, do_fp3_scalar_pair, a, &f_scalar_fmin)
 TRANS(FMAXNMP_s, do_fp3_scalar_pair, a, &f_scalar_fmaxnm)
 TRANS(FMINNMP_s, do_fp3_scalar_pair, a, &f_scalar_fminnm)
 
+static bool tr

[PULL 07/42] hw: arm: Remove use of tabs in some source files

2024-05-28 Thread Peter Maydell

From: Tanmay Patil 

Some of the source files for older devices use hardcoded tabs
instead of our current coding standard's required spaces.
Fix these in the following files:
- hw/arm/boot.c
- hw/char/omap_uart.c
- hw/gpio/zaurus.c
- hw/input/tsc2005.c

This commit is mostly whitespace-only changes; it also
adds curly-braces to some 'if' statements.

This addresses part of https://gitlab.com/qemu-project/qemu/-/issues/373
but some other files remain to be handled.

Signed-off-by: Tanmay Patil 
Message-id: 20240508081502.88375-1-tanmaynpatil...@gmail.com
Reviewed-by: Peter Maydell 
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell 
---
 hw/arm/boot.c   |   8 +--
 hw/char/omap_uart.c |  49 +
 hw/gpio/zaurus.c|  59 ++--
 hw/input/tsc2005.c  | 130 
 4 files changed, 130 insertions(+), 116 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 84ea6a807a4..d480a7da02c 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -347,13 +347,13 @@ static void set_kernel_args_old(const struct 
arm_boot_info *info,
 WRITE_WORD(p, info->ram_size / 4096);
 /* ramdisk_size */
 WRITE_WORD(p, 0);
-#define FLAG_READONLY  1
-#define FLAG_RDLOAD4
-#define FLAG_RDPROMPT  8
+#define FLAG_READONLY 1
+#define FLAG_RDLOAD   4
+#define FLAG_RDPROMPT 8
 /* flags */
 WRITE_WORD(p, FLAG_READONLY | FLAG_RDLOAD | FLAG_RDPROMPT);
 /* rootdev */
-WRITE_WORD(p, (31 << 8) | 0);  /* /dev/mtdblock0 */
+WRITE_WORD(p, (31 << 8) | 0); /* /dev/mtdblock0 */
 /* video_num_cols */
 WRITE_WORD(p, 0);
 /* video_num_rows */
diff --git a/hw/char/omap_uart.c b/hw/char/omap_uart.c
index 6848bddb4e2..c2ef4c137e1 100644
--- a/hw/char/omap_uart.c
+++ b/hw/char/omap_uart.c
@@ -61,7 +61,7 @@ struct omap_uart_s *omap_uart_init(hwaddr base,
 s->fclk = fclk;
 s->irq = irq;
 s->serial = serial_mm_init(get_system_memory(), base, 2, irq,
-   omap_clk_getrate(fclk)/16,
+   omap_clk_getrate(fclk) / 16,
chr ?: qemu_chr_new(label, "null", NULL),
DEVICE_NATIVE_ENDIAN);
 return s;
@@ -76,27 +76,27 @@ static uint64_t omap_uart_read(void *opaque, hwaddr addr, 
unsigned size)
 }
 
 switch (addr) {
-case 0x20: /* MDR1 */
+case 0x20:  /* MDR1 */
 return s->mdr[0];
-case 0x24: /* MDR2 */
+case 0x24:  /* MDR2 */
 return s->mdr[1];
-case 0x40: /* SCR */
+case 0x40:  /* SCR */
 return s->scr;
-case 0x44: /* SSR */
+case 0x44:  /* SSR */
 return 0x0;
-case 0x48: /* EBLR (OMAP2) */
+case 0x48:  /* EBLR (OMAP2) */
 return s->eblr;
-case 0x4C: /* OSC_12M_SEL (OMAP1) */
+case 0x4C:  /* OSC_12M_SEL (OMAP1) */
 return s->clksel;
-case 0x50: /* MVR */
+case 0x50:  /* MVR */
 return 0x30;
-case 0x54: /* SYSC (OMAP2) */
+case 0x54:  /* SYSC (OMAP2) */
 return s->syscontrol;
-case 0x58: /* SYSS (OMAP2) */
+case 0x58:  /* SYSS (OMAP2) */
 return 1;
-case 0x5c: /* WER (OMAP2) */
+case 0x5c:  /* WER (OMAP2) */
 return s->wkup;
-case 0x60: /* CFPS (OMAP2) */
+case 0x60:  /* CFPS (OMAP2) */
 return s->cfps;
 }
 
@@ -115,35 +115,36 @@ static void omap_uart_write(void *opaque, hwaddr addr,
 }
 
 switch (addr) {
-case 0x20: /* MDR1 */
+case 0x20:  /* MDR1 */
 s->mdr[0] = value & 0x7f;
 break;
-case 0x24: /* MDR2 */
+case 0x24:  /* MDR2 */
 s->mdr[1] = value & 0xff;
 break;
-case 0x40: /* SCR */
+case 0x40:  /* SCR */
 s->scr = value & 0xff;
 break;
-case 0x48: /* EBLR (OMAP2) */
+case 0x48:  /* EBLR (OMAP2) */
 s->eblr = value & 0xff;
 break;
-case 0x4C: /* OSC_12M_SEL (OMAP1) */
+case 0x4C:  /* OSC_12M_SEL (OMAP1) */
 s->clksel = value & 1;
 break;
-case 0x44: /* SSR */
-case 0x50: /* MVR */
-case 0x58: /* SYSS (OMAP2) */
+case 0x44:  /* SSR */
+case 0x50:  /* MVR */
+case 0x58:  /* SYSS (OMAP2) */
 OMAP_RO_REG(addr);
 break;
-case 0x54: /* SYSC (OMAP2) */
+case 0x54:  /* SYSC (OMAP2) */
 s->syscontrol = value & 0x1d;
-if (value & 2)
+if (value & 2) {
 omap_uart_reset(s);
+}
 break;
-case 0x5c: /* WER (OMAP2) */
+case 0x5c:  /* WER (OMAP2) */
 s->wkup = value & 0x7f;
 break;
-case 0x60: /* CFPS (OMAP2) */
+case 0x60:  /* CFPS (OMAP2) */
 s->cfps = value & 0xff;
 break;
 default:
diff --git a/hw/gpio/zaurus.c b/hw/gpio/zaurus.c
index 5884804c589..7342440b958 100644
--- a/hw/gpio/zaurus.c
+++ b/hw/gpio/zaurus.c
@@ -49,19 +49,20 @@ struct ScoopInfo {
 uint16_t isr;
 };
 
-#define SCOOP_MCR  0x00
-#define SCOOP_CDR  0x04
-#

[PULL 42/42] target/arm: Convert disas_simd_3same_logic to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

This includes AND, ORR, EOR, BIC, ORN, BSF, BIT, BIF.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-37-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  | 10 +
 target/arm/tcg/translate-a64.c | 68 ++
 2 files changed, 29 insertions(+), 49 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7e993ed345f..f48adef5bba 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -55,6 +55,7 @@
 @rrr_q1e3    ... rm:5 .. rn:5 rd:5  &qrrr_e q=1 esz=3
 @_q1e3   ... rm:5 . ra:5 rn:5 rd:5  &q_e q=1 esz=3
 
+@qrrr_b . q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=0
 @qrrr_h . q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=1
 @qrrr_sd. q:1 .. ... rm:5 .. rn:5 rd:5  &qrrr_e esz=%esz_sd
 @qrrr_e . q:1 .. esz:2 . rm:5 .. rn:5 rd:5  &qrrr_e
@@ -847,6 +848,15 @@ SMINP_v 0.00 1110 ..1 . 10101 1 . . 
@qrrr_e
 UMAXP_v 0.10 1110 ..1 . 10100 1 . . @qrrr_e
 UMINP_v 0.10 1110 ..1 . 10101 1 . . @qrrr_e
 
+AND_v   0.00 1110 001 . 00011 1 . . @qrrr_b
+BIC_v   0.00 1110 011 . 00011 1 . . @qrrr_b
+ORR_v   0.00 1110 101 . 00011 1 . . @qrrr_b
+ORN_v   0.00 1110 111 . 00011 1 . . @qrrr_b
+EOR_v   0.10 1110 001 . 00011 1 . . @qrrr_b
+BSL_v   0.10 1110 011 . 00011 1 . . @qrrr_b
+BIT_v   0.10 1110 101 . 00011 1 . . @qrrr_b
+BIF_v   0.10 1110 111 . 00011 1 . . @qrrr_b
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index a4ff1fd2027..9167e4d0bd6 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5280,6 +5280,24 @@ TRANS(SMINP_v, do_gvec_fn3_no64, a, gen_gvec_sminp)
 TRANS(UMAXP_v, do_gvec_fn3_no64, a, gen_gvec_umaxp)
 TRANS(UMINP_v, do_gvec_fn3_no64, a, gen_gvec_uminp)
 
+TRANS(AND_v, do_gvec_fn3, a, tcg_gen_gvec_and)
+TRANS(BIC_v, do_gvec_fn3, a, tcg_gen_gvec_andc)
+TRANS(ORR_v, do_gvec_fn3, a, tcg_gen_gvec_or)
+TRANS(ORN_v, do_gvec_fn3, a, tcg_gen_gvec_orc)
+TRANS(EOR_v, do_gvec_fn3, a, tcg_gen_gvec_xor)
+
+static bool do_bitsel(DisasContext *s, bool is_q, int d, int a, int b, int c)
+{
+if (fp_access_check(s)) {
+gen_gvec_fn4(s, is_q, d, a, b, c, tcg_gen_gvec_bitsel, 0);
+}
+return true;
+}
+
+TRANS(BSL_v, do_bitsel, a->q, a->rd, a->rd, a->rn, a->rm)
+TRANS(BIT_v, do_bitsel, a->q, a->rd, a->rm, a->rn, a->rd)
+TRANS(BIF_v, do_bitsel, a->q, a->rd, a->rm, a->rd, a->rn)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -10901,52 +10919,6 @@ static void disas_simd_three_reg_diff(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Logic op (opcode == 3) subgroup of C3.6.16. */
-static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
-{
-int rd = extract32(insn, 0, 5);
-int rn = extract32(insn, 5, 5);
-int rm = extract32(insn, 16, 5);
-int size = extract32(insn, 22, 2);
-bool is_u = extract32(insn, 29, 1);
-bool is_q = extract32(insn, 30, 1);
-
-if (!fp_access_check(s)) {
-return;
-}
-
-switch (size + 4 * is_u) {
-case 0: /* AND */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_and, 0);
-return;
-case 1: /* BIC */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_andc, 0);
-return;
-case 2: /* ORR */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_or, 0);
-return;
-case 3: /* ORN */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_orc, 0);
-return;
-case 4: /* EOR */
-gen_gvec_fn3(s, is_q, rd, rn, rm, tcg_gen_gvec_xor, 0);
-return;
-
-case 5: /* BSL bitwise select */
-gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
-return;
-case 6: /* BIT, bitwise insert if true */
-gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
-return;
-case 7: /* BIF, bitwise insert if false */
-gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
-return;
-
-default:
-g_assert_not_reached();
-}
-}
-
 /* Integer op subgroup of C3.6.16. */
 static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 {
@@ -11212,12 +11184,10 @@ static void disas_simd_three_reg_same(DisasContext 
*s, uint32_t insn)
 int opcode = extract32(insn, 11, 5);
 
 switch (opcode) {
-case 0x3: /* logic ops */
-disas_simd_3same_logic(s, insn);
-break;
 default:
 disas_simd_3same_int(s, insn);
 break;
+case 0x3: /* logic ops */
 case 0x14: /* SMAXP, UMAXP */

[PULL 10/42] target/arm: Zero-extend writeback for fp16 FCVTZS (scalar, integer)

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Fixes RISU mismatch for "fcvtzs h31, h0, #14".

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-id: 20240524232121.284515-5-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/translate-a64.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 4126aaa27e6..d97acdbaf9a 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -8707,6 +8707,9 @@ static void handle_simd_shift_fpint_conv(DisasContext *s, 
bool is_scalar,
 read_vec_element_i32(s, tcg_op, rn, pass, size);
 fn(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
 if (is_scalar) {
+if (size == MO_16 && !is_u) {
+tcg_gen_ext16u_i32(tcg_op, tcg_op);
+}
 write_fp_sreg(s, rd, tcg_op);
 } else {
 write_vec_element_i32(s, tcg_op, rd, pass, size);
-- 
2.34.1

[PULL 38/42] target/arm: Use gvec for neon padd

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-33-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h | 2 --
 target/arm/tcg/neon_helper.c| 5 -
 target/arm/tcg/translate-neon.c | 3 +--
 3 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index d3579a101f4..51ed49aa50c 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -354,8 +354,6 @@ DEF_HELPER_3(neon_qrshl_s64, i64, env, i64, i64)
 
 DEF_HELPER_2(neon_add_u8, i32, i32, i32)
 DEF_HELPER_2(neon_add_u16, i32, i32, i32)
-DEF_HELPER_2(neon_padd_u8, i32, i32, i32)
-DEF_HELPER_2(neon_padd_u16, i32, i32, i32)
 DEF_HELPER_2(neon_sub_u8, i32, i32, i32)
 DEF_HELPER_2(neon_sub_u16, i32, i32, i32)
 DEF_HELPER_2(neon_mul_u8, i32, i32, i32)
diff --git a/target/arm/tcg/neon_helper.c b/target/arm/tcg/neon_helper.c
index bc6c4a54e9d..a0b51c88096 100644
--- a/target/arm/tcg/neon_helper.c
+++ b/target/arm/tcg/neon_helper.c
@@ -745,11 +745,6 @@ uint32_t HELPER(neon_add_u16)(uint32_t a, uint32_t b)
 return (a + b) ^ mask;
 }
 
-#define NEON_FN(dest, src1, src2) dest = src1 + src2
-NEON_POP(padd_u8, neon_u8, 4)
-NEON_POP(padd_u16, neon_u16, 2)
-#undef NEON_FN
-
 #define NEON_FN(dest, src1, src2) dest = src1 - src2
 NEON_VOP(sub_u8, neon_u8, 4)
 NEON_VOP(sub_u16, neon_u16, 2)
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 2326a05a0aa..6c5a7a98e1b 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -830,6 +830,7 @@ DO_3SAME_NO_SZ_3(VABD_S, gen_gvec_sabd)
 DO_3SAME_NO_SZ_3(VABA_S, gen_gvec_saba)
 DO_3SAME_NO_SZ_3(VABD_U, gen_gvec_uabd)
 DO_3SAME_NO_SZ_3(VABA_U, gen_gvec_uaba)
+DO_3SAME_NO_SZ_3(VPADD, gen_gvec_addp)
 
 #define DO_3SAME_CMP(INSN, COND)\
 static void gen_##INSN##_3s(unsigned vece, uint32_t rd_ofs, \
@@ -1070,13 +1071,11 @@ static bool do_3same_pair(DisasContext *s, arg_3same 
*a, NeonGenTwoOpFn *fn)
 #define gen_helper_neon_pmax_u32  tcg_gen_umax_i32
 #define gen_helper_neon_pmin_s32  tcg_gen_smin_i32
 #define gen_helper_neon_pmin_u32  tcg_gen_umin_i32
-#define gen_helper_neon_padd_u32  tcg_gen_add_i32
 
 DO_3SAME_PAIR(VPMAX_S, pmax_s)
 DO_3SAME_PAIR(VPMIN_S, pmin_s)
 DO_3SAME_PAIR(VPMAX_U, pmax_u)
 DO_3SAME_PAIR(VPMIN_U, pmin_u)
-DO_3SAME_PAIR(VPADD, padd_u)
 
 #define DO_3SAME_VQDMULH(INSN, FUNC)\
 WRAP_ENV_FN(gen_##INSN##_tramp16, gen_helper_neon_##FUNC##_s16);\
-- 
2.34.1

[PULL v2 00/42] target-arm queue

2024-05-28 Thread Peter Maydell

Hi; most of this is the first half of the A64 simd decodetree
conversion; the rest is a mix of fixes from the last couple of weeks.

v2 uses patches from the v2 decodetree series to avoid a few
regressions in some A32 insns.

(Richard: I'm still planning to review the second half of the
v2 decodetree series; I just wanted to get the respin of this
pullreq out today...)

thanks
-- PMM

The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946:

  Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into 
staging (2024-05-27 06:40:42 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20240528

for you to fetch changes up to f240df3c31b40e4cf1af1f156a88efc1a1df406c:

  target/arm: Convert disas_simd_3same_logic to decodetree (2024-05-28 14:29:01 
+0100)


target-arm queue:
 * xlnx_dpdma: fix descriptor endianness bug
 * hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers
 * hw/arm/npcm7xx: remove setting of mp-affinity
 * hw/char: Correct STM32L4x5 usart register CR2 field ADD_0 size
 * hw/intc/arm_gic: Fix handling of NS view of GICC_APR
 * hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()
 * hw: arm: Remove use of tabs in some source files
 * docs/system: Remove ADC from raspi documentation
 * target/arm: Start of the conversion of A64 SIMD to decodetree


Alexandra Diupina (1):
  xlnx_dpdma: fix descriptor endianness bug

Andrey Shumilin (1):
  hw/intc/arm_gic: Fix handling of NS view of GICC_APR

Dorjoy Chowdhury (1):
  hw/arm/npcm7xx: remove setting of mp-affinity

Inès Varhol (1):
  hw/char: Correct STM32L4x5 usart register CR2 field ADD_0 size

Philippe Mathieu-Daudé (1):
  hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()

Rayhan Faizel (1):
  docs/system: Remove ADC from raspi documentation

Richard Henderson (34):
  target/arm: Use PLD, PLDW, PLI not NOP for t32
  target/arm: Zero-extend writeback for fp16 FCVTZS (scalar, integer)
  target/arm: Fix decode of FMOV (hp) vs MOVI
  target/arm: Verify sz=0 for Advanced SIMD scalar pairwise (fp16)
  target/arm: Split out gengvec.c
  target/arm: Split out gengvec64.c
  target/arm: Convert Cryptographic AES to decodetree
  target/arm: Convert Cryptographic 3-register SHA to decodetree
  target/arm: Convert Cryptographic 2-register SHA to decodetree
  target/arm: Convert Cryptographic 3-register SHA512 to decodetree
  target/arm: Convert Cryptographic 2-register SHA512 to decodetree
  target/arm: Convert Cryptographic 4-register to decodetree
  target/arm: Convert Cryptographic 3-register, imm2 to decodetree
  target/arm: Convert XAR to decodetree
  target/arm: Convert Advanced SIMD copy to decodetree
  target/arm: Convert FMULX to decodetree
  target/arm: Convert FADD, FSUB, FDIV, FMUL to decodetree
  target/arm: Convert FMAX, FMIN, FMAXNM, FMINNM to decodetree
  target/arm: Introduce vfp_load_reg16
  target/arm: Expand vfp neg and abs inline
  target/arm: Convert FNMUL to decodetree
  target/arm: Convert FMLA, FMLS to decodetree
  target/arm: Convert FCMEQ, FCMGE, FCMGT, FACGE, FACGT to decodetree
  target/arm: Convert FABD to decodetree
  target/arm: Convert FRECPS, FRSQRTS to decodetree
  target/arm: Convert FADDP to decodetree
  target/arm: Convert FMAXP, FMINP, FMAXNMP, FMINNMP to decodetree
  target/arm: Use gvec for neon faddp, fmaxp, fminp
  target/arm: Convert ADDP to decodetree
  target/arm: Use gvec for neon padd
  target/arm: Convert SMAXP, SMINP, UMAXP, UMINP to decodetree
  target/arm: Use gvec for neon pmax, pmin
  target/arm: Convert FMLAL, FMLSL to decodetree
  target/arm: Convert disas_simd_3same_logic to decodetree

Tanmay Patil (1):
  hw: arm: Remove use of tabs in some source files

Zenghui Yu (1):
  hvf: arm: Fix encodings for ID_AA64PFR1_EL1 and debug System registers

 docs/system/arm/raspi.rst   |1 -
 target/arm/helper.h |   68 +-
 target/arm/tcg/helper-a64.h |   12 +
 target/arm/tcg/translate-a64.h  |4 +
 target/arm/tcg/translate.h  |   51 +
 target/arm/tcg/a64.decode   |  315 +++-
 target/arm/tcg/t32.decode   |   25 +-
 hw/arm/boot.c   |8 +-
 hw/arm/npcm7xx.c|3 -
 hw/char/omap_uart.c |   49 +-
 hw/char/stm32l4x5_usart.c   |2 +-
 hw/dma/xlnx_dpdma.c |   68 +-
 hw/gpio/zaurus.c|   59 +-
 hw/input/tsc2005.c  |  135 +-
 hw/intc/arm_gic.c   |4 +-
 target/arm/hvf/hvf.c|  130 +-
 target/arm/tcg/gengvec.c| 1672 +
 target/arm/tcg/gengvec64.c  |  190 +++
 target/arm/tcg/neon_hel

[PULL 32/42] target/arm: Convert FABD to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-27-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|  1 +
 target/arm/tcg/a64.decode  |  6 
 target/arm/tcg/translate-a64.c | 60 ++
 target/arm/tcg/vec_helper.c|  6 
 4 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 8d076011c18..ff6e3094f41 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -724,6 +724,7 @@ DEF_HELPER_FLAGS_5(gvec_fmul_d, TCG_CALL_NO_RWG, void, ptr, 
ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fabd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fabd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7fc3277be67..a852b5f06f0 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -728,6 +728,9 @@ FACGE_s 0111 1110 0.1 . 11101 1 . . 
@rrr_sd
 FACGT_s 0111 1110 110 . 00101 1 . . @rrr_h
 FACGT_s 0111 1110 1.1 . 11101 1 . . @rrr_sd
 
+FABD_s  0111 1110 110 . 00010 1 . . @rrr_h
+FABD_s  0111 1110 1.1 . 11010 1 . . @rrr_sd
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -778,6 +781,9 @@ FACGE_v 0.10 1110 0.1 . 11101 1 . . 
@qrrr_sd
 FACGT_v 0.10 1110 110 . 00101 1 . . @qrrr_h
 FACGT_v 0.10 1110 1.1 . 11101 1 . . @qrrr_sd
 
+FABD_v  0.10 1110 110 . 00010 1 . . @qrrr_h
+FABD_v  0.10 1110 1.1 . 11010 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 75b0c1a005e..633384d2a56 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5010,6 +5010,31 @@ static const FPScalar f_scalar_facgt = {
 };
 TRANS(FACGT_s, do_fp3_scalar, a, &f_scalar_facgt)
 
+static void gen_fabd_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+gen_helper_vfp_subh(d, n, m, s);
+gen_vfp_absh(d, d);
+}
+
+static void gen_fabd_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+gen_helper_vfp_subs(d, n, m, s);
+gen_vfp_abss(d, d);
+}
+
+static void gen_fabd_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+gen_helper_vfp_subd(d, n, m, s);
+gen_vfp_absd(d, d);
+}
+
+static const FPScalar f_scalar_fabd = {
+gen_fabd_h,
+gen_fabd_s,
+gen_fabd_d,
+};
+TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5150,6 +5175,13 @@ static gen_helper_gvec_3_ptr * const f_vector_facgt[3] = 
{
 };
 TRANS(FACGT_v, do_fp3_vector, a, f_vector_facgt)
 
+static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
+gen_helper_gvec_fabd_h,
+gen_helper_gvec_fabd_s,
+gen_helper_gvec_fabd_d,
+};
+TRANS(FABD_v, do_fp3_vector, a, f_vector_fabd)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -9303,10 +9335,6 @@ static void handle_3same_float(DisasContext *s, int 
size, int elements,
 case 0x3f: /* FRSQRTS */
 gen_helper_rsqrtsf_f64(tcg_res, tcg_op1, tcg_op2, fpst);
 break;
-case 0x7a: /* FABD */
-gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_vfp_absd(tcg_res, tcg_res);
-break;
 default:
 case 0x18: /* FMAXNM */
 case 0x19: /* FMLA */
@@ -9322,6 +9350,7 @@ static void handle_3same_float(DisasContext *s, int size, 
int elements,
 case 0x5c: /* FCMGE */
 case 0x5d: /* FACGE */
 case 0x5f: /* FDIV */
+case 0x7a: /* FABD */
 case 0x7c: /* FCMGT */
 case 0x7d: /* FACGT */
 g_assert_not_reached();
@@ -9344,10 +9373,6 @@ static void handle_3same_float(DisasContext *s, int 
size, int elements,
 case 0x3f: /* FRSQRTS */
 gen_helper_rsqrtsf_f32(tcg_res, tcg_op1, tcg_op2, fpst);
 break;
-case 0x7a: /* FABD */
-gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_vfp_abss(tcg_res, tcg_res);
-break;
 default:
 case 0x18: /* FMAXNM */
 case 0x19: /* FMLA */
@@ -9363,6 +9388,7 @@ static void handle_3same_float(Disas

[PULL 09/42] target/arm: Use PLD, PLDW, PLI not NOP for t32

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

This fixes a bug in that neither PLI nor PLDW are present in ARMv6T2,
but are introduced with ARMv7 and ARMv7MP respectively.
For clarity, do not use NOP for PLD.

Note that there is no PLDW (literal). Architecturally in the
T1 encoding of "PLD (literal)" bit 5 is "(0)", which means
that it should be zero and if it is not then the behaviour
is CONSTRAINED UNPREDICTABLE (might UNDEF, NOP, or ignore the
value of the bit).

In our implementation we have patterns for both:

+PLD   1000 -001   # (literal)
+PLD   1000 -011   # (literal)

and so we effectively ignore the value of bit 5.  (This is a
permitted option for this CONSTRAINED UNPREDICTABLE.) This isn't a
behaviour change in this commit, since we previously had NOP lines
for both those patterns.

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-id: 20240524232121.284515-3-richard.hender...@linaro.org
[PMM: adjusted commit message to note that PLD (lit) T1 bit 5
being 1 is an UNPREDICTABLE case.]
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/t32.decode  | 25 -
 target/arm/tcg/translate.c |  4 ++--
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/target/arm/tcg/t32.decode b/target/arm/tcg/t32.decode
index f21ad0167ab..d327178829d 100644
--- a/target/arm/tcg/t32.decode
+++ b/target/arm/tcg/t32.decode
@@ -458,41 +458,41 @@ STR_ri    1000 1100   
@ldst_ri_pos
 # Note that Load, unsigned (literal) overlaps all other load encodings.
 {
   {
-NOP   1000 -001   # PLD
+PLD   1000 -001   # (literal)
 LDRB_ri   1000 .001   @ldst_ri_lit
   }
   {
-NOP   1000 1001   # PLD
+PLD   1000 1001   # (immediate T1)
 LDRB_ri   1000 1001   @ldst_ri_pos
   }
   LDRB_ri 1000 0001   1..1    @ldst_ri_idx
   {
-NOP   1000 0001   1100    # PLD
+PLD   1000 0001   1100    # (immediate T2)
 LDRB_ri   1000 0001   1100    @ldst_ri_neg
   }
   LDRBT_ri    1000 0001   1110    @ldst_ri_unp
   {
-NOP   1000 0001   00 --   # PLD
+PLD   1000 0001   00 --   # (register)
 LDRB_rr   1000 0001   00 ..   @ldst_rr
   }
 }
 {
   {
-NOP   1000 -011   # PLD
+PLD   1000 -011   # (literal)
 LDRH_ri   1000 .011   @ldst_ri_lit
   }
   {
-NOP   1000 1011   # PLDW
+PLDW  1000 1011   # (immediate T1)
 LDRH_ri   1000 1011   @ldst_ri_pos
   }
   LDRH_ri 1000 0011   1..1    @ldst_ri_idx
   {
-NOP   1000 0011   1100    # PLDW
+PLDW  1000 0011   1100    # (immediate T2)
 LDRH_ri   1000 0011   1100    @ldst_ri_neg
   }
   LDRHT_ri    1000 0011   1110    @ldst_ri_unp
   {
-NOP   1000 0011   00 --   # PLDW
+PLDW  1000 0011   00 --   # (register)
 LDRH_rr   1000 0011   00 ..   @ldst_rr
   }
 }
@@ -504,24 +504,23 @@ STR_ri    1000 1100   
@ldst_ri_pos
   LDRT_ri 1000 0101   1110    @ldst_ri_unp
   LDR_rr  1000 0101   00 ..   @ldst_rr
 }
-# NOPs here are PLI.
 {
   {
-NOP   1001 -001   
+PLI   1001 -001   # (literal T3)
 LDRSB_ri  1001 .001   @ldst_ri_lit
   }
   {
-NOP   1001 1001   
+PLI   1001 1001   # (immediate T1)
 LDRSB_ri  1001 1001   @ldst_ri_pos
   }
   LDRSB_ri    1001 0001   1..1    @ldst_ri_idx
   {
-NOP   1001 0001   1100 
+PLI   1001 0001   1100    # (immediate T2)
 LDRSB_ri  1001 0001   1100    @ldst_ri_neg
   }
   LDRSBT_ri   1001 0001   1110    @ldst_ri_unp
   {
-NOP   1001 0001   00 -- 
+PLI   1001 0001   00 -

[PULL 19/42] target/arm: Convert Cryptographic 2-register SHA512 to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-14-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  5 
 target/arm/tcg/translate-a64.c | 50 ++
 2 files changed, 8 insertions(+), 47 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index c342c276089..5a46205751c 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -631,3 +631,8 @@ RAX11100 1110 011 . 100011 . .  
@rrr_q1e3
 SM3PARTW1   1100 1110 011 . 11 . .  @rrr_q1e0
 SM3PARTW2   1100 1110 011 . 110001 . .  @rrr_q1e0
 SM4EKEY 1100 1110 011 . 110010 . .  @rrr_q1e0
+
+### Cryptographic two-register SHA512
+
+SHA512SU0   1100 1110 110 0 10 . .  @rr_q1e0
+SM4E1100 1110 110 0 11 . .  @r2r_q1e0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 77b24cd52ed..eed0abe9121 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4629,6 +4629,9 @@ TRANS_FEAT(SM3PARTW1, aa64_sm3, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sm3part
 TRANS_FEAT(SM3PARTW2, aa64_sm3, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sm3partw2)
 TRANS_FEAT(SM4EKEY, aa64_sm4, do_gvec_op3_ool, a, 0, gen_helper_crypto_sm4ekey)
 
+TRANS_FEAT(SHA512SU0, aa64_sha512, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha512su0)
+TRANS_FEAT(SM4E, aa64_sm4, do_gvec_op3_ool, a, 0, gen_helper_crypto_sm4e)
+
 
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
@@ -13530,52 +13533,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto two-reg SHA512
- *  31 12  11  10  95 40
- * +-++--+--+
- * | 1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 | opcode |  Rn  |  Rd  |
- * +-++--+--+
- */
-static void disas_crypto_two_reg_sha512(DisasContext *s, uint32_t insn)
-{
-int opcode = extract32(insn, 10, 2);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-bool feature;
-
-switch (opcode) {
-case 0: /* SHA512SU0 */
-feature = dc_isar_feature(aa64_sha512, s);
-break;
-case 1: /* SM4E */
-feature = dc_isar_feature(aa64_sm4, s);
-break;
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!feature) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-switch (opcode) {
-case 0: /* SHA512SU0 */
-gen_gvec_op2_ool(s, true, rd, rn, 0, gen_helper_crypto_sha512su0);
-break;
-case 1: /* SM4E */
-gen_gvec_op3_ool(s, true, rd, rd, rn, 0, gen_helper_crypto_sm4e);
-break;
-default:
-g_assert_not_reached();
-}
-}
-
 /* Crypto four-register
  *  31   23 22 21 20  16 15  14  10 95 40
  * +---+-+--+---+--+--+--+
@@ -13750,7 +13707,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 { 0x5e000400, 0xdfe08400, disas_simd_scalar_copy },
 { 0x5f00, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
-{ 0xcec08000, 0xf000, disas_crypto_two_reg_sha512 },
 { 0xce00, 0xff808000, disas_crypto_four_reg },
 { 0xce80, 0xffe0, disas_crypto_xar },
 { 0xce408000, 0xffe0c000, disas_crypto_three_reg_imm2 },
-- 
2.34.1

[PULL 31/42] target/arm: Convert FCMEQ, FCMGE, FCMGT, FACGE, FACGT to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-26-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/helper.h|   5 +
 target/arm/tcg/a64.decode  |  30 ++
 target/arm/tcg/translate-a64.c | 188 +++--
 target/arm/tcg/vec_helper.c|  30 ++
 4 files changed, 174 insertions(+), 79 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index e021c185178..8d076011c18 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -727,18 +727,23 @@ DEF_HELPER_FLAGS_5(gvec_fabd_s, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fceq_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fceq_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fceq_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fcge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fcge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcge_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_fcgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fcgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_fcgt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(gvec_facge_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_facge_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_facge_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
 DEF_HELPER_FLAGS_5(gvec_facgt_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 DEF_HELPER_FLAGS_5(gvec_facgt_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
+DEF_HELPER_FLAGS_5(gvec_facgt_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, 
i32)
 
 DEF_HELPER_FLAGS_5(gvec_fmax_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(gvec_fmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 11527bb5e5e..7fc3277be67 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -713,6 +713,21 @@ FMINNM_s0001 1110 ..1 . 0111 10 . . 
@rrr_hsd
 FMULX_s 0101 1110 010 . 00011 1 . . @rrr_h
 FMULX_s 0101 1110 0.1 . 11011 1 . . @rrr_sd
 
+FCMEQ_s 0101 1110 010 . 00100 1 . . @rrr_h
+FCMEQ_s 0101 1110 0.1 . 11100 1 . . @rrr_sd
+
+FCMGE_s 0111 1110 010 . 00100 1 . . @rrr_h
+FCMGE_s 0111 1110 0.1 . 11100 1 . . @rrr_sd
+
+FCMGT_s 0111 1110 110 . 00100 1 . . @rrr_h
+FCMGT_s 0111 1110 1.1 . 11100 1 . . @rrr_sd
+
+FACGE_s 0111 1110 010 . 00101 1 . . @rrr_h
+FACGE_s 0111 1110 0.1 . 11101 1 . . @rrr_sd
+
+FACGT_s 0111 1110 110 . 00101 1 . . @rrr_h
+FACGT_s 0111 1110 1.1 . 11101 1 . . @rrr_sd
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -748,6 +763,21 @@ FMLA_v  0.00 1110 0.1 . 11001 1 . . 
@qrrr_sd
 FMLS_v  0.00 1110 110 . 1 1 . . @qrrr_h
 FMLS_v  0.00 1110 1.1 . 11001 1 . . @qrrr_sd
 
+FCMEQ_v 0.00 1110 010 . 00100 1 . . @qrrr_h
+FCMEQ_v 0.00 1110 0.1 . 11100 1 . . @qrrr_sd
+
+FCMGE_v 0.10 1110 010 . 00100 1 . . @qrrr_h
+FCMGE_v 0.10 1110 0.1 . 11100 1 . . @qrrr_sd
+
+FCMGT_v 0.10 1110 110 . 00100 1 . . @qrrr_h
+FCMGT_v 0.10 1110 1.1 . 11100 1 . . @qrrr_sd
+
+FACGE_v 0.10 1110 010 . 00101 1 . . @qrrr_h
+FACGE_v 0.10 1110 0.1 . 11101 1 . . @qrrr_sd
+
+FACGT_v 0.10 1110 110 . 00101 1 . . @qrrr_h
+FACGT_v 0.10 1110 1.1 . 11101 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index f84c12378dc..75b0c1a005e 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4975,6 +4975,41 @@ static const FPScalar f_scalar_fnmul = {
 };
 TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
 
+static const FPScalar f_scalar_fcmeq = {
+gen_helper_advsimd_ceq_f16,
+gen_helper_neon_ceq_f32,
+gen_helper_neon_ceq_f64,
+};
+TRANS(FCMEQ_s, do_fp3_scalar, a, &f_scalar_fcmeq)
+
+static const FPScalar f_scalar_fcmge = {
+gen_helper_advsimd_cge_f16,
+gen_helper_neon_cge_f32,
+gen_helper_neon_cge_f64,
+};
+TRANS(FCMGE_s, do_fp3_scalar, a, &f_scalar_fcmge)
+
+static const FPScalar f_scalar_fcmgt = {
+gen_helper_advsimd_cgt_f16,

[PULL 17/42] target/arm: Convert Cryptographic 2-register SHA to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-12-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  6 
 target/arm/tcg/translate-a64.c | 54 +++---
 2 files changed, 10 insertions(+), 50 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 7590659ee68..350afabc779 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -614,3 +614,9 @@ SHA1SU0 0101 1110 000 . 001100 . .  
@rrr_q1e0
 SHA256H 0101 1110 000 . 01 . .  @rrr_q1e0
 SHA256H20101 1110 000 . 010100 . .  @rrr_q1e0
 SHA256SU1   0101 1110 000 . 011000 . .  @rrr_q1e0
+
+### Cryptographic two-register SHA
+
+SHA1H   0101 1110 0010 1000  10 . . @rr_q1e0
+SHA1SU1 0101 1110 0010 1000 0001 10 . . @rr_q1e0
+SHA256SU0   0101 1110 0010 1000 0010 10 . . @rr_q1e0
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5bef39d4e7d..1d20bf0c35b 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4606,6 +4606,10 @@ TRANS_FEAT(SHA256H, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256
 TRANS_FEAT(SHA256H2, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256h2)
 TRANS_FEAT(SHA256SU1, aa64_sha256, do_gvec_op3_ool, a, 0, 
gen_helper_crypto_sha256su1)
 
+TRANS_FEAT(SHA1H, aa64_sha1, do_gvec_op2_ool, a, 0, gen_helper_crypto_sha1h)
+TRANS_FEAT(SHA1SU1, aa64_sha1, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha1su1)
+TRANS_FEAT(SHA256SU0, aa64_sha256, do_gvec_op2_ool, a, 0, 
gen_helper_crypto_sha256su0)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13506,55 +13510,6 @@ static void disas_simd_indexed(DisasContext *s, 
uint32_t insn)
 }
 }
 
-/* Crypto two-reg SHA
- *  31 24 23  22 21   17 1612 11 10 95 40
- * +-+--+---++-+--+--+
- * | 0 1 0 1 1 1 1 0 | size | 1 0 1 0 0 | opcode | 1 0 |  Rn  |  Rd  |
- * +-+--+---++-+--+--+
- */
-static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
-{
-int size = extract32(insn, 22, 2);
-int opcode = extract32(insn, 12, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-gen_helper_gvec_2 *genfn;
-bool feature;
-
-if (size != 0) {
-unallocated_encoding(s);
-return;
-}
-
-switch (opcode) {
-case 0: /* SHA1H */
-feature = dc_isar_feature(aa64_sha1, s);
-genfn = gen_helper_crypto_sha1h;
-break;
-case 1: /* SHA1SU1 */
-feature = dc_isar_feature(aa64_sha1, s);
-genfn = gen_helper_crypto_sha1su1;
-break;
-case 2: /* SHA256SU0 */
-feature = dc_isar_feature(aa64_sha256, s);
-genfn = gen_helper_crypto_sha256su0;
-break;
-default:
-unallocated_encoding(s);
-return;
-}
-
-if (!feature) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-gen_gvec_op2_ool(s, true, rd, rn, 0, genfn);
-}
-
 /* Crypto three-reg SHA512
  *  31   21 20  16 15  14  13 12  11  10  95 40
  * +---+--+---+---+-++--+--+
@@ -13849,7 +13804,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 { 0x5e000400, 0xdfe08400, disas_simd_scalar_copy },
 { 0x5f00, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
-{ 0x5e280800, 0xff3e0c00, disas_crypto_two_reg_sha },
 { 0xce608000, 0xffe0b000, disas_crypto_three_reg_sha512 },
 { 0xcec08000, 0xf000, disas_crypto_two_reg_sha512 },
 { 0xce00, 0xff808000, disas_crypto_four_reg },
-- 
2.34.1

[PULL 08/42] docs/system: Remove ADC from raspi documentation

2024-05-28 Thread Peter Maydell

From: Rayhan Faizel 

None of the RPi boards have ADC on-board. In real life, an external ADC chip
is required to operate on analog signals.

Signed-off-by: Rayhan Faizel 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240512085716.222326-1-rayhan.fai...@gmail.com
Signed-off-by: Peter Maydell 
---
 docs/system/arm/raspi.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/system/arm/raspi.rst b/docs/system/arm/raspi.rst
index fbec1da6a1e..44eec3f1c33 100644
--- a/docs/system/arm/raspi.rst
+++ b/docs/system/arm/raspi.rst
@@ -40,7 +40,6 @@ Implemented devices
 Missing devices
 ---
 
- * Analog to Digital Converter (ADC)
  * Pulse Width Modulation (PWM)
  * PCIE Root Port (raspi4b)
  * GENET Ethernet Controller (raspi4b)
-- 
2.34.1

[PULL 29/42] target/arm: Convert FNMUL to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

This is the last instruction within disas_fp_2src,
so remove that and its subroutines.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-24-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |   1 +
 target/arm/tcg/translate-a64.c | 177 +
 2 files changed, 27 insertions(+), 151 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index e2678d919e5..cde4b86303d 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -703,6 +703,7 @@ FADD_s  0001 1110 ..1 . 0010 10 . . 
@rrr_hsd
 FSUB_s  0001 1110 ..1 . 0011 10 . . @rrr_hsd
 FDIV_s  0001 1110 ..1 . 0001 10 . . @rrr_hsd
 FMUL_s  0001 1110 ..1 .  10 . . @rrr_hsd
+FNMUL_s 0001 1110 ..1 . 1000 10 . . @rrr_hsd
 
 FMAX_s  0001 1110 ..1 . 0100 10 . . @rrr_hsd
 FMIN_s  0001 1110 ..1 . 0101 10 . . @rrr_hsd
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 878f83298f5..5ba30ba7c86 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4950,6 +4950,31 @@ static const FPScalar f_scalar_fmulx = {
 };
 TRANS(FMULX_s, do_fp3_scalar, a, &f_scalar_fmulx)
 
+static void gen_fnmul_h(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+gen_helper_vfp_mulh(d, n, m, s);
+gen_vfp_negh(d, d);
+}
+
+static void gen_fnmul_s(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_ptr s)
+{
+gen_helper_vfp_muls(d, n, m, s);
+gen_vfp_negs(d, d);
+}
+
+static void gen_fnmul_d(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_ptr s)
+{
+gen_helper_vfp_muld(d, n, m, s);
+gen_vfp_negd(d, d);
+}
+
+static const FPScalar f_scalar_fnmul = {
+gen_fnmul_h,
+gen_fnmul_s,
+gen_fnmul_d,
+};
+TRANS(FNMUL_s, do_fp3_scalar, a, &f_scalar_fnmul)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -6932,156 +6957,6 @@ static void disas_fp_1src(DisasContext *s, uint32_t 
insn)
 }
 }
 
-/* Floating-point data-processing (2 source) - single precision */
-static void handle_fp_2src_single(DisasContext *s, int opcode,
-  int rd, int rn, int rm)
-{
-TCGv_i32 tcg_op1;
-TCGv_i32 tcg_op2;
-TCGv_i32 tcg_res;
-TCGv_ptr fpst;
-
-tcg_res = tcg_temp_new_i32();
-fpst = fpstatus_ptr(FPST_FPCR);
-tcg_op1 = read_fp_sreg(s, rn);
-tcg_op2 = read_fp_sreg(s, rm);
-
-switch (opcode) {
-case 0x8: /* FNMUL */
-gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_vfp_negs(tcg_res, tcg_res);
-break;
-default:
-case 0x0: /* FMUL */
-case 0x1: /* FDIV */
-case 0x2: /* FADD */
-case 0x3: /* FSUB */
-case 0x4: /* FMAX */
-case 0x5: /* FMIN */
-case 0x6: /* FMAXNM */
-case 0x7: /* FMINNM */
-g_assert_not_reached();
-}
-
-write_fp_sreg(s, rd, tcg_res);
-}
-
-/* Floating-point data-processing (2 source) - double precision */
-static void handle_fp_2src_double(DisasContext *s, int opcode,
-  int rd, int rn, int rm)
-{
-TCGv_i64 tcg_op1;
-TCGv_i64 tcg_op2;
-TCGv_i64 tcg_res;
-TCGv_ptr fpst;
-
-tcg_res = tcg_temp_new_i64();
-fpst = fpstatus_ptr(FPST_FPCR);
-tcg_op1 = read_fp_dreg(s, rn);
-tcg_op2 = read_fp_dreg(s, rm);
-
-switch (opcode) {
-case 0x8: /* FNMUL */
-gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_vfp_negd(tcg_res, tcg_res);
-break;
-default:
-case 0x0: /* FMUL */
-case 0x1: /* FDIV */
-case 0x2: /* FADD */
-case 0x3: /* FSUB */
-case 0x4: /* FMAX */
-case 0x5: /* FMIN */
-case 0x6: /* FMAXNM */
-case 0x7: /* FMINNM */
-g_assert_not_reached();
-}
-
-write_fp_dreg(s, rd, tcg_res);
-}
-
-/* Floating-point data-processing (2 source) - half precision */
-static void handle_fp_2src_half(DisasContext *s, int opcode,
-int rd, int rn, int rm)
-{
-TCGv_i32 tcg_op1;
-TCGv_i32 tcg_op2;
-TCGv_i32 tcg_res;
-TCGv_ptr fpst;
-
-tcg_res = tcg_temp_new_i32();
-fpst = fpstatus_ptr(FPST_FPCR_F16);
-tcg_op1 = read_fp_hreg(s, rn);
-tcg_op2 = read_fp_hreg(s, rm);
-
-switch (opcode) {
-case 0x8: /* FNMUL */
-gen_helper_advsimd_mulh(tcg_res, tcg_op1, tcg_op2, fpst);
-gen_vfp_negh(tcg_res, tcg_res);
-break;
-default:
-case 0x0: /* FMUL */
-case 0x1: /* FDIV */
-case 0x2: /* FADD */
-case 0x3: /* FSUB */
-case 0x4: /* FMAX */
-case 0x5: /* FMIN */
-case 0x6: /* FMAXNM */
-case 0x7: /* FMINNM */
-g_assert_not_reached();
-}
-
-write_fp_sreg(s, rd, tcg_res);
-}
-
-/* Floating point data-processing (2 source)
- *   31

[PULL 21/42] target/arm: Convert Cryptographic 3-register, imm2 to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-16-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  | 10 
 target/arm/tcg/translate-a64.c | 43 ++
 2 files changed, 22 insertions(+), 31 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index ef6902e86a5..1292312a7f9 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -644,3 +644,13 @@ SM4E1100 1110 110 0 11 . .  
@r2r_q1e0
 EOR31100 1110 000 . 0 . . . @_q1e3
 BCAX1100 1110 001 . 0 . . . @_q1e3
 SM3SS1  1100 1110 010 . 0 . . . @_q1e3
+
+### Cryptographic three-register, imm2
+
+&crypto3i   rd rn rm imm
+@crypto3i    ... rm:5 .. imm:2 .. rn:5 rd:5 &crypto3i
+
+SM3TT1A 11001110 010 . 10 .. 00 . . @crypto3i
+SM3TT1B 11001110 010 . 10 .. 01 . . @crypto3i
+SM3TT2A 11001110 010 . 10 .. 10 . . @crypto3i
+SM3TT2B 11001110 010 . 10 .. 11 . . @crypto3i
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 2951e7eb59e..cf3a7dfa99f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4676,6 +4676,18 @@ static bool trans_SM3SS1(DisasContext *s, arg_SM3SS1 *a)
 return true;
 }
 
+static bool do_crypto3i(DisasContext *s, arg_crypto3i *a, gen_helper_gvec_3 
*fn)
+{
+if (fp_access_check(s)) {
+gen_gvec_op3_ool(s, true, a->rd, a->rn, a->rm, a->imm, fn);
+}
+return true;
+}
+TRANS_FEAT(SM3TT1A, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt1a)
+TRANS_FEAT(SM3TT1B, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt1b)
+TRANS_FEAT(SM3TT2A, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt2a)
+TRANS_FEAT(SM3TT2B, aa64_sm3, do_crypto3i, a, gen_helper_crypto_sm3tt2b)
+
 /* Shift a TCGv src by TCGv shift_amount, put result in dst.
  * Note that it is the caller's responsibility to ensure that the
  * shift amount is in range (ie 0..31 or 0..63) and provide the ARM
@@ -13604,36 +13616,6 @@ static void disas_crypto_xar(DisasContext *s, uint32_t 
insn)
  vec_full_reg_size(s));
 }
 
-/* Crypto three-reg imm2
- *  31   21 20  16 15  14 13 12  11  10  95 40
- * +---+--+-+--++--+--+
- * | 1 1 0 0 1 1 1 0 0 1 0 |  Rm  | 1 0 | imm2 | opcode |  Rn  |  Rd  |
- * +---+--+-+--++--+--+
- */
-static void disas_crypto_three_reg_imm2(DisasContext *s, uint32_t insn)
-{
-static gen_helper_gvec_3 * const fns[4] = {
-gen_helper_crypto_sm3tt1a, gen_helper_crypto_sm3tt1b,
-gen_helper_crypto_sm3tt2a, gen_helper_crypto_sm3tt2b,
-};
-int opcode = extract32(insn, 10, 2);
-int imm2 = extract32(insn, 12, 2);
-int rm = extract32(insn, 16, 5);
-int rn = extract32(insn, 5, 5);
-int rd = extract32(insn, 0, 5);
-
-if (!dc_isar_feature(aa64_sm3, s)) {
-unallocated_encoding(s);
-return;
-}
-
-if (!fp_access_check(s)) {
-return;
-}
-
-gen_gvec_op3_ool(s, true, rd, rn, rm, imm2, fns[opcode]);
-}
-
 /* C3.6 Data processing - SIMD, inc Crypto
  *
  * As the decode gets a little complex we are using a table based
@@ -13663,7 +13645,6 @@ static const AArch64DecodeTable data_proc_simd[] = {
 { 0x5f00, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
 { 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
 { 0xce80, 0xffe0, disas_crypto_xar },
-{ 0xce408000, 0xffe0c000, disas_crypto_three_reg_imm2 },
 { 0x0e400400, 0x9f60c400, disas_simd_three_reg_same_fp16 },
 { 0x0e780800, 0x8f7e0c00, disas_simd_two_reg_misc_fp16 },
 { 0x5e400400, 0xdf60c400, disas_simd_scalar_three_reg_same_fp16 },
-- 
2.34.1

[PULL 33/42] target/arm: Convert FRECPS, FRSQRTS to decodetree

2024-05-28 Thread Peter Maydell

From: Richard Henderson 

These are the last instructions within handle_3same_float
and disas_simd_scalar_three_reg_same_fp16 so remove them.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20240524232121.284515-28-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/tcg/a64.decode  |  12 ++
 target/arm/tcg/translate-a64.c | 293 -
 2 files changed, 46 insertions(+), 259 deletions(-)

diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index a852b5f06f0..84cb38f1dd0 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -731,6 +731,12 @@ FACGT_s 0111 1110 1.1 . 11101 1 . . 
@rrr_sd
 FABD_s  0111 1110 110 . 00010 1 . . @rrr_h
 FABD_s  0111 1110 1.1 . 11010 1 . . @rrr_sd
 
+FRECPS_s0101 1110 010 . 00111 1 . . @rrr_h
+FRECPS_s0101 1110 0.1 . 1 1 . . @rrr_sd
+
+FRSQRTS_s   0101 1110 110 . 00111 1 . . @rrr_h
+FRSQRTS_s   0101 1110 1.1 . 1 1 . . @rrr_sd
+
 ### Advanced SIMD three same
 
 FADD_v  0.00 1110 010 . 00010 1 . . @qrrr_h
@@ -784,6 +790,12 @@ FACGT_v 0.10 1110 1.1 . 11101 1 . . 
@qrrr_sd
 FABD_v  0.10 1110 110 . 00010 1 . . @qrrr_h
 FABD_v  0.10 1110 1.1 . 11010 1 . . @qrrr_sd
 
+FRECPS_v0.00 1110 010 . 00111 1 . . @qrrr_h
+FRECPS_v0.00 1110 0.1 . 1 1 . . @qrrr_sd
+
+FRSQRTS_v   0.00 1110 110 . 00111 1 . . @qrrr_h
+FRSQRTS_v   0.00 1110 1.1 . 1 1 . . @qrrr_sd
+
 ### Advanced SIMD scalar x indexed element
 
 FMUL_si 0101  00 ..  1001 . 0 . .   @rrx_h
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 633384d2a56..a7537a5104f 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -5035,6 +5035,20 @@ static const FPScalar f_scalar_fabd = {
 };
 TRANS(FABD_s, do_fp3_scalar, a, &f_scalar_fabd)
 
+static const FPScalar f_scalar_frecps = {
+gen_helper_recpsf_f16,
+gen_helper_recpsf_f32,
+gen_helper_recpsf_f64,
+};
+TRANS(FRECPS_s, do_fp3_scalar, a, &f_scalar_frecps)
+
+static const FPScalar f_scalar_frsqrts = {
+gen_helper_rsqrtsf_f16,
+gen_helper_rsqrtsf_f32,
+gen_helper_rsqrtsf_f64,
+};
+TRANS(FRSQRTS_s, do_fp3_scalar, a, &f_scalar_frsqrts)
+
 static bool do_fp3_vector(DisasContext *s, arg_qrrr_e *a,
   gen_helper_gvec_3_ptr * const fns[3])
 {
@@ -5182,6 +5196,20 @@ static gen_helper_gvec_3_ptr * const f_vector_fabd[3] = {
 };
 TRANS(FABD_v, do_fp3_vector, a, f_vector_fabd)
 
+static gen_helper_gvec_3_ptr * const f_vector_frecps[3] = {
+gen_helper_gvec_recps_h,
+gen_helper_gvec_recps_s,
+gen_helper_gvec_recps_d,
+};
+TRANS(FRECPS_v, do_fp3_vector, a, f_vector_frecps)
+
+static gen_helper_gvec_3_ptr * const f_vector_frsqrts[3] = {
+gen_helper_gvec_rsqrts_h,
+gen_helper_gvec_rsqrts_s,
+gen_helper_gvec_rsqrts_d,
+};
+TRANS(FRSQRTS_v, do_fp3_vector, a, f_vector_frsqrts)
+
 /*
  * Advanced SIMD scalar/vector x indexed element
  */
@@ -9308,107 +9336,6 @@ static void handle_3same_64(DisasContext *s, int 
opcode, bool u,
 }
 }
 
-/* Handle the 3-same-operands float operations; shared by the scalar
- * and vector encodings. The caller must filter out any encodings
- * not allocated for the encoding it is dealing with.
- */
-static void handle_3same_float(DisasContext *s, int size, int elements,
-   int fpopcode, int rd, int rn, int rm)
-{
-int pass;
-TCGv_ptr fpst = fpstatus_ptr(FPST_FPCR);
-
-for (pass = 0; pass < elements; pass++) {
-if (size) {
-/* Double */
-TCGv_i64 tcg_op1 = tcg_temp_new_i64();
-TCGv_i64 tcg_op2 = tcg_temp_new_i64();
-TCGv_i64 tcg_res = tcg_temp_new_i64();
-
-read_vec_element(s, tcg_op1, rn, pass, MO_64);
-read_vec_element(s, tcg_op2, rm, pass, MO_64);
-
-switch (fpopcode) {
-case 0x1f: /* FRECPS */
-gen_helper_recpsf_f64(tcg_res, tcg_op1, tcg_op2, fpst);
-break;
-case 0x3f: /* FRSQRTS */
-gen_helper_rsqrtsf_f64(tcg_res, tcg_op1, tcg_op2, fpst);
-break;
-default:
-case 0x18: /* FMAXNM */
-case 0x19: /* FMLA */
-case 0x1a: /* FADD */
-case 0x1b: /* FMULX */
-case 0x1c: /* FCMEQ */
-case 0x1e: /* FMAX */
-case 0x38: /* FMINNM */
-case 0x39: /* FMLS */
-case 0x3a: /* FSUB */
-case 0x3e: /* FMIN */
-case 0x5b: /* FMUL */
-case 0x5c: /* FCMGE */
-case 0x5d: /* FACGE */
-case 0x5f: /* FDIV */
-case 0x7a: /* FABD */
-

Re: hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-28 Thread Peter Maydell

On Mon, 20 May 2024 at 23:24, Cord Amfmgm  wrote:
> On Mon, May 20, 2024 at 12:05 PM Peter Maydell  
> wrote:
>> For the "zero length buffer" case, do you have a more detailed
>> pointer to the bit of the spec that says that "cbp = be + 1" is a
>> valid way to specify a zero length buffer? Table 4-2 in the copy I
>> have says for CurrentBufferPointer "a value of 0 indicates
>> a zero-length data packet or that all bytes have been transferred",
>> and the sample host OS driver function QueueGeneralRequest()
>> later in the spec handles a 0 length packet by setting
>>   TD->HcTD.CBP = TD->HcTD.BE = 0;
>> (which our emulation's code does handle).
>
>
> Reading the spec carefully, a CBP set to 0 should always mean the zero-length 
> buffer case (or that all bytes have been transferred, so the buffer is now 
> zero-length - the same thing).

Yeah, I can see the argument for "we should treat all cbp == 0 as
zero-length buffer, not just cbp == be == 0".

> Table 4-2 is the correct reference, and this part is clear. It's the part you 
> quoted. "Contains the physical address of the next memory location that will 
> be accessed for transfer to/from the endpoint. A value of 0 indicates a 
> zero-length data packet or that all bytes have been transferred."
>
> Table 4-2 has this additional nugget that may be confusingly worded, for 
> BufferEnd: "Contains physical address of the last byte in the buffer for this 
> TD"

Mmm, but for a zero length buffer there is no "last byte" to
have this be the physical address of.

> And here's an example buffer of length 0 -- you probably already know what 
> I'm going to do here:
>
> char buf[0];
> char * CurrentBufferPointer = &buf[0];
> char * BufferEnd = &buf[-1]; // "address of the last byte in the buffer"
> // The OHCI Host Controller than advances CurrentBufferPointer like this: 
> CurrentBufferPointer += 0
> // After the transfer:
> // CurrentBufferPointer = &buf[0];
> // BufferEnd = &buf[-1];

Right, but why do you think this is valid, rather than
being a guest software bug? My reading of the spec is that it's
pretty clear about how to say "zero length buffer", and this
isn't it.

Is there some real-world guest OS that programs the OHCI
controller this way that we're trying to accommodate?

thanks
-- PMM

[PULL v2 0/7] Block jobs patches for 2024-04-29

2024-05-28 Thread Vladimir Sementsov-Ogievskiy

The following changes since commit ad10b4badc1dd5b28305f9b9f1168cf0aa3ae946:

  Merge tag 'pull-error-2024-05-27' of https://repo.or.cz/qemu/armbru into 
staging (2024-05-27 06:40:42 -0700)

are available in the Git repository at:

  https://gitlab.com/vsementsov/qemu.git tags/pull-block-jobs-2024-04-29-v2

for you to fetch changes up to a149401048481247bcbaf6035a7a1308974fb464:

  iotests/pylintrc: allow up to 10 similar lines (2024-05-28 15:52:15 +0300)


Block jobs patches for 2024-04-29

v2: add "iotests/pylintrc: allow up to 10 similar lines" to fix
check-python-minreqs

- backup: discard-source parameter
- blockcommit: Reopen base image as RO after abort


Alexander Ivanov (1):
  blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (6):
  block/copy-before-write: fix permission
  block/copy-before-write: support unligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter
  iotests: add backup-discard-source
  iotests/pylintrc: allow up to 10 similar lines

 block/backup.c |   5 +-
 block/block-copy.c |  12 ++-
 block/copy-before-write.c  |  39 +++--
 block/copy-before-write.h  |   1 +
 block/mirror.c |  11 ++-
 block/replication.c|   4 +-
 blockdev.c |   2 +-
 include/block/block-common.h   |   2 +
 include/block/block-copy.h |   2 +
 include/block/block_int-global-state.h |   2 +-
 qapi/block-core.json   |   4 +
 tests/qemu-iotests/257.out | 112 
+-
 tests/qemu-iotests/pylintrc|   2 +-
 tests/qemu-iotests/tests/backup-discard-source | 152 

 tests/qemu-iotests/tests/backup-discard-source.out |   5 ++
 15 files changed, 282 insertions(+), 73 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out


Alexander Ivanov (1):
  blockcommit: Reopen base image as RO after abort

Vladimir Sementsov-Ogievskiy (6):
  block/copy-before-write: fix permission
  block/copy-before-write: support unligned snapshot-discard
  block/copy-before-write: create block_copy bitmap in filter node
  qapi: blockdev-backup: add discard-source parameter
  iotests: add backup-discard-source
  iotests/pylintrc: allow up to 10 similar lines

 block/backup.c|   5 +-
 block/block-copy.c|  12 +-
 block/copy-before-write.c |  39 -
 block/copy-before-write.h |   1 +
 block/mirror.c|  11 +-
 block/replication.c   |   4 +-
 blockdev.c|   2 +-
 include/block/block-common.h  |   2 +
 include/block/block-copy.h|   2 +
 include/block/block_int-global-state.h|   2 +-
 qapi/block-core.json  |   4 +
 tests/qemu-iotests/257.out| 112 ++---
 tests/qemu-iotests/pylintrc   |   2 +-
 .../qemu-iotests/tests/backup-discard-source  | 152 ++
 .../tests/backup-discard-source.out   |   5 +
 15 files changed, 282 insertions(+), 73 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/backup-discard-source
 create mode 100644 tests/qemu-iotests/tests/backup-discard-source.out

-- 
2.34.1

[PULL v2 7/7] iotests/pylintrc: allow up to 10 similar lines

2024-05-28 Thread Vladimir Sementsov-Ogievskiy

We want to have similar QMP objects in different tests. Reworking these
objects to make common parts by calling some helper functions doesn't
seem good. It's a lot more comfortable to see the whole QAPI request in
one place.

So, let's increase the limit, to unblock further commit
"iotests: add backup-discard-source"

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/pylintrc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
index de2e0c2781..05b75ee59b 100644
--- a/tests/qemu-iotests/pylintrc
+++ b/tests/qemu-iotests/pylintrc
@@ -55,4 +55,4 @@ max-line-length=79
 
 [SIMILARITIES]
 
-min-similarity-lines=6
+min-similarity-lines=10
-- 
2.34.1

Re: [PATCH 3/4] usb/ohci-pci: deprecate, don't build by default

2024-05-28 Thread Paolo Bonzini

On Tue, May 28, 2024 at 12:35 PM Thomas Huth  wrote:
> > diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
> > index 84bc7fbe36cd..c4a6ea5a687f 100644
> > --- a/hw/usb/Kconfig
> > +++ b/hw/usb/Kconfig
> > @@ -17,7 +17,6 @@ config USB_OHCI_SYSBUS
> >
> >   config USB_OHCI_PCI
> >   bool
> > -default y if PCI_DEVICES
> >   depends on PCI
> >   select USB_OHCI
>
> Not sure whether we should disable it by default just because it is
> deprecated. We don't do that for any other devices as far as I know.
> Anyway, you should add the device to docs/about/deprecated.rst to really
> mark it as deprecated, since that's our official list (AFAIK).

That would mean removing it, but that's a bad idea. It's not like the
device is blocking improvements elsewhere (in fact it's not even
removing any code because the sysbus OHCI is still there).

> Also, there are still some machines that use this device:
>
> $ grep -r USB_OHCI_PCI *
> hw/hppa/Kconfig:imply USB_OHCI_PCI
> hw/mips/Kconfig:imply USB_OHCI_PCI
> hw/ppc/Kconfig:imply USB_OHCI_PCI
> hw/ppc/Kconfig:imply USB_OHCI_PCI
>
> pseries could certainly continue without OHCI AFAICT, but the others?

Yeah, this needs to be a per-machine type choice to warn about
discouraged devices. Some, such as Cirrus, can probably be
unconditional, but still I wouldn't remove them.


Paolo

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Fabiano Rosas

Nicholas Piggin  writes:

> There is no need to use /dev/shm for file-backed memory devices, and
> it is too small to be usable in gitlab CI. Switch to using a regular
> file in /tmp/ which will usually have more space available.
>
> Signed-off-by: Nicholas Piggin 
> ---
> Am I missing something? AFAIKS there is not even any point using
> /dev/shm aka tmpfs anyway, there is not much special about it as a
> filesystem. This applies on top of the series just sent, and passes
> gitlab CI qtests including aarch64.

/dev/shm however will be mounted on tmpfs while /tmp may not. I don't
know if this has any implication to this test. Probably not.

>
> Thanks,
> Nick
>
>  tests/qtest/migration-test.c | 41 
>  1 file changed, 13 insertions(+), 28 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 45830eb213..86eace354e 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -552,7 +552,7 @@ typedef struct {
>   * unconditionally, because it means the user would like to be verbose.
>   */
>  bool hide_stderr;
> -bool use_shmem;
> +bool use_memfile;
>  /* only launch the target process */
>  bool only_target;
>  /* Use dirty ring if true; dirty logging otherwise */
> @@ -672,29 +672,14 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  g_autofree gchar *cmd_source = NULL;
>  g_autofree gchar *cmd_target = NULL;
>  const gchar *ignore_stderr;
> -g_autofree char *shmem_opts = NULL;
> -g_autofree char *shmem_path = NULL;
> +g_autofree char *memfile_opts = NULL;
> +g_autofree char *memfile_path = NULL;
>  const char *kvm_opts = NULL;
>  const char *arch = qtest_get_arch();
>  const char *memory_size;
>  const char *machine_alias, *machine_opts = "";
>  g_autofree char *machine = NULL;
>  
> -if (args->use_shmem) {
> -if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
> -g_test_skip("/dev/shm is not supported");
> -return -1;
> -}
> -if (getenv("GITLAB_CI")) {
> -/*
> - * Gitlab runners are limited to 64MB shm size. See:
> - * https://lore.kernel.org/all/87ttq5fvh7@suse.de/
> - */
> -g_test_skip("/dev/shm is not supported in Gitlab CI 
> environment");
> -return -1;
> -}
> -}
> -
>  dst_state = (QTestMigrationState) { };
>  src_state = (QTestMigrationState) { };
>  bootfile_create(tmpfs, args->suspend_me);
> @@ -754,12 +739,12 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  ignore_stderr = "";
>  }
>  
> -if (args->use_shmem) {
> -shmem_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
> -shmem_opts = g_strdup_printf(
> +if (args->use_memfile) {
> +memfile_path = g_strdup_printf("/%s/qemu-%d", tmpfs, getpid());

The variable tmpfs already contains the leading slash. Strictly speaking
we don't need the pid because 'tmpfs' is unique for each migration-test
run. If you use a fixed string such as qemu-mem, you can then clean it
up at test_migrate_end() along with the others.

> +memfile_opts = g_strdup_printf(
>  "-object memory-backend-file,id=mem0,size=%s"
>  ",mem-path=%s,share=on -numa node,memdev=mem0",
> -memory_size, shmem_path);
> +memory_size, memfile_path);
>  }
>  
>  if (args->use_dirty_ring) {
> @@ -788,7 +773,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs,
>   arch_opts ? arch_opts : "",
>   arch_source ? arch_source : "",
> - shmem_opts ? shmem_opts : "",
> + memfile_opts ? memfile_opts : "",
>   args->opts_source ? args->opts_source : "",
>   ignore_stderr);
>  if (!args->only_target) {
> @@ -810,7 +795,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs, uri,
>   arch_opts ? arch_opts : "",
>   arch_target ? arch_target : "",
> - shmem_opts ? shmem_opts : "",
> + memfile_opts ? memfile_opts : "",
>   args->opts_target ? args->opts_target : "",
>   ignore_stderr);
>  *to = qtest_init_with_env(QEMU_ENV_DST, cmd_target);
> @@ -822,8 +807,8 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   * Remove shmem file immediately to avoid memory leak in test failed 
> case.
>   * It's valid because QEMU has already opened this file
>   */

I'm n

Re: [PATCH RESEND v2 1/3] target/riscv/kvm: add software breakpoints support

2024-05-28 Thread Andrew Jones

On Tue, May 28, 2024 at 08:07:57AM GMT, Chao Du wrote:
> This patch implements insert/remove software breakpoint process.
> 
> For RISC-V, GDB treats single-step similarly to breakpoint: add a
> breakpoint at the next step address, then continue. So this also
> works for single-step debugging.
> 
> Implement kvm_arch_update_guest_debug(): Set the control flag
> when there are active breakpoints. This will help KVM to know
> the status in the userspace.
> 
> Add some stubs which are necessary for building, and will be
> implemented later.
> 
> Signed-off-by: Chao Du 
> ---
>  target/riscv/kvm/kvm-cpu.c | 69 ++
>  1 file changed, 69 insertions(+)
> 
> diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> index 235e2cdaca..c50f058aff 100644
> --- a/target/riscv/kvm/kvm-cpu.c
> +++ b/target/riscv/kvm/kvm-cpu.c
> @@ -1969,3 +1969,72 @@ static const TypeInfo riscv_kvm_cpu_type_infos[] = {
>  };
> 
>  DEFINE_TYPES(riscv_kvm_cpu_type_infos)
> +
> +static const uint32_t ebreak_insn = 0x00100073;
> +static const uint16_t c_ebreak_insn = 0x9002;
> +
> +int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> +{
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 2, 0)) {
> +return -EINVAL;
> +}
> +
> +if ((bp->saved_insn & 0x3) == 0x3) {
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0)
> +|| cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&ebreak_insn, 4, 
> 1)) {
> +return -EINVAL;
> +}
> +} else {
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&c_ebreak_insn, 2, 
> 1)) {
> +return -EINVAL;
> +}
> +}
> +
> +return 0;
> +}
> +
> +int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
> +{
> +uint32_t ebreak;
> +uint16_t c_ebreak;
> +
> +if ((bp->saved_insn & 0x3) == 0x3) {
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&ebreak, 4, 0) ||
> +ebreak != ebreak_insn ||
> +cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 
> 1)) {
> +return -EINVAL;
> +}
> +} else {
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&c_ebreak, 2, 0) ||
> +c_ebreak != c_ebreak_insn ||
> +cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 2, 
> 1)) {
> +return -EINVAL;
> +}
> +}
> +
> +return 0;
> +}
> +
> +int kvm_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
> +{
> +/* TODO; To be implemented later. */
> +return -EINVAL;
> +}
> +
> +int kvm_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
> +{
> +/* TODO; To be implemented later. */
> +return -EINVAL;
> +}
> +
> +void kvm_arch_remove_all_hw_breakpoints(void)
> +{
> +/* TODO; To be implemented later. */
> +}
> +
> +void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
> +{
> +if (kvm_sw_breakpoints_active(cs)) {
> +dbg->control |= KVM_GUESTDBG_ENABLE;
> +}
> +}
> --
> 2.17.1
>

Reviewed-by: Andrew Jones

RE: [PATCH v6 2/7] migration/multifd: put IOV initialization into compression method

2024-05-28 Thread Liu, Yuan1

> -Original Message-
> From: Peter Xu 
> Sent: Tuesday, May 28, 2024 4:51 AM
> To: Liu, Yuan1 
> Cc: faro...@suse.de; qemu-devel@nongnu.org; Zou, Nanhai
> 
> Subject: Re: [PATCH v6 2/7] migration/multifd: put IOV initialization into
> compression method
> 
> On Mon, May 06, 2024 at 12:57:46AM +0800, Yuan Liu wrote:
> > Different compression methods may require different numbers of IOVs.
> > Based on streaming compression of zlib and zstd, all pages will be
> > compressed to a data block, so two IOVs are needed for packet header
> > and compressed data block.
> >
> > Signed-off-by: Yuan Liu 
> > Reviewed-by: Nanhai Zou 
> > ---
> >  migration/multifd-zlib.c |  7 +++
> >  migration/multifd-zstd.c |  8 +++-
> >  migration/multifd.c  | 22 --
> >  3 files changed, 26 insertions(+), 11 deletions(-)
> >
> > diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
> > index 737a9645d2..2ced69487e 100644
> > --- a/migration/multifd-zlib.c
> > +++ b/migration/multifd-zlib.c
> > @@ -70,6 +70,10 @@ static int zlib_send_setup(MultiFDSendParams *p,
> Error **errp)
> >  goto err_free_zbuff;
> >  }
> >  p->compress_data = z;
> > +
> > +/* Needs 2 IOVs, one for packet header and one for compressed data
> */
> > +p->iov = g_new0(struct iovec, 2);
> > +
> >  return 0;
> >
> >  err_free_zbuff:
> > @@ -101,6 +105,9 @@ static void zlib_send_cleanup(MultiFDSendParams *p,
> Error **errp)
> >  z->buf = NULL;
> >  g_free(p->compress_data);
> >  p->compress_data = NULL;
> > +
> > +g_free(p->iov);
> > +p->iov = NULL;
> >  }
> >
> >  /**
> > diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
> > index 256858df0a..ca17b7e310 100644
> > --- a/migration/multifd-zstd.c
> > +++ b/migration/multifd-zstd.c
> > @@ -52,7 +52,6 @@ static int zstd_send_setup(MultiFDSendParams *p, Error
> **errp)
> >  struct zstd_data *z = g_new0(struct zstd_data, 1);
> >  int res;
> >
> > -p->compress_data = z;
> >  z->zcs = ZSTD_createCStream();
> >  if (!z->zcs) {
> >  g_free(z);
> > @@ -77,6 +76,10 @@ static int zstd_send_setup(MultiFDSendParams *p,
> Error **errp)
> >  error_setg(errp, "multifd %u: out of memory for zbuff", p->id);
> >  return -1;
> >  }
> > +p->compress_data = z;
> > +
> > +/* Needs 2 IOVs, one for packet header and one for compressed data
> */
> > +p->iov = g_new0(struct iovec, 2);
> >  return 0;
> >  }
> >
> > @@ -98,6 +101,9 @@ static void zstd_send_cleanup(MultiFDSendParams *p,
> Error **errp)
> >  z->zbuff = NULL;
> >  g_free(p->compress_data);
> >  p->compress_data = NULL;
> > +
> > +g_free(p->iov);
> > +p->iov = NULL;
> >  }
> >
> >  /**
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index f317bff077..d82885fdbb 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -137,6 +137,13 @@ static int nocomp_send_setup(MultiFDSendParams *p,
> Error **errp)
> >  p->write_flags |= QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
> >  }
> >
> > +if (multifd_use_packets()) {
> > +/* We need one extra place for the packet header */
> > +p->iov = g_new0(struct iovec, p->page_count + 1);
> > +} else {
> > +p->iov = g_new0(struct iovec, p->page_count);
> > +}
> > +
> >  return 0;
> >  }
> >
> > @@ -150,6 +157,8 @@ static int nocomp_send_setup(MultiFDSendParams *p,
> Error **errp)
> >   */
> >  static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
> >  {
> > +g_free(p->iov);
> > +p->iov = NULL;
> >  return;
> >  }
> >
> > @@ -228,6 +237,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p,
> Error **errp)
> >   */
> >  static int nocomp_recv_setup(MultiFDRecvParams *p, Error **errp)
> >  {
> > +p->iov = g_new0(struct iovec, p->page_count);
> >  return 0;
> >  }
> >
> > @@ -240,6 +250,8 @@ static int nocomp_recv_setup(MultiFDRecvParams *p,
> Error **errp)
> >   */
> >  static void nocomp_recv_cleanup(MultiFDRecvParams *p)
> >  {
> > +g_free(p->iov);
> > +p->iov = NULL;
> >  }
> 
> Are recv_setup()/recv_cleanup() for zstd/zlib missing?

Zstd/zlib does not request the IOVs on the receiving side.
The zstd/zlib reads the compressed data into the buffer directly

qio_channel_read_all(p->c, (void *)z->zbuff, in_size, errp);

> If the iov will be managed by the compressors, I'm wondering whether we
> should start assert(p->iov) after send|recv_setup(), and assert(!p->iov)
> after send|recv_cleanup().  That'll guard all these.

Yes, I agree with adding a check of IOV in multifd.c since different 
compressors 
may have different requirements for IOV.

We can add assert(p->iov) after send_setup, not recv_setup. The sending side 
always
uses IOV to send pages by qio_channel_writev_full_all in multifd.c, but the 
receiving
side may not require the IOV for reading pages.

It is better to add assert(!p->iov) after send|recv_cleanup()

> >
> >  /**
> > @@ -78

Re: [PATCH] tests/qtest/migrate-test: Use regular file file for shared-memory tests

2024-05-28 Thread Peter Xu

On Tue, May 28, 2024 at 02:27:57PM +1000, Nicholas Piggin wrote:
> There is no need to use /dev/shm for file-backed memory devices, and
> it is too small to be usable in gitlab CI. Switch to using a regular
> file in /tmp/ which will usually have more space available.
> 
> Signed-off-by: Nicholas Piggin 
> ---
> Am I missing something? AFAIKS there is not even any point using
> /dev/shm aka tmpfs anyway, there is not much special about it as a
> filesystem. This applies on top of the series just sent, and passes
> gitlab CI qtests including aarch64.

I think it's just that /dev/shm guarantees shmem usage, while the var
"tmpfs" implies g_dir_make_tmp() which may be another non-ram based file
system, while that'll be slightly different comparing to what a real user
would use - we don't suggest user to put guest RAM on things like btrfs.

One real implication is if we add a postcopy test it'll fail with
g_dir_make_tmp() when it is not pointing to a shmem mount, as
UFFDIO_REGISTER will fail there.  But that test doesn't yet exist as the
QEMU paths should be the same even if Linux will trigger different paths
when different types of mem is used (anonymous v.s. shmem).

If the goal here is to properly handle the case where tmpfs doesn't have
enough space, how about what I suggested in the other email?

https://lore.kernel.org/r/ZlSppKDE6wzjCF--@x1n

IOW, try populate the shmem region before starting the guest, skip if
population failed.  Would that work?

Thanks,

> 
> Thanks,
> Nick
> 
>  tests/qtest/migration-test.c | 41 
>  1 file changed, 13 insertions(+), 28 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 45830eb213..86eace354e 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -552,7 +552,7 @@ typedef struct {
>   * unconditionally, because it means the user would like to be verbose.
>   */
>  bool hide_stderr;
> -bool use_shmem;
> +bool use_memfile;
>  /* only launch the target process */
>  bool only_target;
>  /* Use dirty ring if true; dirty logging otherwise */
> @@ -672,29 +672,14 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  g_autofree gchar *cmd_source = NULL;
>  g_autofree gchar *cmd_target = NULL;
>  const gchar *ignore_stderr;
> -g_autofree char *shmem_opts = NULL;
> -g_autofree char *shmem_path = NULL;
> +g_autofree char *memfile_opts = NULL;
> +g_autofree char *memfile_path = NULL;
>  const char *kvm_opts = NULL;
>  const char *arch = qtest_get_arch();
>  const char *memory_size;
>  const char *machine_alias, *machine_opts = "";
>  g_autofree char *machine = NULL;
>  
> -if (args->use_shmem) {
> -if (!g_file_test("/dev/shm", G_FILE_TEST_IS_DIR)) {
> -g_test_skip("/dev/shm is not supported");
> -return -1;
> -}
> -if (getenv("GITLAB_CI")) {
> -/*
> - * Gitlab runners are limited to 64MB shm size. See:
> - * https://lore.kernel.org/all/87ttq5fvh7@suse.de/
> - */
> -g_test_skip("/dev/shm is not supported in Gitlab CI 
> environment");
> -return -1;
> -}
> -}
> -
>  dst_state = (QTestMigrationState) { };
>  src_state = (QTestMigrationState) { };
>  bootfile_create(tmpfs, args->suspend_me);
> @@ -754,12 +739,12 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  ignore_stderr = "";
>  }
>  
> -if (args->use_shmem) {
> -shmem_path = g_strdup_printf("/dev/shm/qemu-%d", getpid());
> -shmem_opts = g_strdup_printf(
> +if (args->use_memfile) {
> +memfile_path = g_strdup_printf("/%s/qemu-%d", tmpfs, getpid());
> +memfile_opts = g_strdup_printf(
>  "-object memory-backend-file,id=mem0,size=%s"
>  ",mem-path=%s,share=on -numa node,memdev=mem0",
> -memory_size, shmem_path);
> +memory_size, memfile_path);
>  }
>  
>  if (args->use_dirty_ring) {
> @@ -788,7 +773,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs,
>   arch_opts ? arch_opts : "",
>   arch_source ? arch_source : "",
> - shmem_opts ? shmem_opts : "",
> + memfile_opts ? memfile_opts : "",
>   args->opts_source ? args->opts_source : "",
>   ignore_stderr);
>  if (!args->only_target) {
> @@ -810,7 +795,7 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>   memory_size, tmpfs, uri,
>   arch_opts ? arch_opts : "",
>   arch_target ? arch_target : "",
> -

Re: [PATCH 3/4] usb/ohci-pci: deprecate, don't build by default

2024-05-28 Thread Helge Deller


On 5/28/24 12:35, Thomas Huth wrote:

On 28/05/2024 11.54, Gerd Hoffmann wrote:

The xhci host adapter is the much better choice.

Signed-off-by: Gerd Hoffmann 
---
  hw/usb/hcd-ohci-pci.c | 1 +
  hw/usb/Kconfig    | 1 -
  2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ohci-pci.c b/hw/usb/hcd-ohci-pci.c
index 33ed9b6f5a52..88de657def71 100644
--- a/hw/usb/hcd-ohci-pci.c
+++ b/hw/usb/hcd-ohci-pci.c
@@ -143,6 +143,7 @@ static void ohci_pci_class_init(ObjectClass *klass, void 
*data)
  dc->hotpluggable = false;
  dc->vmsd = &vmstate_ohci;
  dc->reset = usb_ohci_reset_pci;
+    klass->deprecation_note = "use qemu-xhci instead";
  }
  static const TypeInfo ohci_pci_info = {
diff --git a/hw/usb/Kconfig b/hw/usb/Kconfig
index 84bc7fbe36cd..c4a6ea5a687f 100644
--- a/hw/usb/Kconfig
+++ b/hw/usb/Kconfig
@@ -17,7 +17,6 @@ config USB_OHCI_SYSBUS
  config USB_OHCI_PCI
  bool
-    default y if PCI_DEVICES
  depends on PCI
  select USB_OHCI


Not sure whether we should disable it by default just because it is deprecated. 
We don't do that for any other devices as far as I know.

Anyway, you should add the device to docs/about/deprecated.rst to really mark 
it as deprecated, since that's our official list (AFAIK).

Also, there are still some machines that use this device:

$ grep -r USB_OHCI_PCI *
hw/hppa/Kconfig:    imply USB_OHCI_PCI
hw/mips/Kconfig:    imply USB_OHCI_PCI
hw/ppc/Kconfig:    imply USB_OHCI_PCI
hw/ppc/Kconfig:    imply USB_OHCI_PCI

pseries could certainly continue without OHCI AFAICT, but the others? Maybe 
this needs some discussion first... (thus putting some more people on CC:)


There was never a XHCI host on any of the hppa machines, but
the latest generation of HP machines do have built-in OHCI controllers.
So, deprecating OHCI in favor of XHCI will prevent emulation of HP-UX
on the hppa target.
So, for hppa the "xhci host adapter is NOT the much better choice.".

Helge

Re: [PATCH v2 03/67] target/arm: Reject incorrect operands to PLD, PLDW, PLI

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:25, Richard Henderson
 wrote:
>
> For all, rm == 15 is invalid.
> Prior to v8, thumb with rm == 13 is invalid.
> For PLDW, rn == 15 is invalid.

> Fixes a RISU mismatch for the HINTSPACE pattern in t32.risu
> compared to a neoverse-n1 host.

These are UNPREDICTABLE cases, not invalid. In general
we don't try to match a specific implementation's
UNPREDICTABLE choices.

I think we're better off avoiding the mismatch by improving
the risu patterns to avoid the UNPREDICTABLE cases.

thanks
-- PMM

Re: [PATCH v2 02/67] target/arm: Use PLD, PLDW, PLI not NOP for t32

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:23, Richard Henderson
 wrote:
>
> This fixes a bug in that neither PLI nor PLDW are present in ARMv6T2,
> but are introduced with ARMv7 and ARMv7MP respectively.
> For clarity, do not use NOP for PLD.
>
> Note that there is no PLDW (literal) -- bit 5 of the first word
> is not decoded, and is PLD (literal).  Confirmed on neoverse-n1
> host which does *not* trap on the (0) bit in the decode.

Handling of "(0)" and "(1)" bits in decode is CONSTRAINED UNPREDICTABLE
(see Arm ARM DDI0487K.a F1.7.2): implementations might:
 * UNDEF
 * NOP
 * ignore the bit and execute the insn
 * set any destination registers to UNKNOWN values

Usually (but not always) in QEMU we opt to UNDEF. If I'm
reading the decode lines correctly here in this case we're
opting to ignore the bit because we have both:

+PLD   1000 -001   # (literal)
+PLD   1000 -011   # (literal)

And this isn't a change in this commit because the NOP lines
we used to have also meant we ignored the bit.

With a tweaked commit message
Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust

2024-05-28 Thread Stefan Hajnoczi

On Tue, May 28, 2024 at 02:48:42PM +0800, Zhao Liu wrote:
> Hi Stefan,
> 
> On Mon, May 27, 2024 at 03:59:44PM -0400, Stefan Hajnoczi wrote:
> > Date: Mon, 27 May 2024 15:59:44 -0400
> > From: Stefan Hajnoczi 
> > Subject: Re: [RFC 0/6] scripts: Rewrite simpletrace printer in Rust
> > 
> > On Mon, May 27, 2024 at 04:14:15PM +0800, Zhao Liu wrote:
> > > Hi maintainers and list,
> > > 
> > > This RFC series attempts to re-implement simpletrace.py with Rust, which
> > > is the 1st task of Paolo's GSoC 2024 proposal.
> > > 
> > > There are two motivations for this work:
> > > 1. This is an open chance to discuss how to integrate Rust into QEMU.
> > > 2. Rust delivers faster parsing.
> > > 
> > > 
> > > Introduction
> > > 
> > > 
> > > Code framework
> > > --
> > > 
> > > I choose "cargo" to organize the code, because the current
> > > implementation depends on external crates (Rust's library), such as
> > > "backtrace" for getting frameinfo, "clap" for parsing the cli, "rex" for
> > > regular matching, and so on. (Meson's support for external crates is
> > > still incomplete. [2])
> > > 
> > > The simpletrace-rust created in this series is not yet integrated into
> > > the QEMU compilation chain, so it can only be compiled independently, e.g.
> > > under ./scripts/simpletrace/, compile it be:
> > > 
> > > cargo build --release
> > 
> > Please make sure it's built by .gitlab-ci.d/ so that the continuous
> > integration system prevents bitrot. You can add a job that runs the
> > cargo build.
> 
> Thanks! I'll do this.
> 
> > > 
> > > The code tree for the entire simpletrace-rust is as follows:
> > > 
> > > $ script/simpletrace-rust .
> > > .
> > > ├── Cargo.toml
> > > └── src
> > > └── main.rs   // The simpletrace logic (similar to simpletrace.py).
> > > └── trace.rs  // The Argument and Event abstraction (refer to
> > >   // tracetool/__init__.py).
> > > 
> > > My question about meson v.s. cargo, I put it at the end of the cover
> > > letter (the section "Opens on Rust Support").
> > > 
> > > The following two sections are lessons I've learned from this Rust
> > > practice.
> > > 
> > > 
> > > Performance
> > > ---
> > > 
> > > I did the performance comparison using the rust-simpletrace prototype with
> > > the python one:
> > > 
> > > * On the i7-10700 CPU @ 2.90GHz machine, parsing and outputting a 35M
> > > trace binary file for 10 times on each item:
> > > 
> > >   AVE (ms)   Rust v.s. Python
> > > Rust   (stdout)   12687.16114.46%
> > > Python (stdout)   14521.85
> > > 
> > > Rust   (file)  1422.44264.99%
> > > Python (file)  3769.37
> > > 
> > > - The "stdout" lines represent output to the screen.
> > > - The "file" lines represent output to a file (via "> file").
> > > 
> > > This Rust version contains some optimizations (including print, regular
> > > matching, etc.), but there should be plenty of room for optimization.
> > > 
> > > The current performance bottleneck is the reading binary trace file,
> > > since I am parsing headers and event payloads one after the other, so
> > > that the IO read overhead accounts for 33%, which can be further
> > > optimized in the future.
> > 
> > Performance will become more important when large amounts of TCG data is
> > captured, as described in the project idea:
> > https://wiki.qemu.org/Internships/ProjectIdeas/TCGBinaryTracing
> > 
> > While I can't think of a time in the past where simpletrace.py's
> > performance bothered me, improving performance is still welcome. Just
> > don't spend too much time on performance (and making the code more
> > complex) unless there is a real need.
> 
> Yes, I agree that it shouldn't be over-optimized.
> 
> The logic in the current Rust version is pretty much a carbon copy of
> the Python version, without additional complex logic introduced, but the
> improvements in x2.6 were obtained by optimizing IO:
> 
> * reading the event configuration file, where I called the buffered
>   interface,
> * and the output formatted trace log, which I output all via std_out.lock()
>   followed by write_all().
> 
> So, just the simple tweak of the interfaces brings much benefits. :-)
> 
> > > Security
> > > 
> > > 
> > > This is an example.
> > > 
> > > Rust is very strict about type-checking, and it found timestamp reversal
> > > issue in simpletrace-rust [3] (sorry, haven't gotten around to digging
> > > deeper with more time)...in this RFC, I workingaround it by allowing
> > > negative values. And the python version, just silently covered this
> > > issue up.
> > >
> > > Opens on Rust Support
> > > =
> > > 
> > > Meson v.s. Cargo
> > > 
> > > 
> > > The first question is whether all Rust code (including under scripts)
> > > must be integrated into meson?
> > > 
> > > If so, because of [2] then I have to discard the external crates and
> > > build some more Rust wheel

[PATCH v2] target/riscv: zvbb implies zvkb

2024-05-28 Thread Jerry Zhang Jian

- According to RISC-V crypto spec, Zvkb extension is a proper subset of the 
Zvbb extension.

- Reference: 
https://github.com/riscv/riscv-crypto/blob/1769c2609bf4535632e0c0fd715778f212bb272e/doc/vector/riscv-crypto-vector-zvkb.adoc?plain=1#L10

Signed-off-by: Jerry Zhang Jian 
---
 target/riscv/tcg/tcg-cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 40054a391a..f1a1306ab2 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -658,6 +658,10 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
 }
 
+if (cpu->cfg.ext_zvbb) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkb), true);
+}
+
 /*
  * In principle Zve*x would also suffice here, were they supported
  * in qemu
-- 
2.44.0

Re: [RFC v2 1/2] target/loongarch: Add loongson binary translation feature

2024-05-28 Thread gaosong


在 2024/5/28 上午9:07, maobibo 写道:

Hi Philippe,

Thanks for reviewing my patch.
I reply inline.

On 2024/5/27 下午6:37, Philippe Mathieu-Daudé wrote:

Hi Bibo,

On 27/5/24 10:35, Bibo Mao wrote:

Loongson Binary Translation (LBT) is used to accelerate binary
translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM
eflags (eflags) and x87 fpu stack pointer (ftop).

Now LBT feature is added in kvm mode, not supported in TCG mode since
it is not emulated. There are two feature flags such as forced_features
and default_features for each vcpu, the real feature is still in 
cpucfg.

Flag forced_features is parsed from command line, default_features is
parsed from cpu type.

Flag forced_features has higher priority than flag default_features,
default_features will be used if there is no command line option for 
LBT
feature. If the feature is not supported with KVM host, it reports 
error
and exits if forced_features is set, else it disables feature and 
continues

if default_features is set.

Signed-off-by: Bibo Mao 
---
  target/loongarch/cpu.c    | 69 
+++

  target/loongarch/cpu.h    | 12 +
  target/loongarch/kvm/kvm.c    | 26 ++
  target/loongarch/kvm/kvm_loongarch.h  | 16 +++
  target/loongarch/loongarch-qmp-cmds.c |  2 +-
  5 files changed, 124 insertions(+), 1 deletion(-)




+static void loongarch_set_lbt(Object *obj, bool value, Error **errp)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+
+    if (!kvm_enabled()) {


Either set errp, ...


+    return;
+    }
+
+    if (value) {
+    /* Enable binary translation for all architectures */
+    cpu->env.forced_features |= BIT_ULL(LOONGARCH_FEATURE_LBT);
+    } else {
+    /* Disable default features also */
+    cpu->env.default_features &= ~BIT_ULL(LOONGARCH_FEATURE_LBT);
+    }
+}
+
  void loongarch_cpu_post_init(Object *obj)
  {
  object_property_add_bool(obj, "lsx", loongarch_get_lsx,
   loongarch_set_lsx);
  object_property_add_bool(obj, "lasx", loongarch_get_lasx,
   loongarch_set_lasx);


... or only add the property if KVM is enabled:

    if (kvm_enabled()) {

Sure, will do. I think this method is better.

By the way bitmap method forced_features/default_feature is variant
of OnOffAuto method. Bitmap method uses two bit, OnOffAuto method uses 
separate feature variable. We do not know which method is better or 
which is the future trend.



I think the OnOffAuto variable is better.

The default_features is just a copy of cpucfg, and is not required.

Thanks.
Song Gao

Regards
Bibo Mao



+    object_property_add_bool(obj, "lbt", loongarch_get_lbt,
+ loongarch_set_lbt);
  }

Re: [PATCH 0/4] testing/next: purging remaining centos 8 bits

2024-05-28 Thread Alex Bennée

Philippe Mathieu-Daudé  writes:

> Hi Alex,
>
> On 21/5/24 14:53, Alex Bennée wrote:
>> There are a few more bits referencing centos8 in the tree which needed
>> cleaning up. After this we can remove the dedicated runner from the
>> gitlab registration. If we want to keep a dedicated Centos runner then
>> we can add back the bits needed to set it up (although arguably we
>> could have a single build-environment setup that handles all distros
>> and integrates with lcitool).
>
> Do you you mean we should generate
> scripts/ci/setup/build-environment.yml with lcitool?
> Otherwise should we update it?

Yes - I've got some WIP patches which I'll post soon.

>
>> Alex.
>> Alex Bennée (4):
>>ci: remove centos-steam-8 customer runner
>>docs/devel: update references to centos to later version
>>tests/vm: update centos.aarch64 image to 9
>>tests/vm: remove plain centos image
>>   docs/devel/ci-jobs.rst.inc|   7 -
>>   docs/devel/testing.rst|   8 +-
>>   .gitlab-ci.d/custom-runners.yml   |   1 -
>>   .../custom-runners/centos-stream-8-x86_64.yml |  24 ---
>>   .../org.centos/stream/8/build-environment.yml |  82 
>>   .../ci/org.centos/stream/8/x86_64/configure   | 198 --
>>   .../org.centos/stream/8/x86_64/test-avocado   |  65 --
>>   scripts/ci/org.centos/stream/README   |  17 --
>>   tests/vm/Makefile.include |   1 -
>>   tests/vm/centos   |  51 -
>>   tests/vm/centos.aarch64   |  10 +-
>>   11 files changed, 9 insertions(+), 455 deletions(-)
>>   delete mode 100644 .gitlab-ci.d/custom-runners/centos-stream-8-x86_64.yml
>>   delete mode 100644 scripts/ci/org.centos/stream/8/build-environment.yml
>>   delete mode 100755 scripts/ci/org.centos/stream/8/x86_64/configure
>>   delete mode 100755 scripts/ci/org.centos/stream/8/x86_64/test-avocado
>>   delete mode 100644 scripts/ci/org.centos/stream/README
>>   delete mode 100755 tests/vm/centos
>> 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH] iotests/pylintrc: allow up to 10 similar lines

2024-05-28 Thread Vladimir Sementsov-Ogievskiy


On 30.04.24 12:13, Vladimir Sementsov-Ogievskiy wrote:

We want to have similar QMP objects in different tests. Reworking these
objects to make common parts by calling some helper functions doesn't
seem good. It's a lot more comfortable to see the whole QAPI request in
one place.

So, let's increase the limit, to unblock further commit
"iotests: add backup-discard-source"

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! That's a patch to unblock my PR
"[PULL 0/6] Block jobs patches for 2024-04-29"
   <20240429115157.2260885-1-vsement...@yandex-team.ru>
   https://patchew.org/QEMU/20240429115157.2260885-1-vsement...@yandex-team.ru/


  tests/qemu-iotests/pylintrc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
index de2e0c2781..05b75ee59b 100644
--- a/tests/qemu-iotests/pylintrc
+++ b/tests/qemu-iotests/pylintrc
@@ -55,4 +55,4 @@ max-line-length=79
  
  [SIMILARITIES]
  
-min-similarity-lines=6

+min-similarity-lines=10



Hi! I hope it's OK, if I just apply this to my block branch, and resend "[PULL 0/6] 
Block jobs patches for 2024-04-29" which is blocked on this problem.

Thanks for reviewing,
applied to my block branch

--
Best regards,
Vladimir

Re: [PATCH v2 04/67] target/arm: Zero-extend writeback for fp16 FCVTZS (scalar, integer)

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:23, Richard Henderson
 wrote:
>
> Fixes RISU mismatch for "fcvtzs h31, h0, #14".
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/tcg/translate-a64.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
> index 4126aaa27e..d97acdbaf9 100644
> --- a/target/arm/tcg/translate-a64.c
> +++ b/target/arm/tcg/translate-a64.c
> @@ -8707,6 +8707,9 @@ static void handle_simd_shift_fpint_conv(DisasContext 
> *s, bool is_scalar,
>  read_vec_element_i32(s, tcg_op, rn, pass, size);
>  fn(tcg_op, tcg_op, tcg_shift, tcg_fpstatus);
>  if (is_scalar) {
> +if (size == MO_16 && !is_u) {
> +tcg_gen_ext16u_i32(tcg_op, tcg_op);
> +}
>  write_fp_sreg(s, rd, tcg_op);
>  } else {
>  write_vec_element_i32(s, tcg_op, rd, pass, size);
> --
> 2.34.1

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v6 12/12] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-05-28 Thread Philippe Mathieu-Daudé


On 28/5/24 12:38, Stefano Garzarella wrote:

`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
  tests/qtest/vhost-user-test.c | 23 +++
  1 file changed, 23 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH] target/riscv: zvbb implies zvkb

2024-05-28 Thread Jerry ZJ



Canary Mail
You've received a secure email
jerry.zhangj...@sifive.com has sent you a secure email via Canary Mail.

Read Secure Email 
(https://secure.canarymail.io/read?obj_id=04c03de7-d745-472e-b026-7dd839bc34a0&obj_key=eGJUOWtORkFXenBvWTJyMSt4VGpWdz09&thr_id=04c03de7-d745-472e-b026-7dd839bc34a0)

If you expect to correspond often with jerry.zhangj...@sifive.com, we recommend 
you download Canary Mail for free.

Download Canary (https://canarymail.io)

Privacy (https://canarymail.io/privacy.html) | Terms 
(https://canarymail.io/terms.html) | Docs (https://help.canarymail.io/) | 
Support (https://canarymail.zendesk.com/hc/en-us/requests/new)

Copyright © 2021 Canary Mail, All rights reserved.

Re: [PATCH v2 05/67] target/arm: Fix decode of FMOV (hp) vs MOVI

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:30, Richard Henderson
 wrote:
>
> The decode of FMOV (vector, immediate, half-precision) vs
> invalid cases of MOVI are incorrect.
>
> Fixes RISU mismatch for invalid insn 0x2f01fd31.
>
> Fixes: 70b4e6a4457 ("arm/translate-a64: add FP16 FMOV to simd_mod_imm")
> Signed-off-by: Richard Henderson 


Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH v2 06/67] target/arm: Verify sz=0 for Advanced SIMD scalar pairwise (fp16)

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:22, Richard Henderson
 wrote:
>
> All of these insns have "if sz == '1' then UNDEFINED" in their pseudocode.
> Fixes a RISU miscompare for invalid insn 0x5ef0c87a.
>
> Fixes: 5c36d89567c ("arm/translate-a64: add all FP16 ops in 
> simd_scalar_pairwise")
> Signed-off-by: Richard Henderson 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH] target/riscv: zvbb implies zvkb

2024-05-28 Thread Jerry Zhang Jian

Sorry, I had the bad mail client setting. Please ignore the previous email,
and I will resubmit the patch.

--
Jerry

Jerry ZJ  於 2024年5月28日 週二 下午8:12寫道：

>
> *Canary Mail You've received a secure email*
> jerry.zhangj...@sifive.com has sent you a secure email via Canary Mail.
> Read Secure Email
> 
> If you expect to correspond often with jerry.zhangj...@sifive.com, we
> recommend you download Canary Mail for free.
> Download Canary 
> [image: Twitter] 
> [image: Website] 
> Privacy  | Terms
>  | Docs  |
> Support 
>
> Copyright © 2021 Canary Mail, All rights reserved.
>

Re: [PATCH v2 21/67] target/arm: Introduce vfp_load_reg16

2024-05-28 Thread Peter Maydell

On Sat, 25 May 2024 at 00:23, Richard Henderson
 wrote:
>
> Load and zero-extend float16 into a TCGv_i32 before
> all scalar operations.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Peter Maydell 

thanks
-- PMM

[PATCH v2 0/2] backup: allow specifying minimum cluster size

2024-05-28 Thread Fiona Ebner

Based-on: 
https://lore.kernel.org/qemu-devel/20240429115157.2260885-1-vsement...@yandex-team.ru/

Discussion for v1:
https://lore.kernel.org/qemu-devel/20240308155158.830258-1-f.eb...@proxmox.com/

Changes in v2:
* Use 'size' type in QAPI.
* Remove option in cbw_parse_options(), i.e. before parsing generic
  blockdev options.
* Reword commit messages hoping to describe the issue in a more
  straight-forward way.

In the context of backup fleecing, discarding the source will not work
when the fleecing image has a larger granularity than the one used for
block-copy operations (can happen if the backup target has smaller
cluster size), because cbw_co_pdiscard_snapshot() will align down the
discard requests and thus effectively ignore then.

To make @discard-source work in such a scenario, allow specifying the
minimum cluster size used for block-copy operations and thus in
particular also the granularity for discard requests to the source.

Fiona Ebner (2):
  copy-before-write: allow specifying minimum cluster size
  backup: add minimum cluster size to performance options

 block/backup.c |  2 +-
 block/block-copy.c | 22 ++
 block/copy-before-write.c  | 18 +-
 block/copy-before-write.h  |  1 +
 blockdev.c |  3 +++
 include/block/block-copy.h |  1 +
 qapi/block-core.json   | 17 ++---
 7 files changed, 55 insertions(+), 9 deletions(-)

-- 
2.39.2

[PATCH v2 2/2] backup: add minimum cluster size to performance options

2024-05-28 Thread Fiona Ebner

In the context of backup fleecing, discarding the source will not work
when the fleecing image has a larger granularity than the one used for
block-copy operations (can happen if the backup target has smaller
cluster size), because cbw_co_pdiscard_snapshot() will align down the
discard requests and thus effectively ignore then.

To make @discard-source work in such a scenario, allow specifying the
minimum cluster size used for block-copy operations and thus in
particular also the granularity for discard requests to the source.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Use 'size' type in QAPI.

 block/backup.c| 2 +-
 block/copy-before-write.c | 8 
 block/copy-before-write.h | 1 +
 blockdev.c| 3 +++
 qapi/block-core.json  | 9 +++--
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 3dd2e229d2..a1292c01ec 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -458,7 +458,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 }
 
 cbw = bdrv_cbw_append(bs, target, filter_node_name, discard_source,
-  &bcs, errp);
+  perf->min_cluster_size, &bcs, errp);
 if (!cbw) {
 goto error;
 }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index ef0bc4dcfe..183eed42e5 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -553,6 +553,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   bool discard_source,
+  uint64_t min_cluster_size,
   BlockCopyState **bcs,
   Error **errp)
 {
@@ -572,6 +573,13 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 qdict_put_str(opts, "file", bdrv_get_node_name(source));
 qdict_put_str(opts, "target", bdrv_get_node_name(target));
 
+if (min_cluster_size > INT64_MAX) {
+error_setg(errp, "min-cluster-size too large: %lu > %ld",
+   min_cluster_size, INT64_MAX);
+return NULL;
+}
+qdict_put_int(opts, "min-cluster-size", (int64_t)min_cluster_size);
+
 top = bdrv_insert_node(source, opts, flags, errp);
 if (!top) {
 return NULL;
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 01af0cd3c4..2a5d4ba693 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -40,6 +40,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
   bool discard_source,
+  uint64_t min_cluster_size,
   BlockCopyState **bcs,
   Error **errp);
 void bdrv_cbw_drop(BlockDriverState *bs);
diff --git a/blockdev.c b/blockdev.c
index 835064ed03..6740663fda 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2655,6 +2655,9 @@ static BlockJob *do_backup_common(BackupCommon *backup,
 if (backup->x_perf->has_max_chunk) {
 perf.max_chunk = backup->x_perf->max_chunk;
 }
+if (backup->x_perf->has_min_cluster_size) {
+perf.min_cluster_size = backup->x_perf->min_cluster_size;
+}
 }
 
 if ((backup->sync == MIRROR_SYNC_MODE_BITMAP) ||
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8fc0a4b234..f1219a9dfb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1551,11 +1551,16 @@
 # it should not be less than job cluster size which is calculated
 # as maximum of target image cluster size and 64k.  Default 0.
 #
+# @min-cluster-size: Minimum size of blocks used by copy-before-write
+# and background copy operations.  Has to be a power of 2.  No
+# effect if smaller than the maximum of the target's cluster size
+# and 64 KiB.  Default 0.  (Since 9.1)
+#
 # Since: 6.0
 ##
 { 'struct': 'BackupPerf',
-  'data': { '*use-copy-range': 'bool',
-'*max-workers': 'int', '*max-chunk': 'int64' } }
+  'data': { '*use-copy-range': 'bool', '*max-workers': 'int',
+'*max-chunk': 'int64', '*min-cluster-size': 'size' } }
 
 ##
 # @BackupCommon:
-- 
2.39.2

[PATCH v2 1/2] copy-before-write: allow specifying minimum cluster size

2024-05-28 Thread Fiona Ebner

In the context of backup fleecing, discarding the source will not work
when the fleecing image has a larger granularity than the one used for
block-copy operations (can happen if the backup target has smaller
cluster size), because cbw_co_pdiscard_snapshot() will align down the
discard requests and thus effectively ignore then.

To make @discard-source work in such a scenario, allow specifying the
minimum cluster size used for block-copy operations and thus in
particular also the granularity for discard requests to the source.

The type 'size' (corresponding to uint64_t in C) is used in QAPI to
rule out negative inputs and for consistency with already existing
@cluster-size parameters. Since block_copy_calculate_cluster_size()
uses int64_t for its result, a check that the input is not too large
is added in block_copy_state_new() before calling it. The calculation
in block_copy_calculate_cluster_size() is done in the target int64_t
type.

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Fiona Ebner 
---

Changes in v2:
* Use 'size' type in QAPI.
* Remove option in cbw_parse_options(), i.e. before parsing generic
  blockdev options.

 block/block-copy.c | 22 ++
 block/copy-before-write.c  | 10 +-
 include/block/block-copy.h |  1 +
 qapi/block-core.json   |  8 +++-
 4 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 7e3b378528..36eaecaaf4 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -310,6 +310,7 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
 }
 
 static int64_t block_copy_calculate_cluster_size(BlockDriverState *target,
+ int64_t min_cluster_size,
  Error **errp)
 {
 int ret;
@@ -335,7 +336,7 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 "used. If the actual block size of the target exceeds "
 "this default, the backup may be unusable",
 BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
 } else if (ret < 0 && !target_does_cow) {
 error_setg_errno(errp, -ret,
 "Couldn't determine the cluster size of the target image, "
@@ -345,16 +346,18 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 return ret;
 } else if (ret < 0 && target_does_cow) {
 /* Not fatal; just trudge on ahead. */
-return BLOCK_COPY_CLUSTER_SIZE_DEFAULT;
+return MAX(min_cluster_size, (int64_t)BLOCK_COPY_CLUSTER_SIZE_DEFAULT);
 }
 
-return MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, bdi.cluster_size);
+return MAX(min_cluster_size,
+   (int64_t)MAX(BLOCK_COPY_CLUSTER_SIZE_DEFAULT, 
bdi.cluster_size));
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  BlockDriverState *copy_bitmap_bs,
  const BdrvDirtyBitmap *bitmap,
  bool discard_source,
+ uint64_t min_cluster_size,
  Error **errp)
 {
 ERRP_GUARD();
@@ -365,7 +368,18 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 
 GLOBAL_STATE_CODE();
 
-cluster_size = block_copy_calculate_cluster_size(target->bs, errp);
+if (min_cluster_size > INT64_MAX) {
+error_setg(errp, "min-cluster-size too large: %lu > %ld",
+   min_cluster_size, INT64_MAX);
+return NULL;
+} else if (min_cluster_size && !is_power_of_2(min_cluster_size)) {
+error_setg(errp, "min-cluster-size needs to be a power of 2");
+return NULL;
+}
+
+cluster_size = block_copy_calculate_cluster_size(target->bs,
+ (int64_t)min_cluster_size,
+ errp);
 if (cluster_size < 0) {
 return NULL;
 }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index cd65524e26..ef0bc4dcfe 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -417,6 +417,7 @@ static BlockdevOptions *cbw_parse_options(QDict *options, 
Error **errp)
 qdict_extract_subqdict(options, NULL, "bitmap");
 qdict_del(options, "on-cbw-error");
 qdict_del(options, "cbw-timeout");
+qdict_del(options, "min-cluster-size");
 
 out:
 visit_free(v);
@@ -432,6 +433,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 BDRVCopyBeforeWriteState *s = bs->opaque;
 BdrvDirtyBitmap *bitmap = NULL;
 int64_t cluster_size;
+uint64_t min_cluster_size = 0;
 g_autoptr(BlockdevOptions) full_opts = NULL;
 BlockdevOptionsCbw *opt

Re: [PATCH v4 3/4] qapi: Do not cast function pointers

2024-05-28 Thread Markus Armbruster

Akihiko Odaki  writes:

> -fsanitize=undefined complains if function pointers are casted. It
> also prevents enabling teh strict mode of CFI which is currently

Typo: the

> disabled with -fsanitize-cfi-icall-generalize-pointers.

The above describes the problem the patch solves.  Good!  Two
suggestions:

1. Quote the error message.

2. Briefly describe the solution as well.  Perhaps:

  The problematic casts are necessary to pass visit_type_T() and
  visit_type_T_members() as callbacks to qapi_clone() and
  qapi_clone_members(), respectively.  Open-code these two functions to
  avoid the callbacks, and thus the type casts.

> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2346
> Signed-off-by: Akihiko Odaki 

Always kind of sad to move implementation code to headers, but getting
rid of the function pointer casts makes sense, and I don't have better
ideas for doing that.

With an improved commit message
Reviewed-by: Markus Armbruster

Re: [PATCH 2/4] usb: add config options for the hub and hid devices

2024-05-28 Thread Thomas Huth


On 28/05/2024 11.54, Gerd Hoffmann wrote:

Signed-off-by: Gerd Hoffmann 
---
  hw/usb/Kconfig | 10 ++
  hw/usb/meson.build |  4 ++--
  2 files changed, 12 insertions(+), 2 deletions(-)


Reviewed-by: Thomas Huth

[PATCH v6 11/12] tests/qtest/vhost-user-blk-test: use memory-backend-shm

2024-05-28 Thread Stefano Garzarella

`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.

Since we are here, let`s remove `share=on` which is the default for shm
(and also for memfd).

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
v6
- removed `share=on` since it's the default [David]
---
 tests/qtest/vhost-user-blk-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 117b9acd10..ea90d41232 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
vhost_user_blk_bin);
 
 g_string_append_printf(cmd_line,
-" -object memory-backend-memfd,id=mem,size=256M,share=on "
+" -object memory-backend-shm,id=mem,size=256M "
 " -M memory-backend=mem -m 256M ");
 
 for (i = 0; i < vus_instances; i++) {
-- 
2.45.1

[PATCH v6 12/12] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-05-28 Thread Stefano Garzarella

`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Acked-by: Thomas Huth 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
 "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," 
\
 " -numa node,memdev=mem"
+#define QEMU_CMD_SHM" -m %d -object memory-backend-shm,id=mem,size=%dM," \
+" -numa node,memdev=mem"
 #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
 TEST_MEMFD_AUTO,
 TEST_MEMFD_YES,
 TEST_MEMFD_NO,
+TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString 
*cmd_line,
 
 if (memfd == TEST_MEMFD_YES) {
 g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+} else if (memfd == TEST_MEMFD_SHM) {
+g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
 } else {
 const char *root = init_hugepagefs() ? : server->tmpfs;
 
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString 
*cmd_line, void *arg)
 return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+TestServer *server = test_server_new("vhost-user-test", arg);
+test_server_listen(server);
+
+append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+server->vu_ops->append_opts(server, cmd_line, "");
+
+g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
 TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_read_guest_mem, &opts);
 
+opts.before = vhost_user_test_setup_shm;
+qos_add_test("vhost-user/read-guest-mem/shm",
+ "virtio-net",
+ test_read_guest_mem, &opts);
+
 if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
 opts.before = vhost_user_test_setup_memfd;
 qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.45.1

[PATCH v6 10/12] hostmem: add a new memory backend based on POSIX shm_open()

2024-05-28 Thread Stefano Garzarella

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Acked-by: David Hildenbrand 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5
- fixed documentation in qapi/qom.json and qemu-options.hx [Markus]
v4
- fail if we find "share=off" in shm_backend_memory_alloc() [David]
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  19 +
 backends/hostmem-shm.c | 123 +
 backends/meson.build   |   1 +
 qemu-options.hx|  16 
 5 files changed, 162 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 38dde6d785..d40592d863 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,21 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm. Setting it to false
+# will cause a failure during allocation because it is not supported by this
+# backend.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -985,6 +1000,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1056,6 +1073,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',
   'if': 'CONFIG_LINUX' },
   'qtest':  'QtestProperties',
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
new file mode 100644
index 00..374edc3db8
--- /dev/null
+++ b/backends/hostmem-shm.c
@@ -0,0 +1,123 @@
+/*
+ * QEMU host POSIX shared memory object backend
+ *
+ * Copyright (C) 2024 Red Hat Inc
+ *
+ * Authors:
+ *   Stefano Garzarella 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+
+#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
+
+OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM)
+
+struct HostMemoryBackendShm {
+HostMemoryBackend parent_obj;
+};
+
+static bool
+shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+g_autoptr(GString) shm_name = g_string_new(NULL);
+g_autofree char *backend_name = NULL;
+uint32_t ram_flags;
+int fd, oflag;
+mode_t mode;
+
+if (!backend->size) {
+error_setg(errp, "can't create shm backend with size 0");
+return false;

[PATCH v6 09/12] contrib/vhost-user-blk: enable it on any POSIX system

2024-05-28 Thread Stefano Garzarella

Let's make the code more portable by adding defines from
block/file-posix.c to support O_DIRECT in other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Acked-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefano Garzarella 
---
v6:
- reverted v5 changes since we can't move O_DSYNC and O_DIRECT in osdep
  [Daniel, failing tests on Windows]
v5:
- O_DSYNC and O_DIRECT definition are now in osdep [Phil]
- commit updated since we moved out all code changes
v4:
- moved using of "qemu/bswap.h" API in a separate patch [Phil]
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 14 ++
 util/meson.build|  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index 48e476b237..c89ee7b578 100644
--- a/meson.build
+++ b/meson.build
@@ -1981,8 +1981,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 
'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: 
gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-   error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
error_message: 'vhost_user_blk_server requires vhost-user support') 
\
   .disable_auto_if(not have_tools and not have_system) \
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index 9492146855..a450337685 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -25,6 +25,20 @@
 #include 
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
 VHOST_USER_BLK_MAX_QUEUES = 8,
 };
diff --git a/util/meson.build b/util/meson.build
index 72b505df11..c414178ace 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,10 +112,12 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-util_ss.add(files('vhost-user-server.c'), vhost_user)
 util_ss.add(files('vfio-helpers.c'))
 util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
   util_ss.add(files('yank.c'))
 endif
 
-- 
2.45.1

[PATCH v6 03/12] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported

2024-05-28 Thread Stefano Garzarella

libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..2c20cdc16e 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,17 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg 
*vmsg)
 features |= dev->iface->get_protocol_features(dev);
 }
 
+#ifndef MFD_ALLOW_SEALING
+/*
+ * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+ * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+ * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+ * is negotiated. A device implementation can enable it, so let's mask
+ * it to avoid a runtime panic.
+ */
+features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
+
 vmsg_set_reply_u64(vmsg, features);
 return true;
 }
-- 
2.45.1

[PATCH v6 08/12] libvhost-user: enable it on any POSIX system

2024-05-28 Thread Stefano Garzarella

The vhost-user protocol is not really Linux-specific so let's enable
libvhost-user for any POSIX system.

Compiling it on macOS and FreeBSD some problems came up:
- avoid to include linux/vhost.h which is available only on Linux
  (vhost_types.h contains many of the things we need)
- macOS doesn't provide sys/endian.h, so let's define them
  (note: libvhost-user doesn't include QEMU's headers, so we can't use
   use "qemu/bswap.h")
- define eventfd_[write|read] as write/read wrapper when system doesn't
  provide those (e.g. macOS)
- copy SEAL defines from include/qemu/memfd.h to make the code works
  on FreeBSD where MFD_ALLOW_SEALING is defined
- define MAP_NORESERVE if it's not defined (e.g. on FreeBSD)

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 
Acked-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
---
v5:
- fixed typos in the commit description [Phil]
---
 meson.build   |  2 +-
 subprojects/libvhost-user/libvhost-user.h |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 60 +--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index a72500be77..48e476b237 100644
--- a/meson.build
+++ b/meson.build
@@ -3162,7 +3162,7 @@ if have_system and vfio_user_server_allowed
 endif
 
 vhost_user = not_found
-if host_os == 'linux' and have_vhost_user
+if have_vhost_user
   libvhost_user = subproject('libvhost-user')
   vhost_user = libvhost_user.get_variable('vhost_user_dep')
 endif
diff --git a/subprojects/libvhost-user/libvhost-user.h 
b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..e13e1d3931 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -18,9 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 2c20cdc16e..57e58d4adb 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 /* Necessary to provide VIRTIO_F_VERSION_1 on system
  * with older linux headers. Must appear before
@@ -39,8 +37,8 @@
 #include "standard-headers/linux/virtio_config.h"
 
 #if defined(__linux__)
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -52,6 +50,62 @@
 
 #endif
 
+#if defined(__APPLE__) && (__MACH__)
+#include 
+#define htobe16(x) OSSwapHostToBigInt16(x)
+#define htole16(x) OSSwapHostToLittleInt16(x)
+#define be16toh(x) OSSwapBigToHostInt16(x)
+#define le16toh(x) OSSwapLittleToHostInt16(x)
+
+#define htobe32(x) OSSwapHostToBigInt32(x)
+#define htole32(x) OSSwapHostToLittleInt32(x)
+#define be32toh(x) OSSwapBigToHostInt32(x)
+#define le32toh(x) OSSwapLittleToHostInt32(x)
+
+#define htobe64(x) OSSwapHostToBigInt64(x)
+#define htole64(x) OSSwapHostToLittleInt64(x)
+#define be64toh(x) OSSwapBigToHostInt64(x)
+#define le64toh(x) OSSwapLittleToHostInt64(x)
+#endif
+
+#ifdef CONFIG_EVENTFD
+#include 
+#else
+#define eventfd_t uint64_t
+
+int eventfd_write(int fd, eventfd_t value)
+{
+return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
+}
+
+int eventfd_read(int fd, eventfd_t *value)
+{
+return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
+}
+#endif
+
+#ifdef MFD_ALLOW_SEALING
+#include 
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE0x0008  /* prevent writes */
+#endif
+#endif
+
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
+
 #include "include/atomic.h"
 
 #include "libvhost-user.h"
-- 
2.45.1

< 1 2 3 4 >

201 - 300 of 381 matches

Mail list logo