Re: [PATCH v4 10/15] hw/loongarch: Remove restriction of la464 cores in the virt machine

2023-08-21 Thread Philippe Mathieu-Daudé

On 22/8/23 05:27, Song Gao wrote:

Allow virt machine to be used with la132 instead of la464.

Co-authored-by: Jiajie Chen 
Signed-off-by: Song Gao 
---
  hw/loongarch/virt.c | 5 -
  1 file changed, 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 09/15] target/loongarch: Add LoongArch32 cpu la132

2023-08-21 Thread Philippe Mathieu-Daudé

On 22/8/23 05:27, Song Gao wrote:

From: Jiajie Chen 

Add LoongArch32 cpu la132.

Due to lack of public documentation of la132, it is currently a
synthetic LoongArch32 cpu model. Details need to be added in the future.

Signed-off-by: Jiajie Chen 
Signed-off-by: Song Gao 
---
  target/loongarch/cpu.c | 30 ++


Thanks for splitting the hw/ patch out, ...


  1 file changed, 30 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 67eb6c3135..d3c3e0d8a1 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -440,6 +440,35 @@ static void loongarch_la464_initfn(Object *obj)
  env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
  }
  
+static void loongarch_la132_initfn(Object *obj)
+{
+    LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+    CPULoongArchState *env = &cpu->env;
+
+    int i;
+
+    for (i = 0; i < 21; i++) {
+        env->cpucfg[i] = 0x0;
+    }
+
+    cpu->dtb_compatible = "loongarch,Loongson-1C103";
+    env->cpucfg[0] = 0x148042;  /* PRID */


... and filling the PRid register.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 41/58] i386/tdx: handle TDG.VP.VMCALL

2023-08-21 Thread Markus Armbruster
Xiaoyao Li  writes:

> From: Isaku Yamahata 
>
> For GetQuote, delegate the request to the Quote Generation Service.  Add a
> property for the address of the quote generation server.  On request,
> connect to the server, read the request buffer from shared guest memory,
> send the request buffer to the server, store the response into shared
> guest memory, and notify the TD guest by interrupt.
>
> "quote-generation-service" is a property to specify the Quote Generation
> Service (QGS) in QEMU socket address format.  Examples of the supported
> formats are "vsock:2:1234", "unix:/run/qgs", "localhost:1234".
>
> command line example:
>   qemu-system-x86_64 \
> -object 'tdx-guest,id=tdx0,quote-generation-service=localhost:1234' \
> -machine confidential-guest-support=tdx0
>
> Signed-off-by: Isaku Yamahata 
> Signed-off-by: Xiaoyao Li 
> ---
>  qapi/qom.json |   5 +-
>  target/i386/kvm/tdx.c | 380 ++
>  target/i386/kvm/tdx.h |   7 +
>  3 files changed, 391 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 87c1d440f331..37139949d761 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -879,13 +879,16 @@
>  #
> # @mrownerconfig: MROWNERCONFIG SHA384 hex string of 48 * 2 length (default: 0)
>  #
> +# @quote-generation-service: socket address for Quote Generation Service(QGS)
> +#
>  # Since: 8.2
>  ##
>  { 'struct': 'TdxGuestProperties',
>'data': { '*sept-ve-disable': 'bool',
>  '*mrconfigid': 'str',
>  '*mrowner': 'str',
> -'*mrownerconfig': 'str' } }
> +'*mrownerconfig': 'str',
> +'*quote-generation-service': 'str' } }

Why not type SocketAddress?

>  
>  ##
>  # @ThreadContextProperties:

[...]
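For context, SocketAddress is QEMU's existing QAPI union for inet/unix/vsock/fd addresses. A hypothetical sketch of the schema with Markus's suggestion applied (not code from the posted series):

```
{ 'struct': 'TdxGuestProperties',
  'data': { '*sept-ve-disable': 'bool',
            '*mrconfigid': 'str',
            '*mrowner': 'str',
            '*mrownerconfig': 'str',
            '*quote-generation-service': 'SocketAddress' } }
```

A SocketAddress-typed member is parsed as a discriminated union (e.g. { 'type': 'vsock', 'cid': '2', 'port': '1234' }) rather than a flat string, so the supported address kinds become self-describing.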




Re: [PATCH v2 20/58] i386/tdx: Allows mrconfigid/mrowner/mrownerconfig for TDX_INIT_VM

2023-08-21 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Fri, Aug 18, 2023 at 05:50:03AM -0400, Xiaoyao Li wrote:
>> From: Isaku Yamahata 
>> 
>> When creating a TDX VM, three SHA384 hash values can be provided for
>> TDX attestation.
>> 
>> So far they were hard-coded as 0.  Now allow the user to specify those
>> values via the properties mrconfigid, mrowner and mrownerconfig.  A
>> hex-encoded string was chosen as the format since it is friendly for
>> the user to input.
>> 
>> example
>> -object tdx-guest, \
>>   mrconfigid=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef, \
>>   mrowner=fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210fedcba9876543210, \
>>   mrownerconfig=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
>> 
>> Signed-off-by: Isaku Yamahata 
>> Signed-off-by: Xiaoyao Li 
>> ---
>> TODO:
>>  - community requests to use base64 encoding if no special reason
>> ---
>>  qapi/qom.json | 11 ++-
>>  target/i386/kvm/tdx.c | 13 +
>>  target/i386/kvm/tdx.h |  3 +++
>>  3 files changed, 26 insertions(+), 1 deletion(-)
>> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index cc08b9a98df9..87c1d440f331 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -873,10 +873,19 @@
>>  #
>>  # @sept-ve-disable: bit 28 of TD attributes (default: 0)
>>  #
>> +# @mrconfigid: MRCONFIGID SHA384 hex string of 48 * 2 length (default: 0)
>> +#
>> +# @mrowner: MROWNER SHA384 hex string of 48 * 2 length (default: 0)
>> +#
>> # @mrownerconfig: MROWNERCONFIG SHA384 hex string of 48 * 2 length (default: 0)
>
> Per previous patch, I suggest these should all be passed in base64
> instead of hex.

I'm upgrading this suggestion to a demand: we use base64 for encoding
binary data everywhere in QAPI/QMP.  Consistency matters.

> Also 'default: 0' makes no sense for a string,
> which would be 'default: nil', and there's no need to document that as
> the default is implicit from the fact that it's an optional string
> field. So eg
>
>   @mrconfigid: base64 encoded MRCONFIGID SHA384 digest

Agree.

The member names are abbreviations all run together, whereas QAPI/QMP
favors words-separated-with-dashes.  If you invented them, please change
them to QAPI/QMP style.  If they are established TDX terminology, keep
them as they are, but please point to your evidence.

>> +#
>>  # Since: 8.2
>>  ##
>>  { 'struct': 'TdxGuestProperties',
>> -  'data': { '*sept-ve-disable': 'bool' } }
>> +  'data': { '*sept-ve-disable': 'bool',
>> +'*mrconfigid': 'str',
>> +'*mrowner': 'str',
>> +'*mrownerconfig': 'str' } }
>>  
>>  ##
>>  # @ThreadContextProperties:

[...]




Re: [PATCH v4 8/9] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-08-21 Thread Philippe Mathieu-Daudé

On 22/8/23 06:25, Richard Henderson wrote:

From: Aaron Lindsay 

An instruction is a 'combined' Pointer Authentication instruction
if it does something in addition to PAC -- for instance, branching
to or loading an address from the authenticated pointer.

Knowing whether a PAC operation is 'combined' is needed to
implement FEAT_FPACCOMBINE.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-7-aa...@os.amperecomputing.com>
Signed-off-by: Richard Henderson 
---
  target/arm/tcg/helper-a64.h|  4 ++
  target/arm/tcg/pauth_helper.c  | 71 +++---
  target/arm/tcg/translate-a64.c | 12 +++---
  3 files changed, 68 insertions(+), 19 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 15/58] i386/tdx: Add property sept-ve-disable for tdx-guest object

2023-08-21 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Fri, Aug 18, 2023 at 05:49:58AM -0400, Xiaoyao Li wrote:
>> Bit 28 of the TD attributes is named SEPT_VE_DISABLE.  When set to 1, it
>> disables conversion of EPT violations to #VE on guest TD access of
>> PENDING pages.
>> 
>> Some guest OSes (e.g., a Linux TD guest) may require this bit to be 1,
>> and otherwise refuse to boot.
>> 
>> Add sept-ve-disable property for tdx-guest object, for user to configure
>> this bit.
>> 
>> Signed-off-by: Xiaoyao Li 
>> Acked-by: Gerd Hoffmann 
>> ---
>>  qapi/qom.json |  4 +++-
>>  target/i386/kvm/tdx.c | 24 
>>  2 files changed, 27 insertions(+), 1 deletion(-)
>> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 2ca7ce7c0da5..cc08b9a98df9 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -871,10 +871,12 @@
>>  #
>>  # Properties for tdx-guest objects.
>>  #
>> +# @sept-ve-disable: bit 28 of TD attributes (default: 0)
>
> This description isn't very useful as it forces the user to go off and
> read the TDX specification to find out what bit 28 means. You've got a

Seconded.

> more useful description in the commit message, so please use that
> in the docs too. eg something like this
>
>   @sept-ve-disable: toggle bit 28 of TD attributes to control disabling
> of EPT violation conversion to #VE on guest
> TD access of PENDING pages. Some guest OS (e.g.
> Linux TD guest) may require this set, otherwise
> they refuse to boot.

But please format like

# @sept-ve-disable: toggle bit 28 of TD attributes to control disabling
# of EPT violation conversion to #VE on guest TD access of PENDING
# pages.  Some guest OS (e.g. Linux TD guest) may require this to
# be set, otherwise they refuse to boot.

to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
to conform to current conventions).

>> +#
>>  # Since: 8.2
>>  ##
>>  { 'struct': 'TdxGuestProperties',
>> -  'data': { }}
>> +  'data': { '*sept-ve-disable': 'bool' } }
>>  
>>  ##
>>  # @ThreadContextProperties:

[...]




Re: [PATCH v2 02/58] i386: Introduce tdx-guest object

2023-08-21 Thread Markus Armbruster
Xiaoyao Li  writes:

> Introduce tdx-guest object which implements the interface of
> CONFIDENTIAL_GUEST_SUPPORT, and will be used to create TDX VMs (TDs) by
>
>   qemu -machine ...,confidential-guest-support=tdx0   \
>-object tdx-guset,id=tdx0

Typo: tdx-guest

> It has only one property, 'attributes', with a fixed value of 0; it is
> not configurable so far.

This must refer to TdxGuest member @attributes.

"Property" suggests QOM property, which @attributes isn't, at least not
in this patch.  Will it become a QOM property later in this series?

Hmm, @attributes appears to remain unused until PATCH 14.  Recommend to
delay its addition until then.

> Signed-off-by: Xiaoyao Li 
> Acked-by: Gerd Hoffmann 
> ---
> changes from RFC-V4
> - make @attributes not user-settable
> ---
>  configs/devices/i386-softmmu/default.mak |  1 +
>  hw/i386/Kconfig  |  5 +++
>  qapi/qom.json| 12 +++
>  target/i386/kvm/meson.build  |  2 ++
>  target/i386/kvm/tdx.c| 40 
>  target/i386/kvm/tdx.h| 19 +++
>  6 files changed, 79 insertions(+)
>  create mode 100644 target/i386/kvm/tdx.c
>  create mode 100644 target/i386/kvm/tdx.h
>
> diff --git a/configs/devices/i386-softmmu/default.mak 
> b/configs/devices/i386-softmmu/default.mak
> index 598c6646dfc0..9b5ec59d65b0 100644
> --- a/configs/devices/i386-softmmu/default.mak
> +++ b/configs/devices/i386-softmmu/default.mak
> @@ -18,6 +18,7 @@
>  #CONFIG_QXL=n
>  #CONFIG_SEV=n
>  #CONFIG_SGA=n
> +#CONFIG_TDX=n
>  #CONFIG_TEST_DEVICES=n
>  #CONFIG_TPM_CRB=n
>  #CONFIG_TPM_TIS_ISA=n
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 9051083c1e78..929f6c3f0e85 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -10,6 +10,10 @@ config SGX
>  bool
>  depends on KVM
>  
> +config TDX
> +bool
> +depends on KVM
> +
>  config PC
>  bool
>  imply APPLESMC
> @@ -26,6 +30,7 @@ config PC
>  imply QXL
>  imply SEV
>  imply SGX
> +imply TDX
>  imply TEST_DEVICES
>  imply TPM_CRB
>  imply TPM_TIS_ISA
> diff --git a/qapi/qom.json b/qapi/qom.json
> index e0b2044e3d20..2ca7ce7c0da5 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -866,6 +866,16 @@
>  'reduced-phys-bits': 'uint32',
>  '*kernel-hashes': 'bool' } }
>  
> +##
> +# @TdxGuestProperties:
> +#
> +# Properties for tdx-guest objects.
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'TdxGuestProperties',
> +  'data': { }}
> +
>  ##
>  # @ThreadContextProperties:
>  #
> @@ -944,6 +954,7 @@
>  'sev-guest',
>  'thread-context',
>  's390-pv-guest',
> +'tdx-guest',
>  'throttle-group',
>  'tls-creds-anon',
>  'tls-creds-psk',
> @@ -1010,6 +1021,7 @@
>'secret_keyring': { 'type': 'SecretKeyringProperties',
>'if': 'CONFIG_SECRET_KEYRING' },
>'sev-guest':  'SevGuestProperties',
> +  'tdx-guest':  'TdxGuestProperties',
>'thread-context': 'ThreadContextProperties',
>'throttle-group': 'ThrottleGroupProperties',
>'tls-creds-anon': 'TlsCredsAnonProperties',

Actually useful only when CONFIG_TDX is on, but can't make it
conditional here, as CONFIG_TDX is poisoned.

> diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
> index 40fbde96cac6..21ab03fe1349 100644
> --- a/target/i386/kvm/meson.build
> +++ b/target/i386/kvm/meson.build
> @@ -11,6 +11,8 @@ i386_softmmu_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: 
> files('xen-emu.c'))
>  
>  i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
>  
> +i386_softmmu_kvm_ss.add(when: 'CONFIG_TDX', if_true: files('tdx.c'))
> +
>  i386_system_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), 
> if_false: files('hyperv-stub.c'))
>  
>  i386_system_ss.add_all(when: 'CONFIG_KVM', if_true: i386_softmmu_kvm_ss)
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> new file mode 100644
> index ..d3792d4a3d56
> --- /dev/null
> +++ b/target/i386/kvm/tdx.c
> @@ -0,0 +1,40 @@
> +/*
> + * QEMU TDX support
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *  Xiaoyao Li 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qom/object_interfaces.h"
> +
> +#include "tdx.h"
> +
> +/* tdx guest */
> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
> +   tdx_guest,
> +   TDX_GUEST,
> +   CONFIDENTIAL_GUEST_SUPPORT,
> +   { TYPE_USER_CREATABLE },
> +   { NULL })
> +
> +static void tdx_guest_init(Object *obj)
> +{
> +TdxGuest *tdx = TDX_GUEST(obj);
> +
> +tdx

Re: [PATCH v4 2/9] target/arm: Add ID_AA64ISAR2_EL1

2023-08-21 Thread Philippe Mathieu-Daudé

On 22/8/23 06:25, Richard Henderson wrote:

From: Aaron Lindsay 

Signed-off-by: Aaron Lindsay 
[PMM: drop the HVF part of the patch and just comment that
  we need to do something when the register appears in that API]
Signed-off-by: Peter Maydell 
---
  target/arm/cpu.h | 1 +
  target/arm/helper.c  | 4 ++--
  target/arm/hvf/hvf.c | 1 +
  target/arm/kvm64.c   | 2 ++
  4 files changed, 6 insertions(+), 2 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 1/2] vhost-user: fix lost reconnect

2023-08-21 Thread Li Feng
On 22 Aug 2023, at 8:38 AM, Raphael Norwitz wrote:


On Aug 17, 2023, at 2:40 AM, Li Feng  wrote:


On 14 Aug 2023, at 8:11 PM, Raphael Norwitz wrote:

Why can’t we rather fix this by adding an “event_cb” param to
vhost_user_async_close and then calling qemu_chr_fe_set_handlers in
vhost_user_async_close_bh()?

Even if calling vhost_dev_cleanup() twice is safe today, I worry future
changes may easily stumble over the reconnect case and introduce crashes or
double frees.

I think adding a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’
has already been called in vhost_user_async_close, and will be called in
event->cb, so why do we need to add a new event_cb?


I’m suggesting calling the data->event_cb instead of the data->cb if we hit
the error case where vhost->vdev is NULL. Something like:

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..edf1dccd44 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2648,6 +2648,10 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
     VhostAsyncCallback *data = opaque;
+
+    VirtIODevice *vdev = VIRTIO_DEVICE(data->dev);
+    VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
     struct vhost_dev *vhost = data->vhost;
 
     /*
@@ -2657,6 +2661,9 @@ static void vhost_user_async_close_bh(void *opaque)
      */
     if (vhost->vdev) {
         data->cb(data->dev);
+    } else if (data->event_cb) {
+        qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, data->event_cb,
+                                 NULL, data->dev, NULL, true);
     }
 
     g_free(data);

data->event_cb would be vhost_user_blk_event().

I think that makes the error path a lot easier to reason about and more
future proof.

To avoid calling vhost_dev_cleanup() twice, add an ‘inited’ flag in
struct vhost_dev to mark whether it has been initialized, like this:


This is better than the original, but let me know what you think of my
alternative.


vhost_user_async_close_bh() is a common function in vhost-user.c, and
vhost_user_async_close() is used by vhost-user-scsi/blk/gpio.
However, in your patch it’s limited to VhostUserBlk, so I think my fix is
more reasonable.

Thanks,
LI


diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..edc80c0231 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
         goto fail_busyloop;
     }
 
+    hdev->inited = true;
     return 0;
 
 fail_busyloop:
@@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
     int i;
 
+    if (!hdev->inited) {
+        return;
+    }
+    hdev->inited = false;
     trace_vhost_dev_cleanup(hdev);
 
     for (i = 0; i < hdev->nvqs; ++i) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index ca3131b1af..74b1aec960 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -123,6 +123,7 @@ struct vhost_dev {
     /* @started: is the vhost device started? */
     bool started;
     bool log_enabled;
+    bool inited;
     uint64_t log_size;
     Error *migration_blocker;
     const VhostOps *vhost_ops;

Thanks.


On Aug 4, 2023, at 1:29 AM, Li Feng  wrote:

When vhost-user is reconnecting to the backend, if it fails at
get_features in vhost_dev_init(), the reconnect fails and is never
retriggered.

The reason is:
When vhost-user fails at get_features, vhost_dev_cleanup() is called
immediately.

vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.

The reconnect path is:
vhost_user_blk_event
vhost_user_async_close(.. vhost_user_blk_disconnect ..)
 qemu_chr_fe_set_handlers <- clear the notifier callback
   schedule vhost_user_async_close_bh

vhost->vdev is NULL, so vhost_user_blk_disconnect will not be
called, and the event fd callback will not be reinstalled.

With this patch, vhost_user_blk_disconnect calls vhost_dev_cleanup()
again, which is safe.

All vhost-user devices have this issue, including vhost-user-blk/scsi.

Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")

Signed-off-by: Li Feng 
---
hw/virtio/vhost-user.c | 10 +-
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..697b403fe2 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2648,16 +2648,8 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
     VhostAsyncCallback *data = opaque;
-    struct vhost_dev *vhost = data->vhost;
 
-    /*
-     * If the vhost_dev has been cleared in the meantime there is
-     * nothing left to do as some other path has completed the
-     * cleanup.
-     */
-    if (vhost->vdev) {
-        data->cb(data->dev);
-    }
+    data->cb(data->dev);
 
     g_free(data);
 }
-- 
2.41.0


Re: [PATCH 2/2] vhost: Add Error parameter to vhost_scsi_common_start()

2023-08-21 Thread Li Feng
On 21 Aug 2023, at 8:09 PM, Markus Armbruster  wrote:

Li Feng  writes:

On 14 Aug 2023, at 8:11 PM, Raphael Norwitz wrote:

Thanks for the cleanup! A few comments.

On Aug 4, 2023, at 1:29 AM, Li Feng  wrote:

Add a Error parameter to report the real error, like vhost-user-blk.

Signed-off-by: Li Feng 
---
hw/scsi/vhost-scsi-common.c   | 17 ++---
hw/scsi/vhost-scsi.c  |  5 +++--
hw/scsi/vhost-user-scsi.c | 14 --
include/hw/virtio/vhost-scsi-common.h |  2 +-
4 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
index a61cd0e907..392587dfb5 100644
--- a/hw/scsi/vhost-scsi-common.c
+++ b/hw/scsi/vhost-scsi-common.c
@@ -16,6 +16,7 @@
*/

#include "qemu/osdep.h"
+#include "qapi/error.h"
#include "qemu/error-report.h"
#include "qemu/module.h"
#include "hw/virtio/vhost.h"
@@ -25,7 +26,7 @@
#include "hw/virtio/virtio-access.h"
#include "hw/fw-path-provider.h"

-int vhost_scsi_common_start(VHostSCSICommon *vsc)
+int vhost_scsi_common_start(VHostSCSICommon *vsc, Error **errp)
{
int ret, i;
VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
@@ -35,18 +36,19 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
VirtIOSCSICommon *vs = (VirtIOSCSICommon *)vsc;

if (!k->set_guest_notifiers) {
-error_report("binding does not support guest notifiers");
+error_setg(errp, "binding does not support guest notifiers");
return -ENOSYS;
}

ret = vhost_dev_enable_notifiers(&vsc->dev, vdev);
if (ret < 0) {
+error_setg_errno(errp, -ret, "Error enabling host notifiers");
return ret;
}

ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
if (ret < 0) {
-error_report("Error binding guest notifier");
+error_setg_errno(errp, -ret, "Error binding guest notifier");
goto err_host_notifiers;
}

@@ -54,7 +56,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)

ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
if (ret < 0) {
-error_report("Error setting inflight format: %d", -ret);


Curious why you’re adding the error value to the string. Isn’t it redundant
since we pass it in as the second param?

+error_setg_errno(errp, -ret, "Error setting inflight format: %d", -ret);


I don’t understand. Here I put the error message in errp; where is the
redundancy?


The error message will come out like

 Error setting inflight format: 22: Invalid argument

You need to drop ": %d".

Two remarks:

1. The #1 reason for bad error messages is neglecting to *test* them.

2. Printing errno as a number is needlessly unfriendly to users.

[...]

Understood! Many thanks, I will fix it in v2.


Re: [RFC v2 PATCH] record-replay: support SMP target machine

2023-08-21 Thread Pavel Dovgalyuk

On 11.08.2023 04:47, Nicholas Piggin wrote:

RR CPU switching is driven by timers and events so it is deterministic
like everything else. Record a CPU switch event and use that to drive
the CPU switch on replay.

Signed-off-by: Nicholas Piggin 
---
This is still in RFC phase because so far I've only really tested ppc
pseries, and only with patches that are not yet upstream (but posted
to list).

It works with smp 2, can step, reverse-step, reverse-continue, etc.
throughout a Linux boot.


I still didn't have time to test it, but here are some comments.



One issue is reverse-step on one gdb thread (vCPU) only steps back one
icount, so if another thread is currently running then it is that one
which goes back one instruction and the selected thread doesn't move. I
would call this a separate issue from the record-replay mechanism, which
is in the replay-debugging policy. I think we could record in each vCPU
an icount of the last instruction it executed before switching, then
reverse step for that vCPU could replay to there. I think that's not so
important yet until this mechanism is solid. But if you test and rsi is
not going backwards, then check your other threads.

Thanks,
Nick


  accel/tcg/tcg-accel-ops-icount.c |  9 +++-
  accel/tcg/tcg-accel-ops-rr.c | 73 +---
  include/exec/replay-core.h   |  3 ++
  replay/replay-internal.h |  1 +
  replay/replay.c  | 34 ++-
  scripts/replay-dump.py   |  5 +++
  softmmu/vl.c |  4 --
  7 files changed, 115 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 3d2cfbbc97..c26782a56a 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/accel/tcg/tcg-accel-ops-icount.c
@@ -93,10 +93,15 @@ void icount_handle_deadline(void)
 int64_t icount_percpu_budget(int cpu_count)
 {
     int64_t limit = icount_get_limit();
-    int64_t timeslice = limit / cpu_count;
+    int64_t timeslice;
 
-    if (timeslice == 0) {
+    if (replay_mode == REPLAY_MODE_PLAY) {
         timeslice = limit;
+    } else {
+        timeslice = limit / cpu_count;
+        if (timeslice == 0) {
+            timeslice = limit;
+        }
     }
 
     return timeslice;

diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 212d6f8df4..ce040a687e 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -27,6 +27,7 @@
  #include "qemu/lockable.h"
  #include "sysemu/tcg.h"
  #include "sysemu/replay.h"
+#include "sysemu/reset.h"
  #include "sysemu/cpu-timers.h"
  #include "qemu/main-loop.h"
  #include "qemu/notify.h"
@@ -61,6 +62,22 @@ void rr_kick_vcpu_thread(CPUState *unused)
 
 static QEMUTimer *rr_kick_vcpu_timer;
 static CPUState *rr_current_cpu;
+static CPUState *rr_next_cpu;
+static CPUState *rr_last_cpu;
+
+/*
+ * Reset the vCPU scheduler to the initial state.
+ */
+static void record_replay_reset(void *param)
+{
+    if (rr_kick_vcpu_timer) {
+        timer_del(rr_kick_vcpu_timer);
+    }
+    g_assert(!rr_current_cpu);
+    rr_next_cpu = NULL;
+    rr_last_cpu = first_cpu;
+    current_cpu = NULL;
+}
 
 static inline int64_t rr_next_kick_time(void)
 {
@@ -184,6 +201,8 @@ static void *rr_cpu_thread_fn(void *arg)
     Notifier force_rcu;
     CPUState *cpu = arg;
 
+    qemu_register_reset(record_replay_reset, NULL);
+
     assert(tcg_enabled());
     rcu_register_thread();
     force_rcu.notify = rr_force_rcu;
@@ -238,14 +257,20 @@ static void *rr_cpu_thread_fn(void *arg)
             cpu_budget = icount_percpu_budget(cpu_count);
         }
 
+        if (!rr_next_cpu) {
+            qatomic_set_mb(&rr_next_cpu, first_cpu);
+        }
+        cpu = rr_next_cpu;
+
+        if (cpu != rr_last_cpu) {
+            replay_switch_cpu();
+            qatomic_set_mb(&rr_last_cpu, cpu);
+        }
+
         rr_start_kick_timer();
 
         replay_mutex_unlock();
 
-        if (!cpu) {
-            cpu = first_cpu;
-        }
-
         while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
             /* Store rr_current_cpu before evaluating cpu_can_run().  */
             qatomic_set_mb(&rr_current_cpu, cpu);
@@ -284,7 +309,34 @@ static void *rr_cpu_thread_fn(void *arg)
                 break;
             }
 
-            cpu = CPU_NEXT(cpu);
+            if (replay_mode == REPLAY_MODE_NONE) {
+                cpu = CPU_NEXT(cpu);
+            } else if (replay_mode == REPLAY_MODE_RECORD) {
+                /*
+                 * Exit the loop immediately so CPU switch events can be
+                 * recorded. This may be able to be improved to record
+                 * switch events here.
+                 */
+                cpu = CPU_NEXT(cpu);
+                break;
+            } else if (replay_mode == REPLAY_MODE_PLAY) {
+                /*
+                 * Play can exit from tcg_cpus_exec at different times than
+                 * r

Re: [PATCH v4 10/15] hw/loongarch: Remove restriction of la464 cores in the virt machine

2023-08-21 Thread Richard Henderson

On 8/21/23 20:27, Song Gao wrote:

Allow virt machine to be used with la132 instead of la464.

Co-authored-by: Jiajie Chen
Signed-off-by: Song Gao
---
  hw/loongarch/virt.c | 5 -
  1 file changed, 5 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 09/15] target/loongarch: Add LoongArch32 cpu la132

2023-08-21 Thread Richard Henderson

On 8/21/23 20:27, Song Gao wrote:

From: Jiajie Chen

Add LoongArch32 cpu la132.

Due to lack of public documentation of la132, it is currently a
synthetic LoongArch32 cpu model. Details need to be added in the future.

Signed-off-by: Jiajie Chen
Signed-off-by: Song Gao
---
  target/loongarch/cpu.c | 30 ++
  1 file changed, 30 insertions(+)


Acked-by: Richard Henderson 

r~



[PATCH v4 3/9] target/arm: Add feature detection for FEAT_Pauth2 and extensions

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

Rename isar_feature_aa64_pauth_arch to isar_feature_aa64_pauth_qarma5
to distinguish the other architectural algorithm qarma3.

Add ARMPauthFeature and isar_feature_pauth_feature to cover the
other pauth conditions.

Signed-off-by: Aaron Lindsay 
Message-Id: <20230609172324.982888-3-aa...@os.amperecomputing.com>
[rth: Add ARMPauthFeature and eliminate most other predicates]
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h  | 49 +--
 target/arm/tcg/pauth_helper.c |  2 +-
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index fbdbf2df7f..e9fe268453 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3794,28 +3794,61 @@ static inline bool isar_feature_aa64_fcma(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0;
 }
 
+/*
+ * These are the values from APA/API/APA3.
+ *
+ * They must be compared '>=', except EPAC should use '=='.
+ * In the ARM pseudocode, EPAC is treated as not being implemented
+ * by larger values.
+ */
+typedef enum {
+PauthFeat_None = 0,
+PauthFeat_1= 1,
+PauthFeat_EPAC = 2,
+PauthFeat_2= 3,
+PauthFeat_FPAC = 4,
+PauthFeat_FPACCOMBINED = 5,
+} ARMPauthFeature;
+
+static inline ARMPauthFeature
+isar_feature_pauth_feature(const ARMISARegisters *id)
+{
+/*
+ * Architecturally, only one of {APA,API,APA3} may be active (non-zero)
+ * and the other two must be zero.  Thus we may avoid conditionals.
+ */
+return (FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) |
+FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, API) |
+FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3));
+}
+
 static inline bool isar_feature_aa64_pauth(const ARMISARegisters *id)
 {
 /*
  * Return true if any form of pauth is enabled, as this
  * predicate controls migration of the 128-bit keys.
  */
-return (id->id_aa64isar1 &
-(FIELD_DP64(0, ID_AA64ISAR1, APA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, API, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPA, 0xf) |
- FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
+return isar_feature_pauth_feature(id) != PauthFeat_None;
 }
 
-static inline bool isar_feature_aa64_pauth_arch(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_pauth_qarma5(const ARMISARegisters *id)
 {
 /*
- * Return true if pauth is enabled with the architected QARMA algorithm.
- * QEMU will always set APA+GPA to the same value.
+ * Return true if pauth is enabled with the architected QARMA5 algorithm.
+ * QEMU will always enable or disable both APA and GPA.
  */
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, APA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth_qarma3(const ARMISARegisters *id)
+{
+/*
+ * Return true if pauth is enabled with the architected QARMA3 algorithm.
+ * QEMU will always enable or disable both APA3 and GPA3.
+ */
+return FIELD_EX64(id->id_aa64isar2, ID_AA64ISAR2, APA3) != 0;
+}
+
 static inline bool isar_feature_aa64_tlbirange(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64isar0, ID_AA64ISAR0, TLB) == 2;
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 62af569341..6271a84ec9 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -282,7 +282,7 @@ static uint64_t pauth_computepac_impdef(uint64_t data, 
uint64_t modifier,
 static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
  uint64_t modifier, ARMPACKey key)
 {
-if (cpu_isar_feature(aa64_pauth_arch, env_archcpu(env))) {
+if (cpu_isar_feature(aa64_pauth_qarma5, env_archcpu(env))) {
 return pauth_computepac_architected(data, modifier, key);
 } else {
 return pauth_computepac_impdef(data, modifier, key);
-- 
2.34.1




[PATCH v4 5/9] target/arm: Implement FEAT_PACQARMA3

2023-08-21 Thread Richard Henderson
Implement the QARMA3 cryptographic algorithm for PAC calculation.
Implement a cpu feature to select the algorithm and document it.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-4-aa...@os.amperecomputing.com>
[rth: Merge cpu feature addition from another patch.]
Signed-off-by: Richard Henderson 
---
 docs/system/arm/cpu-features.rst | 21 -
 docs/system/arm/emulation.rst|  3 ++
 target/arm/cpu.h |  1 +
 target/arm/arm-qmp-cmds.c|  2 +-
 target/arm/cpu64.c   | 24 --
 target/arm/tcg/pauth_helper.c| 54 ++--
 tests/qtest/arm-cpu-features.c   | 12 ++-
 7 files changed, 94 insertions(+), 23 deletions(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index 6bb88a40c7..a5fb929243 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -210,15 +210,20 @@ TCG VCPU Features
 TCG VCPU features are CPU features that are specific to TCG.
 Below is the list of TCG VCPU features and their descriptions.
 
-``pauth-impdef``
-  When ``FEAT_Pauth`` is enabled, either the *impdef* (Implementation
-  Defined) algorithm is enabled or the *architected* QARMA algorithm
-  is enabled.  By default the impdef algorithm is disabled, and QARMA
-  is enabled.
+``pauth``
+  Enable or disable ``FEAT_Pauth`` entirely.
 
-  The architected QARMA algorithm has good cryptographic properties,
-  but can be quite slow to emulate.  The impdef algorithm used by QEMU
-  is non-cryptographic but significantly faster.
+``pauth-impdef``
+  When ``pauth`` is enabled, select the QEMU implementation defined algorithm.
+
+``pauth-qarma3``
+  When ``pauth`` is enabled, select the architected QARMA3 algorithm.
+
+Without either ``pauth-impdef`` or ``pauth-qarma3`` enabled,
+the architected QARMA5 algorithm is used.  The architected QARMA5
+and QARMA3 algorithms have good cryptographic properties, but can
+be quite slow to emulate.  The impdef algorithm used by QEMU is
+non-cryptographic but significantly faster.
 
 SVE CPU Properties
 ==
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index bdafc68819..06af20d10f 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -55,6 +55,9 @@ the following architecture extensions:
 - FEAT_MTE (Memory Tagging Extension)
 - FEAT_MTE2 (Memory Tagging Extension)
 - FEAT_MTE3 (MTE Asymmetric Fault Handling)
+- FEAT_PACIMP (Pointer authentication - IMPLEMENTATION DEFINED algorithm)
+- FEAT_PACQARMA3 (Pointer authentication - QARMA3 algorithm)
+- FEAT_PACQARMA5 (Pointer authentication - QARMA5 algorithm)
 - FEAT_PAN (Privileged access never)
 - FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
 - FEAT_PAN3 (Support for SCTLR_ELx.EPAN)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e9fe268453..678ddd17a4 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1072,6 +1072,7 @@ struct ArchCPU {
  */
 bool prop_pauth;
 bool prop_pauth_impdef;
+bool prop_pauth_qarma3;
 bool prop_lpa2;
 
 /* DCZ blocksize, in log_2(words), ie low 4 bits of DCZID_EL0 */
diff --git a/target/arm/arm-qmp-cmds.c b/target/arm/arm-qmp-cmds.c
index c8fa524002..b53d5efe13 100644
--- a/target/arm/arm-qmp-cmds.c
+++ b/target/arm/arm-qmp-cmds.c
@@ -95,7 +95,7 @@ static const char *cpu_model_advertised_features[] = {
 "sve640", "sve768", "sve896", "sve1024", "sve1152", "sve1280",
 "sve1408", "sve1536", "sve1664", "sve1792", "sve1920", "sve2048",
 "kvm-no-adjvtime", "kvm-steal-time",
-"pauth", "pauth-impdef",
+"pauth", "pauth-impdef", "pauth-qarma3",
 NULL
 };
 
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index fd584a31da..f3d87e001f 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -474,7 +474,7 @@ void aarch64_add_sme_properties(Object *obj)
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
 ARMPauthFeature features = cpu_isar_feature(pauth_feature, cpu);
-uint64_t isar1;
+uint64_t isar1, isar2;
 
 /*
  * These properties enable or disable Pauth as a whole, or change
@@ -490,6 +490,10 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, 0);
 isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 0);
 
+isar2 = cpu->isar.id_aa64isar2;
+isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, APA3, 0);
+isar2 = FIELD_DP64(isar2, ID_AA64ISAR2, GPA3, 0);
+
 if (kvm_enabled() || hvf_enabled()) {
 /*
  * Exit early if PAuth is enabled and fall through to disable it.
@@ -510,26 +514,39 @@ void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 }
 
 if (cpu->prop_pauth) {
+if (cpu->prop_pauth_impdef && cpu->prop_pauth_qarma3) {
+error_setg(errp,
+   "cannot enable both pauth-impdef and pauth-qar

[PATCH v4 8/9] target/arm: Inform helpers whether a PAC instruction is 'combined'

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

An instruction is a 'combined' Pointer Authentication instruction
if it does something in addition to PAC -- for instance, branching
to or loading an address from the authenticated pointer.

Knowing whether a PAC operation is 'combined' is needed to
implement FEAT_FPACCOMBINE.

Signed-off-by: Aaron Lindsay 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-7-aa...@os.amperecomputing.com>
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/helper-a64.h|  4 ++
 target/arm/tcg/pauth_helper.c  | 71 +++---
 target/arm/tcg/translate-a64.c | 12 +++---
 3 files changed, 68 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/helper-a64.h b/target/arm/tcg/helper-a64.h
index 3d5957c11f..57cfd68569 100644
--- a/target/arm/tcg/helper-a64.h
+++ b/target/arm/tcg/helper-a64.h
@@ -90,9 +90,13 @@ DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb_combined, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
 DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
 
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index b6aeb90548..c05c5b30ff 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -397,7 +397,8 @@ static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
 }
 
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
-   ARMPACKey *key, bool data, int keynumber)
+   ARMPACKey *key, bool data, int keynumber,
+   uintptr_t ra, bool is_combined)
 {
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
@@ -519,44 +520,88 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, uint64_t y)
 return pac & 0xull;
 }
 
-uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+static uint64_t pauth_autia(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apia, false, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apia, false, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autia_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autia(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autib(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apib, false, 1);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apib, false, 1, ra, is_combined);
 }
 
-uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autib_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autib(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autda(CPUARMState *env, uint64_t x, uint64_t y,
+uintptr_t ra, bool is_combined)
 {
 int el = arm_current_el(env);
 if (!pauth_key_enabled(env, el, SCTLR_EnDA)) {
 return x;
 }
-pauth_check_trap(env, el, GETPC());
-return pauth_auth(env, x, y, &env->keys.apda, true, 0);
+pauth_check_trap(env, el, ra);
+return pauth_auth(env, x, y, &env->keys.apda, true, 0, ra, is_combined);
 }
 
-uint64_t HELPER(autdb)(CPUARMState *env, uint64_t x, uint64_t y)
+uint64_t HELPER(autda)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), false);
+}
+
+uint64_t HELPER(autda_combined)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+return pauth_autda(env, x, y, GETPC(), true);
+}
+
+static uint64_t pauth_autdb(CPUARM

[PATCH v4 2/9] target/arm: Add ID_AA64ISAR2_EL1

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

Signed-off-by: Aaron Lindsay 
[PMM: drop the HVF part of the patch and just comment that
 we need to do something when the register appears in that API]
Signed-off-by: Peter Maydell 
---
 target/arm/cpu.h | 1 +
 target/arm/helper.c  | 4 ++--
 target/arm/hvf/hvf.c | 1 +
 target/arm/kvm64.c   | 2 ++
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 88e5accda6..fbdbf2df7f 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1033,6 +1033,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 50f61e42ca..3bae262b2f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8334,11 +8334,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8fce64bbf6..c366f7f517 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -847,6 +847,7 @@ static bool hvf_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 { HV_SYS_REG_ID_AA64DFR1_EL1, &host_isar.id_aa64dfr1 },
 { HV_SYS_REG_ID_AA64ISAR0_EL1, &host_isar.id_aa64isar0 },
 { HV_SYS_REG_ID_AA64ISAR1_EL1, &host_isar.id_aa64isar1 },
+/* Add ID_AA64ISAR2_EL1 here when HVF supports it */
 { HV_SYS_REG_ID_AA64MMFR0_EL1, &host_isar.id_aa64mmfr0 },
 { HV_SYS_REG_ID_AA64MMFR1_EL1, &host_isar.id_aa64mmfr1 },
 { HV_SYS_REG_ID_AA64MMFR2_EL1, &host_isar.id_aa64mmfr2 },
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 94bbd9661f..e2d05d7fc0 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -306,6 +306,8 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
   ARM64_SYS_REG(3, 0, 0, 6, 0));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
   ARM64_SYS_REG(3, 0, 0, 6, 1));
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar2,
+  ARM64_SYS_REG(3, 0, 0, 6, 2));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
   ARM64_SYS_REG(3, 0, 0, 7, 0));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
-- 
2.34.1




[PATCH v4 4/9] target/arm: Don't change pauth features when changing algorithm

2023-08-21 Thread Richard Henderson
We have cpu properties to adjust the pauth algorithm for the
purpose of speed of emulation.  Retain the set of pauth features
supported by the cpu even as the algorithm changes.

This already affects the neoverse-v1 cpu, which has FEAT_EPAC.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu64.c | 70 +++---
 target/arm/tcg/cpu64.c |  2 ++
 2 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 96158093cc..fd584a31da 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -473,37 +473,57 @@ void aarch64_add_sme_properties(Object *obj)
 
 void arm_cpu_pauth_finalize(ARMCPU *cpu, Error **errp)
 {
-int arch_val = 0, impdef_val = 0;
-uint64_t t;
+ARMPauthFeature features = cpu_isar_feature(pauth_feature, cpu);
+uint64_t isar1;
 
-/* Exit early if PAuth is enabled, and fall through to disable it */
-if ((kvm_enabled() || hvf_enabled()) && cpu->prop_pauth) {
-if (!cpu_isar_feature(aa64_pauth, cpu)) {
-error_setg(errp, "'pauth' feature not supported by %s on this host",
-   kvm_enabled() ? "KVM" : "hvf");
+/*
+ * These properties enable or disable Pauth as a whole, or change
+ * the pauth algorithm, but do not change the set of features that
+ * are present.  We have saved a copy of those features above and
+ * will now place it into the field that chooses the algorithm.
+ *
+ * Begin by disabling all fields.
+ */
+isar1 = cpu->isar.id_aa64isar1;
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, 0);
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 0);
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, 0);
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 0);
+
+if (kvm_enabled() || hvf_enabled()) {
+/*
+ * Exit early if PAuth is enabled and fall through to disable it.
+ * The algorithm selection properties are not present.
+ */
+if (cpu->prop_pauth) {
+if (features == 0) {
+error_setg(errp, "'pauth' feature not supported by "
+   "%s on this host", current_accel_name());
+}
+return;
+}
+} else {
+/* Pauth properties are only present when the model supports it. */
+if (features == 0) {
+assert(!cpu->prop_pauth);
+return;
 }
 
-return;
-}
-
-/* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */
-if (cpu->prop_pauth) {
-if (cpu->prop_pauth_impdef) {
-impdef_val = 1;
-} else {
-arch_val = 1;
+if (cpu->prop_pauth) {
+if (cpu->prop_pauth_impdef) {
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, API, features);
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPI, 1);
+} else {
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, APA, features);
+isar1 = FIELD_DP64(isar1, ID_AA64ISAR1, GPA, 1);
+}
+} else if (cpu->prop_pauth_impdef) {
+error_setg(errp, "cannot enable pauth-impdef without pauth");
+error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
-} else if (cpu->prop_pauth_impdef) {
-error_setg(errp, "cannot enable pauth-impdef without pauth");
-error_append_hint(errp, "Add pauth=on to the CPU property list.\n");
 }
 
-t = cpu->isar.id_aa64isar1;
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPA, arch_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, API, impdef_val);
-t = FIELD_DP64(t, ID_AA64ISAR1, GPI, impdef_val);
-cpu->isar.id_aa64isar1 = t;
+cpu->isar.id_aa64isar1 = isar1;
 }
 
 static Property arm_cpu_pauth_property =
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 8019f00bc3..fec6a4875d 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -758,6 +758,8 @@ void aarch64_max_tcg_initfn(Object *obj)
 
 t = cpu->isar.id_aa64isar1;
 t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);  /* FEAT_DPB2 */
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_1);
+t = FIELD_DP64(t, ID_AA64ISAR1, API, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);/* FEAT_JSCVT */
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); /* FEAT_FCMA */
 t = FIELD_DP64(t, ID_AA64ISAR1, LRCPC, 2);/* FEAT_LRCPC2 */
-- 
2.34.1




[PATCH v4 6/9] target/arm: Implement FEAT_EPAC

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-5-aa...@os.amperecomputing.com>
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/tcg/cpu64.c|  2 +-
 target/arm/tcg/pauth_helper.c | 16 +++-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 06af20d10f..4866a73ca0 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -27,6 +27,7 @@ the following architecture extensions:
 - FEAT_DotProd (Advanced SIMD dot product instructions)
 - FEAT_DoubleFault (Double Fault Extension)
 - FEAT_E0PD (Preventing EL0 access to halves of address maps)
+- FEAT_EPAC (Enhanced pointer authentication)
 - FEAT_ETS (Enhanced Translation Synchronization)
 - FEAT_EVT (Enhanced Virtualization Traps)
 - FEAT_FCMA (Floating-point complex number instructions)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index fec6a4875d..85bf94ee40 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -758,7 +758,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 
 t = cpu->isar.id_aa64isar1;
 t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);  /* FEAT_DPB2 */
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_1);
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_EPAC);
 t = FIELD_DP64(t, ID_AA64ISAR1, API, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);/* FEAT_JSCVT */
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); /* FEAT_FCMA */
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index bb03409ee5..63e1009ea7 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -326,8 +326,10 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
 static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  ARMPACKey *key, bool data)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data, false);
+ARMPauthFeature pauth_feature = cpu_isar_feature(pauth_feature, cpu);
 uint64_t pac, ext_ptr, ext, test;
 int bot_bit, top_bit;
 
@@ -351,11 +353,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (pauth_feature == PauthFeat_EPAC) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.34.1




[PATCH v4 7/9] target/arm: Implement FEAT_Pauth2

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-6-aa...@os.amperecomputing.com>
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/tcg/cpu64.c|  2 +-
 target/arm/tcg/pauth_helper.c | 21 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 4866a73ca0..54234ac090 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -63,6 +63,7 @@ the following architecture extensions:
 - FEAT_PAN2 (AT S1E1R and AT S1E1W instruction variants affected by PSTATE.PAN)
 - FEAT_PAN3 (Support for SCTLR_ELx.EPAN)
 - FEAT_PAuth (Pointer authentication)
+- FEAT_PAuth2 (Enhancements to pointer authentication)
 - FEAT_PMULL (PMULL, PMULL2 instructions)
 - FEAT_PMUv3p1 (PMU Extensions v3.1)
 - FEAT_PMUv3p4 (PMU Extensions v3.4)
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 85bf94ee40..d3be14137e 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -758,7 +758,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 
 t = cpu->isar.id_aa64isar1;
 t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);  /* FEAT_DPB2 */
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_EPAC);
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_2);
 t = FIELD_DP64(t, ID_AA64ISAR1, API, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);/* FEAT_JSCVT */
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); /* FEAT_FCMA */
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 63e1009ea7..b6aeb90548 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -353,7 +353,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-if (pauth_feature == PauthFeat_EPAC) {
+if (pauth_feature >= PauthFeat_2) {
+/* No action required */
+} else if (pauth_feature == PauthFeat_EPAC) {
 pac = 0;
 } else {
 /*
@@ -368,6 +370,9 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  * Preserve the determination between upper and lower at bit 55,
  * and insert pointer authentication code.
  */
+if (pauth_feature >= PauthFeat_2) {
+pac ^= ptr;
+}
 if (param.tbi) {
 ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
 pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
@@ -394,18 +399,26 @@ static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data, false);
+ARMPauthFeature pauth_feature = cpu_isar_feature(pauth_feature, cpu);
 int bot_bit, top_bit;
-uint64_t pac, orig_ptr, test;
+uint64_t pac, orig_ptr, cmp_mask;
 
 orig_ptr = pauth_original_ptr(ptr, param);
 pac = pauth_computepac(env, orig_ptr, modifier, *key);
 bot_bit = 64 - param.tsz;
 top_bit = 64 - 8 * param.tbi;
 
-test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
-if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+cmp_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit);
+cmp_mask &= ~MAKE_64BIT_MASK(55, 1);
+
+if (pauth_feature >= PauthFeat_2) {
+return ptr ^ (pac & cmp_mask);
+}
+
+if ((pac ^ ptr) & cmp_mask) {
 int error_code = (keynumber << 1) | (keynumber ^ 1);
 if (param.tbi) {
 return deposit64(orig_ptr, 53, 2, error_code);
-- 
2.34.1




[PATCH v4 0/9] Implement Most ARMv8.3 Pointer Authentication Features

2023-08-21 Thread Richard Henderson
This is an update of Aaron's v3 [1].
There are a fair number of changes beyond a mere rebase:

  * Updates to the test cases which fail with the new features.
  * Updates to the documentation.
  * Preserve pauth feature set when changing pauth algorithm.
  * Rearrange feature detection:
 - Add ARMPauthFeature
 - Use it with isar_feature_pauth_feature
 - Remove many of the isar predicates
The pauth_auth and pauth_addpac routines wind up making lots
of tests against the pauth feature set.  Rather than recompute
the feature set many times for each predicate, compute it once
and compare against the enumerators.
  * Algorithmic simplification of Pauth2 and FPAC using cmp_mask.

r~

[1] https://patchew.org/QEMU/20230322202541.1404058-1-aa...@os.amperecomputing.com/

Aaron Lindsay (6):
  target/arm: Add ID_AA64ISAR2_EL1
  target/arm: Add feature detection for FEAT_Pauth2 and extensions
  target/arm: Implement FEAT_EPAC
  target/arm: Implement FEAT_Pauth2
  target/arm: Inform helpers whether a PAC instruction is 'combined'
  target/arm: Implement FEAT_FPAC and FEAT_FPACCOMBINE

Richard Henderson (3):
  tests/tcg/aarch64: Adjust pauth tests for FEAT_FPAC
  target/arm: Don't change pauth features when changing algorithm
  target/arm: Implement FEAT_PACQARMA3

 docs/system/arm/cpu-features.rst  |  21 ++--
 docs/system/arm/emulation.rst |   7 ++
 target/arm/cpu.h  |  51 +++--
 target/arm/syndrome.h |   7 ++
 target/arm/tcg/helper-a64.h   |   4 +
 target/arm/arm-qmp-cmds.c |   2 +-
 target/arm/cpu64.c|  86 ++
 target/arm/helper.c   |   4 +-
 target/arm/hvf/hvf.c  |   1 +
 target/arm/kvm64.c|   2 +
 target/arm/tcg/cpu64.c|   2 +
 target/arm/tcg/pauth_helper.c | 180 --
 target/arm/tcg/translate-a64.c|  12 +-
 tests/qtest/arm-cpu-features.c|  12 +-
 tests/tcg/aarch64/pauth-2.c   |  61 --
 tests/tcg/aarch64/pauth-4.c   |  28 -
 tests/tcg/aarch64/pauth-5.c   |  20 
 tests/tcg/aarch64/Makefile.target |   5 +-
 18 files changed, 409 insertions(+), 96 deletions(-)

-- 
2.34.1




[PATCH v4 9/9] target/arm: Implement FEAT_FPAC and FEAT_FPACCOMBINE

2023-08-21 Thread Richard Henderson
From: Aaron Lindsay 

Signed-off-by: Aaron Lindsay 
Reviewed-by: Richard Henderson 
Message-Id: <20230609172324.982888-8-aa...@os.amperecomputing.com>
[rth: Simplify fpac comparison, reusing cmp_mask]
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |  2 ++
 target/arm/syndrome.h |  7 +++
 target/arm/tcg/cpu64.c|  2 +-
 target/arm/tcg/pauth_helper.c | 18 +-
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 54234ac090..8be04edbcc 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -34,6 +34,8 @@ the following architecture extensions:
 - FEAT_FGT (Fine-Grained Traps)
 - FEAT_FHM (Floating-point half-precision multiplication instructions)
 - FEAT_FP16 (Half-precision floating-point data processing)
+- FEAT_FPAC (Faulting on AUT* instructions)
+- FEAT_FPACCOMBINE (Faulting on combined pointer authentication instructions)
 - FEAT_FRINTTS (Floating-point to integer instructions)
 - FEAT_FlagM (Flag manipulation instructions v2)
 - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index 62254d0e51..8a6b8f8162 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -49,6 +49,7 @@ enum arm_exception_class {
 EC_SYSTEMREGISTERTRAP = 0x18,
 EC_SVEACCESSTRAP  = 0x19,
 EC_ERETTRAP   = 0x1a,
+EC_PACFAIL= 0x1c,
 EC_SMETRAP= 0x1d,
 EC_GPC= 0x1e,
 EC_INSNABORT  = 0x20,
@@ -232,6 +233,12 @@ static inline uint32_t syn_smetrap(SMEExceptionType etype, bool is_16bit)
 | (is_16bit ? 0 : ARM_EL_IL) | etype;
 }
 
+static inline uint32_t syn_pacfail(bool data, int keynumber)
+{
+int error_code = (data << 1) | keynumber;
+return (EC_PACFAIL << ARM_EL_EC_SHIFT) | ARM_EL_IL | error_code;
+}
+
 static inline uint32_t syn_pactrap(void)
 {
 return EC_PACTRAP << ARM_EL_EC_SHIFT;
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index d3be14137e..7734058bb1 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -758,7 +758,7 @@ void aarch64_max_tcg_initfn(Object *obj)
 
 t = cpu->isar.id_aa64isar1;
 t = FIELD_DP64(t, ID_AA64ISAR1, DPB, 2);  /* FEAT_DPB2 */
-t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_2);
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, PauthFeat_FPACCOMBINED);
 t = FIELD_DP64(t, ID_AA64ISAR1, API, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, JSCVT, 1);/* FEAT_JSCVT */
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1); /* FEAT_FCMA */
diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index c05c5b30ff..4da2962ad5 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -396,6 +396,14 @@ static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
 }
 }
 
+static G_NORETURN
+void pauth_fail_exception(CPUARMState *env, bool data,
+  int keynumber, uintptr_t ra)
+{
+raise_exception_ra(env, EXCP_UDEF, syn_pacfail(data, keynumber),
+   exception_target_el(env), ra);
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber,
uintptr_t ra, bool is_combined)
@@ -416,7 +424,15 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
 cmp_mask &= ~MAKE_64BIT_MASK(55, 1);
 
 if (pauth_feature >= PauthFeat_2) {
-return ptr ^ (pac & cmp_mask);
+ARMPauthFeature fault_feature =
+is_combined ? PauthFeat_FPACCOMBINED : PauthFeat_FPAC;
+uint64_t result = ptr ^ (pac & cmp_mask);
+
+if (pauth_feature >= fault_feature
+&& ((result ^ sextract64(result, 55, 1)) & cmp_mask)) {
+pauth_fail_exception(env, data, keynumber, ra);
+}
+return result;
 }
 
 if ((pac ^ ptr) & cmp_mask) {
-- 
2.34.1




[PATCH v4 1/9] tests/tcg/aarch64: Adjust pauth tests for FEAT_FPAC

2023-08-21 Thread Richard Henderson
With FEAT_FPAC, AUT* instructions that fail authentication
do not produce an error value but instead fault.

For pauth-2, install a signal handler and verify it gets called.

For pauth-4 and pauth-5, we are explicitly testing the error value,
so there's nothing to test with FEAT_FPAC, so exit early.
Adjust the makefile to use -cpu neoverse-v1, which has FEAT_EPAC
but not FEAT_FPAC.

Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/pauth-2.c   | 61 +++
 tests/tcg/aarch64/pauth-4.c   | 28 --
 tests/tcg/aarch64/pauth-5.c   | 20 ++
 tests/tcg/aarch64/Makefile.target |  5 ++-
 4 files changed, 101 insertions(+), 13 deletions(-)

diff --git a/tests/tcg/aarch64/pauth-2.c b/tests/tcg/aarch64/pauth-2.c
index 978652ede3..d498d7dd8b 100644
--- a/tests/tcg/aarch64/pauth-2.c
+++ b/tests/tcg/aarch64/pauth-2.c
@@ -1,5 +1,21 @@
 #include 
+#include 
+#include 
 #include 
+#include 
+
+static void sigill(int sig, siginfo_t *info, void *vuc)
+{
+ucontext_t *uc = vuc;
+uint64_t test;
+
+/* There is only one insn below that is allowed to fault. */
+asm volatile("adr %0, auth2_insn" : "=r"(test));
+assert(test == uc->uc_mcontext.pc);
+exit(0);
+}
+
+static int pac_feature;
 
 void do_test(uint64_t value)
 {
@@ -27,31 +43,60 @@ void do_test(uint64_t value)
  * An invalid salt usually fails authorization, but again there
  * is a chance of choosing another salt that works.
  * Iterate until we find another salt which does fail.
+ *
+ * With FEAT_FPAC, this will SIGILL instead of producing a result.
  */
 for (salt2 = salt1 + 1; ; salt2++) {
-asm volatile("autda %0, %2" : "=r"(decode) : "0"(encode), "r"(salt2));
+asm volatile("auth2_insn: autda %0, %2"
+ : "=r"(decode) : "0"(encode), "r"(salt2));
 if (decode != value) {
 break;
 }
 }
 
+assert(pac_feature < 4);  /* No FEAT_FPAC */
+
 /* The VA bits, bit 55, and the TBI bits, should be unchanged.  */
 assert(((decode ^ value) & 0xff80ull) == 0);
 
 /*
- * Bits [54:53] are an error indicator based on the key used;
- * the DA key above is keynumber 0, so error == 0b01.  Otherwise
- * bit 55 of the original is sign-extended into the rest of the auth.
+ * Without FEAT_Pauth2, bits [54:53] are an error indicator based on
+ * the key used; the DA key above is keynumber 0, so error == 0b01.
+ * Otherwise bit 55 of the original is sign-extended into the rest
+ * of the auth.
  */
-if ((value >> 55) & 1) {
-assert(((decode >> 48) & 0xff) == 0b1011);
-} else {
-assert(((decode >> 48) & 0xff) == 0b0010);
+if (pac_feature < 3) {
+if ((value >> 55) & 1) {
+assert(((decode >> 48) & 0xff) == 0b1011);
+} else {
+assert(((decode >> 48) & 0xff) == 0b0010);
+}
 }
 }
 
 int main()
 {
+static const struct sigaction sa = {
+.sa_sigaction = sigill,
+.sa_flags = SA_SIGINFO
+};
+unsigned long isar1, isar2;
+
+assert(getauxval(AT_HWCAP) & HWCAP_CPUID);
+
+asm("mrs %0, id_aa64isar1_el1" : "=r"(isar1));
+asm("mrs %0, id_aa64isar2_el1" : "=r"(isar2));
+
+pac_feature = ((isar1 >> 4) & 0xf)   /* APA */
+| ((isar1 >> 8) & 0xf)   /* API */
+| ((isar2 >> 12) & 0xf); /* APA3 */
+assert(pac_feature != 0);
+
+if (pac_feature >= 4) {
+/* FEAT_FPAC */
+sigaction(SIGILL, &sa, NULL);
+}
+
 do_test(0);
 do_test(0xda004acedeadbeefull);
 return 0;
diff --git a/tests/tcg/aarch64/pauth-4.c b/tests/tcg/aarch64/pauth-4.c
index 24a639e36c..0d79ef21ea 100644
--- a/tests/tcg/aarch64/pauth-4.c
+++ b/tests/tcg/aarch64/pauth-4.c
@@ -2,14 +2,34 @@
 #include 
 #include 
 #include 
+#include 
 
 #define TESTS 1000
 
 int main()
 {
+char base[TESTS];
 int i, count = 0;
 float perc;
-void *base = malloc(TESTS);
+unsigned long isar1, isar2;
+int pac_feature;
+
+assert(getauxval(AT_HWCAP) & HWCAP_CPUID);
+
+asm("mrs %0, id_aa64isar1_el1" : "=r"(isar1));
+asm("mrs %0, id_aa64isar2_el1" : "=r"(isar2));
+
+pac_feature = ((isar1 >> 4) & 0xf)   /* APA */
+| ((isar1 >> 8) & 0xf)   /* API */
+| ((isar2 >> 12) & 0xf); /* APA3 */
+
+/*
+ * Exit if no PAuth or FEAT_FPAC, which will SIGILL on AUTIA failure
+ * rather than return an error for us to check below.
+ */
+if (pac_feature == 0 || pac_feature >= 4) {
+return 0;
+}
 
 for (i = 0; i < TESTS; i++) {
 uintptr_t in, x, y;
@@ -17,7 +37,7 @@ int main()
 in = i + (uintptr_t) base;
 
 asm("mov %0, %[in]\n\t"
-"pacia %0, sp\n\t"/* sigill if pauth not supported */
+"pacia %0, sp\n\t"
 "eor %0, %0, #4\n\t"  /* corrupt single bit */
 "mov %1, %0\n\t"
 

Re: [PATCH v3 0/8] target/loongarch: Cleanups in preparation of loongarch32 support

2023-08-21 Thread gaosong

On 2023/8/21 at 20:59, Philippe Mathieu-Daudé wrote:

Series fully reviewed.

v3:
- Do not rename loongarch_la464_initfn (rth)
- Added R-b

v2:
- Do not rename loongarch_cpu_get/set_pc (rth)
- Rebased Jiajie's patches for convenience
- Added rth's R-b

Jiajie, this series contains few notes I took while
reviewing your series adding loongarch32 support [*].

If your series isn't merged, consider rebasing it on
this one.

Regards,

Phil.

[*] 
https://lore.kernel.org/qemu-devel/20230817093121.1053890-1-gaos...@loongson.cn/

Jiajie Chen (3):
   target/loongarch: Add function to check current arch
   target/loongarch: Add new object class for loongarch32 cpus
   target/loongarch: Add GDB support for loongarch32 mode

Philippe Mathieu-Daudé (4):
   target/loongarch: Log I/O write accesses to CSR registers
   target/loongarch: Remove duplicated disas_set_info assignment
   target/loongarch: Introduce abstract TYPE_LOONGARCH64_CPU
   target/loongarch: Extract 64-bit specifics to
 loongarch64_cpu_class_init

Song Gao (1):
   target/loongarch: Fix loongarch_la464_initfn() misses setting LSPW

  configs/targets/loongarch64-softmmu.mak |  2 +-
  target/loongarch/cpu.h  | 12 +
  target/loongarch/cpu.c  | 60 -
  target/loongarch/gdbstub.c  | 32 ++---
  gdb-xml/loongarch-base32.xml| 45 +++
  5 files changed, 131 insertions(+), 20 deletions(-)
  create mode 100644 gdb-xml/loongarch-base32.xml


Thanks!

Applied to loongarch-to-apply queue

Thanks.
Song Gao




[PATCH v4 05/15] target/loongarch: Truncate high 32 bits of address in VA32 mode

2023-08-21 Thread Song Gao
From: Jiajie Chen 

When running in VA32 mode (!LA64, or VA32L[1-3] matching the current PLV),
the virtual address is truncated to 32 bits before address mapping.

Signed-off-by: Jiajie Chen 
Co-authored-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.h|  9 +
 target/loongarch/cpu.c| 16 
 target/loongarch/gdbstub.c|  2 +-
 target/loongarch/op_helper.c  |  4 +-
 target/loongarch/translate.c  | 32 
 .../loongarch/insn_trans/trans_atomic.c.inc   |  5 ++-
 .../loongarch/insn_trans/trans_branch.c.inc   |  3 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  | 30 ---
 target/loongarch/insn_trans/trans_lsx.c.inc   | 38 +--
 .../loongarch/insn_trans/trans_memory.c.inc   | 34 +
 10 files changed, 85 insertions(+), 88 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 72109095e4..25a0ef7e41 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -443,6 +443,15 @@ static inline bool is_va32(CPULoongArchState *env)
 return va32;
 }
 
+static inline void set_pc(CPULoongArchState *env, uint64_t value)
+{
+if (is_va32(env)) {
+env->pc = (uint32_t)value;
+} else {
+env->pc = value;
+}
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 822f2a72e5..67eb6c3135 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -81,7 +81,7 @@ static void loongarch_cpu_set_pc(CPUState *cs, vaddr value)
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
 
-env->pc = value;
+set_pc(env, value);
 }
 
 static vaddr loongarch_cpu_get_pc(CPUState *cs)
@@ -168,7 +168,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 set_DERA:
 env->CSR_DERA = env->pc;
 env->CSR_DBG = FIELD_DP64(env->CSR_DBG, CSR_DBG, DST, 1);
-env->pc = env->CSR_EENTRY + 0x480;
+set_pc(env, env->CSR_EENTRY + 0x480);
 break;
 case EXCCODE_INT:
 if (FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) {
@@ -249,7 +249,8 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
 
 /* Find the highest-priority interrupt. */
 vector = 31 - clz32(pending);
-env->pc = env->CSR_EENTRY + (EXCCODE_EXTERNAL_INT + vector) * vec_size;
+set_pc(env, env->CSR_EENTRY + \
+   (EXCCODE_EXTERNAL_INT + vector) * vec_size);
 qemu_log_mask(CPU_LOG_INT,
   "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx
   " cause %d\n" "A " TARGET_FMT_lx " D "
@@ -260,10 +261,9 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
   env->CSR_ECFG, env->CSR_ESTAT);
 } else {
 if (tlbfill) {
-env->pc = env->CSR_TLBRENTRY;
+set_pc(env, env->CSR_TLBRENTRY);
 } else {
-env->pc = env->CSR_EENTRY;
-env->pc += EXCODE_MCODE(cause) * vec_size;
+set_pc(env, env->CSR_EENTRY + EXCODE_MCODE(cause) * vec_size);
 }
 qemu_log_mask(CPU_LOG_INT,
   "%s: PC " TARGET_FMT_lx " ERA " TARGET_FMT_lx
@@ -324,7 +324,7 @@ static void loongarch_cpu_synchronize_from_tb(CPUState *cs,
 CPULoongArchState *env = &cpu->env;
 
 tcg_debug_assert(!(cs->tcg_cflags & CF_PCREL));
-env->pc = tb->pc;
+set_pc(env, tb->pc);
 }
 
 static void loongarch_restore_state_to_opc(CPUState *cs,
@@ -334,7 +334,7 @@ static void loongarch_restore_state_to_opc(CPUState *cs,
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
 
-env->pc = data[0];
+set_pc(env, data[0]);
 }
 #endif /* CONFIG_TCG */
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index a462e25737..e20b20f99b 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -77,7 +77,7 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 env->gpr[n] = tmp;
 length = read_length;
 } else if (n == 33) {
-env->pc = tmp;
+set_pc(env, tmp);
 length = read_length;
 }
 return length;
diff --git a/target/loongarch/op_helper.c b/target/loongarch/op_helper.c
index 60335a05e2..cf84f20aba 100644
--- a/target/loongarch/op_helper.c
+++ b/target/loongarch/op_helper.c
@@ -114,14 +114,14 @@ void helper_ertn(CPULoongArchState *env)
 env->CSR_TLBRERA = FIELD_DP64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR, 
0);
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, DA, 0);
 env->CSR_CRMD = FIELD_DP64(env->CSR_CRMD, CSR_CRMD, PG, 1);
-env->pc = env->CSR_TLBRERA;
+set_pc(env, env->CSR_TLBRERA);
 qemu_log_mask(CPU_LOG_INT, "%s: TLBRERA " TARGET_FMT_lx "\n",
   __func__, env->CSR_TLBRERA);
 } else {
 csr_pplv = FIELD_EX64(

[PATCH v4 06/15] target/loongarch: Sign extend results in VA32 mode

2023-08-21 Thread Song Gao
From: Jiajie Chen 

In VA32 mode, the BL, JIRL and PC* instructions should sign-extend the
low 32-bit result to 64 bits.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/translate.c   | 8 
 target/loongarch/insn_trans/trans_arith.c.inc  | 2 +-
 target/loongarch/insn_trans/trans_branch.c.inc | 4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 689da19ed0..de7c1c5d1f 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -236,6 +236,14 @@ static TCGv make_address_i(DisasContext *ctx, TCGv base, 
target_long ofs)
 return make_address_x(ctx, base, addend);
 }
 
+static uint64_t make_address_pc(DisasContext *ctx, uint64_t addr)
+{
+if (ctx->va32) {
+addr = (int32_t)addr;
+}
+return addr;
+}
+
 #include "decode-insns.c.inc"
 #include "insn_trans/trans_arith.c.inc"
 #include "insn_trans/trans_shift.c.inc"
diff --git a/target/loongarch/insn_trans/trans_arith.c.inc 
b/target/loongarch/insn_trans/trans_arith.c.inc
index 43d6cf261d..2aea4e41d5 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -72,7 +72,7 @@ static bool gen_pc(DisasContext *ctx, arg_r_i *a,
target_ulong (*func)(target_ulong, int))
 {
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
-target_ulong addr = func(ctx->base.pc_next, a->imm);
+target_ulong addr = make_address_pc(ctx, func(ctx->base.pc_next, a->imm));
 
 tcg_gen_movi_tl(dest, addr);
 gen_set_gpr(a->rd, dest, EXT_NONE);
diff --git a/target/loongarch/insn_trans/trans_branch.c.inc 
b/target/loongarch/insn_trans/trans_branch.c.inc
index 3ad34bcc05..2e35572cea 100644
--- a/target/loongarch/insn_trans/trans_branch.c.inc
+++ b/target/loongarch/insn_trans/trans_branch.c.inc
@@ -12,7 +12,7 @@ static bool trans_b(DisasContext *ctx, arg_b *a)
 
 static bool trans_bl(DisasContext *ctx, arg_bl *a)
 {
-tcg_gen_movi_tl(cpu_gpr[1], ctx->base.pc_next + 4);
+tcg_gen_movi_tl(cpu_gpr[1], make_address_pc(ctx, ctx->base.pc_next + 4));
 gen_goto_tb(ctx, 0, ctx->base.pc_next + a->offs);
 ctx->base.is_jmp = DISAS_NORETURN;
 return true;
@@ -25,7 +25,7 @@ static bool trans_jirl(DisasContext *ctx, arg_jirl *a)
 
 TCGv addr = make_address_i(ctx, src1, a->imm);
 tcg_gen_mov_tl(cpu_pc, addr);
-tcg_gen_movi_tl(dest, ctx->base.pc_next + 4);
+tcg_gen_movi_tl(dest, make_address_pc(ctx, ctx->base.pc_next + 4));
 gen_set_gpr(a->rd, dest, EXT_NONE);
 tcg_gen_lookup_and_goto_ptr();
 ctx->base.is_jmp = DISAS_NORETURN;
-- 
2.39.1




[PATCH v4 01/15] target/loongarch: Support LoongArch32 TLB entry

2023-08-21 Thread Song Gao
From: Jiajie Chen 

The LA32 TLB entry lacks the NR, NX and RPLV fields; they are hardwired
to zero in LoongArch32.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu-csr.h|  9 +
 target/loongarch/tlb_helper.c | 17 -
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index f8f24032cb..48ed2e0632 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -66,10 +66,11 @@ FIELD(TLBENTRY, D, 1, 1)
 FIELD(TLBENTRY, PLV, 2, 2)
 FIELD(TLBENTRY, MAT, 4, 2)
 FIELD(TLBENTRY, G, 6, 1)
-FIELD(TLBENTRY, PPN, 12, 36)
-FIELD(TLBENTRY, NR, 61, 1)
-FIELD(TLBENTRY, NX, 62, 1)
-FIELD(TLBENTRY, RPLV, 63, 1)
+FIELD(TLBENTRY_32, PPN, 8, 24)
+FIELD(TLBENTRY_64, PPN, 12, 36)
+FIELD(TLBENTRY_64, NR, 61, 1)
+FIELD(TLBENTRY_64, NX, 62, 1)
+FIELD(TLBENTRY_64, RPLV, 63, 1)
 
 #define LOONGARCH_CSR_ASID   0x18 /* Address space identifier */
 FIELD(CSR_ASID, ASID, 0, 10)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 6e00190547..cef10e2257 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -48,10 +48,17 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, 
hwaddr *physical,
 tlb_v = FIELD_EX64(tlb_entry, TLBENTRY, V);
 tlb_d = FIELD_EX64(tlb_entry, TLBENTRY, D);
 tlb_plv = FIELD_EX64(tlb_entry, TLBENTRY, PLV);
-tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY, PPN);
-tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY, NX);
-tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY, NR);
-tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY, RPLV);
+if (is_la64(env)) {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_64, PPN);
+tlb_nx = FIELD_EX64(tlb_entry, TLBENTRY_64, NX);
+tlb_nr = FIELD_EX64(tlb_entry, TLBENTRY_64, NR);
+tlb_rplv = FIELD_EX64(tlb_entry, TLBENTRY_64, RPLV);
+} else {
+tlb_ppn = FIELD_EX64(tlb_entry, TLBENTRY_32, PPN);
+tlb_nx = 0;
+tlb_nr = 0;
+tlb_rplv = 0;
+}
 
 /* Check access rights */
 if (!tlb_v) {
@@ -79,7 +86,7 @@ static int loongarch_map_tlb_entry(CPULoongArchState *env, 
hwaddr *physical,
  * tlb_entry contains ppn[47:12] while 16KiB ppn is [47:15]
  * need adjust.
  */
-*physical = (tlb_ppn << R_TLBENTRY_PPN_SHIFT) |
+*physical = (tlb_ppn << R_TLBENTRY_64_PPN_SHIFT) |
 (address & MAKE_64BIT_MASK(0, tlb_ps));
 *prot = PAGE_READ;
 if (tlb_d) {
-- 
2.39.1




[PATCH v4 04/15] target/loongarch: Add LA64 & VA32 to DisasContext

2023-08-21 Thread Song Gao
From: Jiajie Chen 

Add LA64 and VA32 (32-bit virtual address) flags to DisasContext so that
the translator can, for example, reject doubleword instructions in LA32 mode.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.h   | 13 +
 target/loongarch/translate.h |  2 ++
 target/loongarch/translate.c |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index b8af491041..72109095e4 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -432,6 +432,17 @@ static inline bool is_la64(CPULoongArchState *env)
 return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
 }
 
+static inline bool is_va32(CPULoongArchState *env)
+{
+/* VA32 if !LA64 or VA32L[1-3] */
+bool va32 = !is_la64(env);
+uint64_t plv = FIELD_EX64(env->CSR_CRMD, CSR_CRMD, PLV);
+if (plv >= 1 && (FIELD_EX64(env->CSR_MISC, CSR_MISC, VA32) & (1 << plv))) {
+va32 = true;
+}
+return va32;
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
@@ -439,6 +450,7 @@ static inline bool is_la64(CPULoongArchState *env)
 #define HW_FLAGS_CRMD_PGR_CSR_CRMD_PG_MASK   /* 0x10 */
 #define HW_FLAGS_EUEN_FPE   0x04
 #define HW_FLAGS_EUEN_SXE   0x08
+#define HW_FLAGS_VA32   0x20
 
 static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
 uint64_t *cs_base, uint32_t *flags)
@@ -448,6 +460,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState 
*env, vaddr *pc,
 *flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
 *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+*flags |= is_va32(env) * HW_FLAGS_VA32;
 }
 
 void loongarch_cpu_list(void);
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 7f60090580..b6fa5df82d 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -33,6 +33,8 @@ typedef struct DisasContext {
 uint16_t plv;
 int vl;   /* Vector length */
 TCGv zero;
+bool la64; /* LoongArch64 mode */
+bool va32; /* 32-bit virtual address */
 } DisasContext;
 
 void generate_exception(DisasContext *ctx, int excp);
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 3146a2d4ac..ac847745df 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -119,6 +119,9 @@ static void 
loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->vl = LSX_LEN;
 }
 
+ctx->la64 = is_la64(env);
+ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
+
 ctx->zero = tcg_constant_tl(0);
 }
 
-- 
2.39.1




[PATCH v4 00/15] Add some checks before translating instructions

2023-08-21 Thread Song Gao
Based-on: https://patchew.org/QEMU/20230821125959.28666-1-phi...@linaro.org/

Hi,

This series adds some checks before translating instructions.

This includes:

CPUCFG[1].IOCSR

CPUCFG[2].FP
CPUCFG[2].FP_SP
CPUCFG[2].FP_DP
CPUCFG[2].LSPW
CPUCFG[2].LAM
CPUCFG[2].LSX


This series also includes the patches from [1].

Patches 9 and 10 still need review.

V4:
- Rebase;
- Split patch 'Add LoongArch32 cpu la132' into two patches (PMD);
- Remove unrelated cpucfgX (PMD);
- R-b.

V3:
- Rebase;
- The LA32 instruction list follows Table 2 at [2].

V2:
- Add a check parameter to the TRANS macro.
- Remove TRANS_64.
- Add avail_ALL/64/FP/FP_SP/FP_DP/LSPW/LAM/LSX/IOCSR
  to check instructions.

[1]: https://patchew.org/QEMU/20230809083258.1787464-...@jia.je/
[2]: 
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#overview-of-basic-integer-instructions


Jiajie Chen (7):
  target/loongarch: Support LoongArch32 TLB entry
  target/loongarch: Support LoongArch32 DMW
  target/loongarch: Support LoongArch32 VPPN
  target/loongarch: Add LA64 & VA32 to DisasContext
  target/loongarch: Truncate high 32 bits of address in VA32 mode
  target/loongarch: Sign extend results in VA32 mode
  target/loongarch: Add LoongArch32 cpu la132

Song Gao (8):
  target/loongarch: Add a check parameter to the TRANS macro
  target/loongarch: Add avail_64 to check la64-only instructions
  hw/loongarch: Remove restriction of la464 cores in the virt machine
  target/loongarch: Add avail_FP/FP_SP/FP_DP to check fpu instructions
  target/loongarch: Add avail_LSPW to check LSPW instructions
  target/loongarch: Add avail_LAM to check atomic instructions
  target/loongarch: Add avail_LSX to check LSX instructions
  target/loongarch: Add avail_IOCSR to check iocsr instructions

 target/loongarch/cpu-csr.h|   22 +-
 target/loongarch/cpu.h|   22 +
 target/loongarch/translate.h  |   19 +-
 hw/loongarch/virt.c   |5 -
 target/loongarch/cpu.c|   46 +-
 target/loongarch/gdbstub.c|2 +-
 target/loongarch/op_helper.c  |4 +-
 target/loongarch/tlb_helper.c |   66 +-
 target/loongarch/translate.c  |   46 +
 target/loongarch/insn_trans/trans_arith.c.inc |   98 +-
 .../loongarch/insn_trans/trans_atomic.c.inc   |   85 +-
 target/loongarch/insn_trans/trans_bit.c.inc   |   56 +-
 .../loongarch/insn_trans/trans_branch.c.inc   |   27 +-
 target/loongarch/insn_trans/trans_extra.c.inc |   24 +-
 .../loongarch/insn_trans/trans_farith.c.inc   |   96 +-
 target/loongarch/insn_trans/trans_fcmp.c.inc  |8 +
 target/loongarch/insn_trans/trans_fcnv.c.inc  |   56 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  |   62 +-
 target/loongarch/insn_trans/trans_fmov.c.inc  |   52 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   | 1520 +
 .../loongarch/insn_trans/trans_memory.c.inc   |  118 +-
 .../insn_trans/trans_privileged.c.inc |   24 +-
 target/loongarch/insn_trans/trans_shift.c.inc |   34 +-
 23 files changed, 1429 insertions(+), 1063 deletions(-)

-- 
2.39.1




[PATCH v4 08/15] target/loongarch: Add avail_64 to check la64-only instructions

2023-08-21 Thread Song Gao
The LA32 instructions are those listed in Table 2 at
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#overview-of-basic-integer-instructions

Co-authored-by: Jiajie Chen 
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h  |  3 +
 target/loongarch/translate.c  |  2 +
 target/loongarch/insn_trans/trans_arith.c.inc | 42 ++
 .../loongarch/insn_trans/trans_atomic.c.inc   | 76 +--
 target/loongarch/insn_trans/trans_bit.c.inc   | 28 +++
 .../loongarch/insn_trans/trans_branch.c.inc   |  4 +-
 target/loongarch/insn_trans/trans_extra.c.inc | 24 --
 target/loongarch/insn_trans/trans_fmov.c.inc  |  4 +-
 .../loongarch/insn_trans/trans_memory.c.inc   | 68 -
 target/loongarch/insn_trans/trans_shift.c.inc | 24 +++---
 10 files changed, 152 insertions(+), 123 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 3c5c746f30..1342446242 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -15,6 +15,8 @@
 { return avail_##AVAIL(ctx) && FUNC(ctx, a, __VA_ARGS__); }
 
 #define avail_ALL(C)   true
+#define avail_64(C)(FIELD_EX32((C)->cpucfg1, CPUCFG1, ARCH) == \
+CPUCFG1_ARCH_LA64)
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
@@ -37,6 +39,7 @@ typedef struct DisasContext {
 TCGv zero;
 bool la64; /* LoongArch64 mode */
 bool va32; /* 32-bit virtual address */
+uint32_t cpucfg1;
 } DisasContext;
 
 void generate_exception(DisasContext *ctx, int excp);
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index de7c1c5d1f..6967e12fc3 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -127,6 +127,8 @@ static void 
loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
 
 ctx->zero = tcg_constant_tl(0);
+
+ctx->cpucfg1 = env->cpucfg[1];
 }
 
 static void loongarch_tr_tb_start(DisasContextBase *dcbase, CPUState *cs)
diff --git a/target/loongarch/insn_trans/trans_arith.c.inc 
b/target/loongarch/insn_trans/trans_arith.c.inc
index d7f69a7553..2be057e932 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -199,6 +199,10 @@ static bool trans_lu32i_d(DisasContext *ctx, arg_lu32i_d 
*a)
 TCGv src1 = gpr_src(ctx, a->rd, EXT_NONE);
 TCGv src2 = tcg_constant_tl(a->imm);
 
+if (!avail_64(ctx)) {
+return false;
+}
+
 tcg_gen_deposit_tl(dest, src1, src2, 32, 32);
 gen_set_gpr(a->rd, dest, EXT_NONE);
 
@@ -211,6 +215,10 @@ static bool trans_lu52i_d(DisasContext *ctx, arg_lu52i_d 
*a)
 TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
 TCGv src2 = tcg_constant_tl(a->imm);
 
+if (!avail_64(ctx)) {
+return false;
+}
+
 tcg_gen_deposit_tl(dest, src1, src2, 52, 12);
 gen_set_gpr(a->rd, dest, EXT_NONE);
 
@@ -242,6 +250,10 @@ static bool trans_addu16i_d(DisasContext *ctx, 
arg_addu16i_d *a)
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
 TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
 
+if (!avail_64(ctx)) {
+return false;
+}
+
 tcg_gen_addi_tl(dest, src1, a->imm << 16);
 gen_set_gpr(a->rd, dest, EXT_NONE);
 
@@ -249,9 +261,9 @@ static bool trans_addu16i_d(DisasContext *ctx, 
arg_addu16i_d *a)
 }
 
 TRANS(add_w, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
-TRANS(add_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
+TRANS(add_d, 64, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
 TRANS(sub_w, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl)
-TRANS(sub_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
+TRANS(sub_d, 64, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
 TRANS(and, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl)
 TRANS(or, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_or_tl)
 TRANS(xor, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_xor_tl)
@@ -261,32 +273,32 @@ TRANS(orn, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, 
tcg_gen_orc_tl)
 TRANS(slt, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_slt)
 TRANS(sltu, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_sltu)
 TRANS(mul_w, ALL, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, tcg_gen_mul_tl)
-TRANS(mul_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
+TRANS(mul_d, 64, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
 TRANS(mulh_w, ALL, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, gen_mulh_w)
 TRANS(mulh_wu, ALL, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, gen_mulh_w)
-TRANS(mulh_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
-TRANS(mulh_du, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
-TRANS(mulw_d_w, ALL, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
-TRANS(mulw_d_wu, ALL, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
+TRA

[PATCH v4 07/15] target/loongarch: Add a check parameter to the TRANS macro

2023-08-21 Thread Song Gao
The default check parameter is ALL.

Suggested-by: Richard Henderson 
Signed-off-by: Song Gao 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h  |6 +-
 target/loongarch/insn_trans/trans_arith.c.inc |   84 +-
 .../loongarch/insn_trans/trans_atomic.c.inc   |   80 +-
 target/loongarch/insn_trans/trans_bit.c.inc   |   56 +-
 .../loongarch/insn_trans/trans_branch.c.inc   |   20 +-
 target/loongarch/insn_trans/trans_extra.c.inc |   16 +-
 .../loongarch/insn_trans/trans_farith.c.inc   |   72 +-
 target/loongarch/insn_trans/trans_fcnv.c.inc  |   56 +-
 .../loongarch/insn_trans/trans_fmemory.c.inc  |   32 +-
 target/loongarch/insn_trans/trans_fmov.c.inc  |   16 +-
 target/loongarch/insn_trans/trans_lsx.c.inc   | 1322 -
 .../loongarch/insn_trans/trans_memory.c.inc   |   84 +-
 .../insn_trans/trans_privileged.c.inc |   16 +-
 target/loongarch/insn_trans/trans_shift.c.inc |   30 +-
 14 files changed, 946 insertions(+), 944 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index b6fa5df82d..3c5c746f30 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -10,9 +10,11 @@
 
 #include "exec/translator.h"
 
-#define TRANS(NAME, FUNC, ...) \
+#define TRANS(NAME, AVAIL, FUNC, ...) \
 static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
-{ return FUNC(ctx, a, __VA_ARGS__); }
+{ return avail_##AVAIL(ctx) && FUNC(ctx, a, __VA_ARGS__); }
+
+#define avail_ALL(C)   true
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
diff --git a/target/loongarch/insn_trans/trans_arith.c.inc 
b/target/loongarch/insn_trans/trans_arith.c.inc
index 2aea4e41d5..d7f69a7553 100644
--- a/target/loongarch/insn_trans/trans_arith.c.inc
+++ b/target/loongarch/insn_trans/trans_arith.c.inc
@@ -248,45 +248,45 @@ static bool trans_addu16i_d(DisasContext *ctx, 
arg_addu16i_d *a)
 return true;
 }
 
-TRANS(add_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
-TRANS(add_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
-TRANS(sub_w, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl)
-TRANS(sub_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
-TRANS(and, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl)
-TRANS(or, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_or_tl)
-TRANS(xor, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_xor_tl)
-TRANS(nor, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_nor_tl)
-TRANS(andn, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_andc_tl)
-TRANS(orn, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_orc_tl)
-TRANS(slt, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_slt)
-TRANS(sltu, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_sltu)
-TRANS(mul_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, tcg_gen_mul_tl)
-TRANS(mul_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_mul_tl)
-TRANS(mulh_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, gen_mulh_w)
-TRANS(mulh_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, gen_mulh_w)
-TRANS(mulh_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_d)
-TRANS(mulh_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_mulh_du)
-TRANS(mulw_d_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_NONE, tcg_gen_mul_tl)
-TRANS(mulw_d_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_NONE, tcg_gen_mul_tl)
-TRANS(div_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_div_w)
-TRANS(mod_w, gen_rrr, EXT_SIGN, EXT_SIGN, EXT_SIGN, gen_rem_w)
-TRANS(div_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_div_du)
-TRANS(mod_wu, gen_rrr, EXT_ZERO, EXT_ZERO, EXT_SIGN, gen_rem_du)
-TRANS(div_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_d)
-TRANS(mod_d, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_d)
-TRANS(div_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_div_du)
-TRANS(mod_du, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, gen_rem_du)
-TRANS(slti, gen_rri_v, EXT_NONE, EXT_NONE, gen_slt)
-TRANS(sltui, gen_rri_v, EXT_NONE, EXT_NONE, gen_sltu)
-TRANS(addi_w, gen_rri_c, EXT_NONE, EXT_SIGN, tcg_gen_addi_tl)
-TRANS(addi_d, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_addi_tl)
-TRANS(alsl_w, gen_rrr_sa, EXT_NONE, EXT_SIGN, gen_alsl)
-TRANS(alsl_wu, gen_rrr_sa, EXT_NONE, EXT_ZERO, gen_alsl)
-TRANS(alsl_d, gen_rrr_sa, EXT_NONE, EXT_NONE, gen_alsl)
-TRANS(pcaddi, gen_pc, gen_pcaddi)
-TRANS(pcalau12i, gen_pc, gen_pcalau12i)
-TRANS(pcaddu12i, gen_pc, gen_pcaddu12i)
-TRANS(pcaddu18i, gen_pc, gen_pcaddu18i)
-TRANS(andi, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_andi_tl)
-TRANS(ori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_ori_tl)
-TRANS(xori, gen_rri_c, EXT_NONE, EXT_NONE, tcg_gen_xori_tl)
+TRANS(add_w, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
+TRANS(add_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_add_tl)
+TRANS(sub_w, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_sub_tl)
+TRANS(sub_d, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_sub_tl)
+TRANS(and, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_NONE, tcg_gen_and_tl)
+TRANS(or, ALL, gen_rrr, EXT_NON

[PATCH v4 03/15] target/loongarch: Support LoongArch32 VPPN

2023-08-21 Thread Song Gao
From: Jiajie Chen 

VPPN of TLBEHI/TLBREHI is limited to 19 bits in LA32.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu-csr.h|  6 --
 target/loongarch/tlb_helper.c | 23 ++-
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index b93f99a9ef..c59d7a9fcb 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -57,7 +57,8 @@ FIELD(CSR_TLBIDX, PS, 24, 6)
 FIELD(CSR_TLBIDX, NE, 31, 1)
 
 #define LOONGARCH_CSR_TLBEHI 0x11 /* TLB EntryHi */
-FIELD(CSR_TLBEHI, VPPN, 13, 35)
+FIELD(CSR_TLBEHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBEHI_64, VPPN, 13, 35)
 
 #define LOONGARCH_CSR_TLBELO00x12 /* TLB EntryLo0 */
 #define LOONGARCH_CSR_TLBELO10x13 /* TLB EntryLo1 */
@@ -164,7 +165,8 @@ FIELD(CSR_TLBRERA, PC, 2, 62)
 #define LOONGARCH_CSR_TLBRELO1   0x8d /* TLB refill entrylo1 */
 #define LOONGARCH_CSR_TLBREHI0x8e /* TLB refill entryhi */
 FIELD(CSR_TLBREHI, PS, 0, 6)
-FIELD(CSR_TLBREHI, VPPN, 13, 35)
+FIELD(CSR_TLBREHI_32, VPPN, 13, 19)
+FIELD(CSR_TLBREHI_64, VPPN, 13, 35)
 #define LOONGARCH_CSR_TLBRPRMD   0x8f /* TLB refill mode info */
 FIELD(CSR_TLBRPRMD, PPLV, 0, 2)
 FIELD(CSR_TLBRPRMD, PIE, 2, 1)
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index 1f8e7911c7..c8b8b0497f 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -300,8 +300,13 @@ static void raise_mmu_exception(CPULoongArchState *env, 
target_ulong address,
 
 if (tlb_error == TLBRET_NOMATCH) {
 env->CSR_TLBRBADV = address;
-env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN,
-  extract64(address, 13, 35));
+if (is_la64(env)) {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_64,
+VPPN, extract64(address, 13, 35));
+} else {
+env->CSR_TLBREHI = FIELD_DP64(env->CSR_TLBREHI, CSR_TLBREHI_32,
+VPPN, extract64(address, 13, 19));
+}
 } else {
 if (!FIELD_EX64(env->CSR_DBG, CSR_DBG, DST)) {
 env->CSR_BADV = address;
@@ -366,12 +371,20 @@ static void fill_tlb_entry(CPULoongArchState *env, int 
index)
 
 if (FIELD_EX64(env->CSR_TLBRERA, CSR_TLBRERA, ISTLBR)) {
 csr_ps = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI, VPPN);
+if (is_la64(env)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBREHI, CSR_TLBREHI_32, VPPN);
+}
 lo0 = env->CSR_TLBRELO0;
 lo1 = env->CSR_TLBRELO1;
 } else {
 csr_ps = FIELD_EX64(env->CSR_TLBIDX, CSR_TLBIDX, PS);
-csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI, VPPN);
+if (is_la64(env)) {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_64, VPPN);
+} else {
+csr_vppn = FIELD_EX64(env->CSR_TLBEHI, CSR_TLBEHI_32, VPPN);
+}
 lo0 = env->CSR_TLBELO0;
 lo1 = env->CSR_TLBELO1;
 }
@@ -491,7 +504,7 @@ void helper_tlbfill(CPULoongArchState *env)
 
 if (pagesize == stlb_ps) {
 /* Only write into STLB bits [47:13] */
-address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_VPPN_SHIFT);
+address = entryhi & ~MAKE_64BIT_MASK(0, R_CSR_TLBEHI_64_VPPN_SHIFT);
 
 /* Choose one set ramdomly */
 set = get_random_tlb(0, 7);
-- 
2.39.1




[PATCH v4 12/15] target/loongarch: Add avail_LSPW to check LSPW instructions

2023-08-21 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h   | 1 +
 target/loongarch/insn_trans/trans_privileged.c.inc | 8 
 2 files changed, 9 insertions(+)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 0f244cd83b..f0d7b82932 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -20,6 +20,7 @@
 #define avail_FP(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, FP))
 #define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP))
 #define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
+#define avail_LSPW(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
diff --git a/target/loongarch/insn_trans/trans_privileged.c.inc 
b/target/loongarch/insn_trans/trans_privileged.c.inc
index 684ff547a7..099cd871f0 100644
--- a/target/loongarch/insn_trans/trans_privileged.c.inc
+++ b/target/loongarch/insn_trans/trans_privileged.c.inc
@@ -437,6 +437,10 @@ static bool trans_ldpte(DisasContext *ctx, arg_ldpte *a)
 TCGv_i32 mem_idx = tcg_constant_i32(ctx->mem_idx);
 TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
 
+if (!avail_LSPW(ctx)) {
+return true;
+}
+
 if (check_plv(ctx)) {
 return false;
 }
@@ -450,6 +454,10 @@ static bool trans_lddir(DisasContext *ctx, arg_lddir *a)
 TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
 TCGv dest = gpr_dst(ctx, a->rd, EXT_NONE);
 
+if (!avail_LSPW(ctx)) {
+return true;
+}
+
 if (check_plv(ctx)) {
 return false;
 }
-- 
2.39.1




[PATCH v4 14/15] target/loongarch: Add avail_LSX to check LSX instructions

2023-08-21 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h|2 +
 target/loongarch/insn_trans/trans_lsx.c.inc | 1482 ++-
 2 files changed, 823 insertions(+), 661 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index faf4ce87f9..db46e9aa0f 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -22,6 +22,8 @@
 #define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
 #define avail_LSPW(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
 #define avail_LAM(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
+#define avail_LSX(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSX))
+
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc 
b/target/loongarch/insn_trans/trans_lsx.c.inc
index 45e0e738ad..5fbf2718f7 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -135,16 +135,20 @@ static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, 
MemOp mop)
 return true;
 }
 
-TRANS(vadd_b, ALL, gvec_vvv, MO_8, tcg_gen_gvec_add)
-TRANS(vadd_h, ALL, gvec_vvv, MO_16, tcg_gen_gvec_add)
-TRANS(vadd_w, ALL, gvec_vvv, MO_32, tcg_gen_gvec_add)
-TRANS(vadd_d, ALL, gvec_vvv, MO_64, tcg_gen_gvec_add)
+TRANS(vadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_add)
+TRANS(vadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_add)
+TRANS(vadd_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_add)
+TRANS(vadd_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_add)
 
 #define VADDSUB_Q(NAME)\
 static bool trans_v## NAME ##_q(DisasContext *ctx, arg_vvv *a) \
 {  \
 TCGv_i64 rh, rl, ah, al, bh, bl;   \
\
+if (!avail_LSX(ctx)) { \
+return false;  \
+}  \
+   \
 CHECK_SXE; \
\
 rh = tcg_temp_new_i64();   \
@@ -170,58 +174,58 @@ static bool trans_v## NAME ##_q(DisasContext *ctx, 
arg_vvv *a) \
 VADDSUB_Q(add)
 VADDSUB_Q(sub)
 
-TRANS(vsub_b, ALL, gvec_vvv, MO_8, tcg_gen_gvec_sub)
-TRANS(vsub_h, ALL, gvec_vvv, MO_16, tcg_gen_gvec_sub)
-TRANS(vsub_w, ALL, gvec_vvv, MO_32, tcg_gen_gvec_sub)
-TRANS(vsub_d, ALL, gvec_vvv, MO_64, tcg_gen_gvec_sub)
-
-TRANS(vaddi_bu, ALL, gvec_vv_i, MO_8, tcg_gen_gvec_addi)
-TRANS(vaddi_hu, ALL, gvec_vv_i, MO_16, tcg_gen_gvec_addi)
-TRANS(vaddi_wu, ALL, gvec_vv_i, MO_32, tcg_gen_gvec_addi)
-TRANS(vaddi_du, ALL, gvec_vv_i, MO_64, tcg_gen_gvec_addi)
-TRANS(vsubi_bu, ALL, gvec_subi, MO_8)
-TRANS(vsubi_hu, ALL, gvec_subi, MO_16)
-TRANS(vsubi_wu, ALL, gvec_subi, MO_32)
-TRANS(vsubi_du, ALL, gvec_subi, MO_64)
-
-TRANS(vneg_b, ALL, gvec_vv, MO_8, tcg_gen_gvec_neg)
-TRANS(vneg_h, ALL, gvec_vv, MO_16, tcg_gen_gvec_neg)
-TRANS(vneg_w, ALL, gvec_vv, MO_32, tcg_gen_gvec_neg)
-TRANS(vneg_d, ALL, gvec_vv, MO_64, tcg_gen_gvec_neg)
-
-TRANS(vsadd_b, ALL, gvec_vvv, MO_8, tcg_gen_gvec_ssadd)
-TRANS(vsadd_h, ALL, gvec_vvv, MO_16, tcg_gen_gvec_ssadd)
-TRANS(vsadd_w, ALL, gvec_vvv, MO_32, tcg_gen_gvec_ssadd)
-TRANS(vsadd_d, ALL, gvec_vvv, MO_64, tcg_gen_gvec_ssadd)
-TRANS(vsadd_bu, ALL, gvec_vvv, MO_8, tcg_gen_gvec_usadd)
-TRANS(vsadd_hu, ALL, gvec_vvv, MO_16, tcg_gen_gvec_usadd)
-TRANS(vsadd_wu, ALL, gvec_vvv, MO_32, tcg_gen_gvec_usadd)
-TRANS(vsadd_du, ALL, gvec_vvv, MO_64, tcg_gen_gvec_usadd)
-TRANS(vssub_b, ALL, gvec_vvv, MO_8, tcg_gen_gvec_sssub)
-TRANS(vssub_h, ALL, gvec_vvv, MO_16, tcg_gen_gvec_sssub)
-TRANS(vssub_w, ALL, gvec_vvv, MO_32, tcg_gen_gvec_sssub)
-TRANS(vssub_d, ALL, gvec_vvv, MO_64, tcg_gen_gvec_sssub)
-TRANS(vssub_bu, ALL, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
-TRANS(vssub_hu, ALL, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
-TRANS(vssub_wu, ALL, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
-TRANS(vssub_du, ALL, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
-
-TRANS(vhaddw_h_b, ALL, gen_vvv, gen_helper_vhaddw_h_b)
-TRANS(vhaddw_w_h, ALL, gen_vvv, gen_helper_vhaddw_w_h)
-TRANS(vhaddw_d_w, ALL, gen_vvv, gen_helper_vhaddw_d_w)
-TRANS(vhaddw_q_d, ALL, gen_vvv, gen_helper_vhaddw_q_d)
-TRANS(vhaddw_hu_bu, ALL, gen_vvv, gen_helper_vhaddw_hu_bu)
-TRANS(vhaddw_wu_hu, ALL, gen_vvv, gen_helper_vhaddw_wu_hu)
-TRANS(vhaddw_du_wu, ALL, gen_vvv, gen_helper_vhaddw_du_wu)
-TRANS(vhaddw_qu_du, ALL, gen_vvv, gen_helper_vhaddw_qu_du)
-TRANS(vhsubw_h_b, ALL, gen_vvv, gen_helper_vhsubw_h_b)
-TRANS(vhsubw_w_h, ALL, gen_vvv, gen_helper_vhsubw_w_h)
-TRANS(vhsubw_d_w, ALL, gen_vvv, gen_helper_vhsubw_d_w)
-TRANS(vhsubw_q_d, ALL, gen_vvv, gen_helper_vhsubw_q_d)
-TRANS(vhsubw_hu_bu, ALL, gen_vvv, gen_helper_vhsubw_hu_bu

[PATCH v4 10/15] hw/loongarch: Remove restriction of la464 cores in the virt machine

2023-08-21 Thread Song Gao
Allow the virt machine to be used with la132 instead of la464.

Co-authored-by: Jiajie Chen 
Signed-off-by: Song Gao 
---
 hw/loongarch/virt.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e19b042ce8..af15bf5aaa 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -798,11 +798,6 @@ static void loongarch_init(MachineState *machine)
 cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
 }
 
-if (!strstr(cpu_model, "la464")) {
-error_report("LoongArch/TCG needs cpu type la464");
-exit(1);
-}
-
 if (ram_size < 1 * GiB) {
 error_report("ram_size must be greater than 1G.");
 exit(1);
-- 
2.39.1




[PATCH v4 13/15] target/loongarch: Add avail_LAM to check atomic instructions

2023-08-21 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h  |  1 +
 .../loongarch/insn_trans/trans_atomic.c.inc   | 72 +--
 2 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index f0d7b82932..faf4ce87f9 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -21,6 +21,7 @@
 #define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP))
 #define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
 #define avail_LSPW(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
+#define avail_LAM(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
diff --git a/target/loongarch/insn_trans/trans_atomic.c.inc 
b/target/loongarch/insn_trans/trans_atomic.c.inc
index 194818d74d..40085190f6 100644
--- a/target/loongarch/insn_trans/trans_atomic.c.inc
+++ b/target/loongarch/insn_trans/trans_atomic.c.inc
@@ -73,39 +73,39 @@ TRANS(ll_w, ALL, gen_ll, MO_TESL)
 TRANS(sc_w, ALL, gen_sc, MO_TESL)
 TRANS(ll_d, 64, gen_ll, MO_TEUQ)
 TRANS(sc_d, 64, gen_sc, MO_TEUQ)
-TRANS(amswap_w, 64, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL)
-TRANS(amswap_d, 64, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ)
-TRANS(amadd_w, 64, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL)
-TRANS(amadd_d, 64, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TEUQ)
-TRANS(amand_w, 64, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TESL)
-TRANS(amand_d, 64, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TEUQ)
-TRANS(amor_w, 64, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TESL)
-TRANS(amor_d, 64, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TEUQ)
-TRANS(amxor_w, 64, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TESL)
-TRANS(amxor_d, 64, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TEUQ)
-TRANS(ammax_w, 64, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TESL)
-TRANS(ammax_d, 64, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TEUQ)
-TRANS(ammin_w, 64, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TESL)
-TRANS(ammin_d, 64, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TEUQ)
-TRANS(ammax_wu, 64, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TESL)
-TRANS(ammax_du, 64, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TEUQ)
-TRANS(ammin_wu, 64, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TESL)
-TRANS(ammin_du, 64, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TEUQ)
-TRANS(amswap_db_w, 64, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL)
-TRANS(amswap_db_d, 64, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ)
-TRANS(amadd_db_w, 64, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL)
-TRANS(amadd_db_d, 64, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TEUQ)
-TRANS(amand_db_w, 64, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TESL)
-TRANS(amand_db_d, 64, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TEUQ)
-TRANS(amor_db_w, 64, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TESL)
-TRANS(amor_db_d, 64, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TEUQ)
-TRANS(amxor_db_w, 64, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TESL)
-TRANS(amxor_db_d, 64, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TEUQ)
-TRANS(ammax_db_w, 64, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TESL)
-TRANS(ammax_db_d, 64, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TEUQ)
-TRANS(ammin_db_w, 64, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TESL)
-TRANS(ammin_db_d, 64, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TEUQ)
-TRANS(ammax_db_wu, 64, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TESL)
-TRANS(ammax_db_du, 64, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TEUQ)
-TRANS(ammin_db_wu, 64, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TESL)
-TRANS(ammin_db_du, 64, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TEUQ)
+TRANS(amswap_w, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL)
+TRANS(amswap_d, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ)
+TRANS(amadd_w, LAM, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL)
+TRANS(amadd_d, LAM, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TEUQ)
+TRANS(amand_w, LAM, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TESL)
+TRANS(amand_d, LAM, gen_am, tcg_gen_atomic_fetch_and_tl, MO_TEUQ)
+TRANS(amor_w, LAM, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TESL)
+TRANS(amor_d, LAM, gen_am, tcg_gen_atomic_fetch_or_tl, MO_TEUQ)
+TRANS(amxor_w, LAM, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TESL)
+TRANS(amxor_d, LAM, gen_am, tcg_gen_atomic_fetch_xor_tl, MO_TEUQ)
+TRANS(ammax_w, LAM, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TESL)
+TRANS(ammax_d, LAM, gen_am, tcg_gen_atomic_fetch_smax_tl, MO_TEUQ)
+TRANS(ammin_w, LAM, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TESL)
+TRANS(ammin_d, LAM, gen_am, tcg_gen_atomic_fetch_smin_tl, MO_TEUQ)
+TRANS(ammax_wu, LAM, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TESL)
+TRANS(ammax_du, LAM, gen_am, tcg_gen_atomic_fetch_umax_tl, MO_TEUQ)
+TRANS(ammin_wu, LAM, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TESL)
+TRANS(ammin_du, LAM, gen_am, tcg_gen_atomic_fetch_umin_tl, MO_TEUQ)
+TRANS(amswap_db_w, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TESL)
+TRANS(amswap_db_d, LAM, gen_am, tcg_gen_atomic_xchg_tl, MO_TEUQ)
+TRANS(amadd_db_w, LAM, gen_am, tcg_gen_atomic_fetch_add_tl, MO_TESL)

[PATCH v4 02/15] target/loongarch: Support LoongArch32 DMW

2023-08-21 Thread Song Gao
From: Jiajie Chen 

LA32 uses a different encoding for CSR.DMW and a new direct mapping
mechanism.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu-csr.h|  7 +++
 target/loongarch/tlb_helper.c | 26 +++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/target/loongarch/cpu-csr.h b/target/loongarch/cpu-csr.h
index 48ed2e0632..b93f99a9ef 100644
--- a/target/loongarch/cpu-csr.h
+++ b/target/loongarch/cpu-csr.h
@@ -188,10 +188,9 @@ FIELD(CSR_DMW, PLV1, 1, 1)
 FIELD(CSR_DMW, PLV2, 2, 1)
 FIELD(CSR_DMW, PLV3, 3, 1)
 FIELD(CSR_DMW, MAT, 4, 2)
-FIELD(CSR_DMW, VSEG, 60, 4)
-
-#define dmw_va2pa(va) \
-(va & MAKE_64BIT_MASK(0, TARGET_VIRT_ADDR_SPACE_BITS))
+FIELD(CSR_DMW_32, PSEG, 25, 3)
+FIELD(CSR_DMW_32, VSEG, 29, 3)
+FIELD(CSR_DMW_64, VSEG, 60, 4)
 
 /* Debug CSRs */
 #define LOONGARCH_CSR_DBG0x500 /* debug config */
diff --git a/target/loongarch/tlb_helper.c b/target/loongarch/tlb_helper.c
index cef10e2257..1f8e7911c7 100644
--- a/target/loongarch/tlb_helper.c
+++ b/target/loongarch/tlb_helper.c
@@ -173,6 +173,18 @@ static int loongarch_map_address(CPULoongArchState *env, 
hwaddr *physical,
 return TLBRET_NOMATCH;
 }
 
+static hwaddr dmw_va2pa(CPULoongArchState *env, target_ulong va,
+target_ulong dmw)
+{
+if (is_la64(env)) {
+return va & TARGET_VIRT_MASK;
+} else {
+uint32_t pseg = FIELD_EX32(dmw, CSR_DMW_32, PSEG);
+return (va & MAKE_64BIT_MASK(0, R_CSR_DMW_32_VSEG_SHIFT)) | \
+(pseg << R_CSR_DMW_32_VSEG_SHIFT);
+}
+}
+
 static int get_physical_address(CPULoongArchState *env, hwaddr *physical,
 int *prot, target_ulong address,
 MMUAccessType access_type, int mmu_idx)
@@ -192,12 +204,20 @@ static int get_physical_address(CPULoongArchState *env, 
hwaddr *physical,
 }
 
 plv = kernel_mode | (user_mode << R_CSR_DMW_PLV3_SHIFT);
-base_v = address >> R_CSR_DMW_VSEG_SHIFT;
+if (is_la64(env)) {
+base_v = address >> R_CSR_DMW_64_VSEG_SHIFT;
+} else {
+base_v = address >> R_CSR_DMW_32_VSEG_SHIFT;
+}
 /* Check direct map window */
 for (int i = 0; i < 4; i++) {
-base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW, VSEG);
+if (is_la64(env)) {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_64, VSEG);
+} else {
+base_c = FIELD_EX64(env->CSR_DMW[i], CSR_DMW_32, VSEG);
+}
 if ((plv & env->CSR_DMW[i]) && (base_c == base_v)) {
-*physical = dmw_va2pa(address);
+*physical = dmw_va2pa(env, address, env->CSR_DMW[i]);
 *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 return TLBRET_MATCH;
 }
-- 
2.39.1
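As a cross-check of the LA32 path added above, here is a minimal, stand-alone model of the new dmw_va2pa() behaviour. The shift constant and bit layout are taken from the patch; the Python helper itself is only illustrative, not QEMU code:

```python
# Minimal model of the LA32 direct-mapping translation added in this patch:
# the low R_CSR_DMW_32_VSEG_SHIFT (29) bits of the virtual address pass
# through, and the window's PSEG field supplies the physical segment bits.

VSEG_SHIFT = 29  # R_CSR_DMW_32_VSEG_SHIFT (VSEG lives at bits 31:29)

def dmw32_va2pa(va: int, pseg: int) -> int:
    """Illustrative equivalent of dmw_va2pa() for the is_la64() == false case."""
    return (va & ((1 << VSEG_SHIFT) - 1)) | (pseg << VSEG_SHIFT)

# A virtual address in segment 0b101 remapped to physical segment 0b001:
print(hex(dmw32_va2pa(0xA0001234, 0b001)))  # -> 0x20001234
```

The LA64 branch, by contrast, simply masks the address with TARGET_VIRT_MASK, as in the diff.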




[PATCH v4 09/15] target/loongarch: Add LoongArch32 cpu la132

2023-08-21 Thread Song Gao
From: Jiajie Chen 

Add LoongArch32 cpu la132.

Due to the lack of public documentation for la132, it is currently a
synthetic LoongArch32 CPU model. Details need to be added in the future.

Signed-off-by: Jiajie Chen 
Signed-off-by: Song Gao 
---
 target/loongarch/cpu.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 67eb6c3135..d3c3e0d8a1 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -440,6 +440,35 @@ static void loongarch_la464_initfn(Object *obj)
 env->CSR_ASID = FIELD_DP64(0, CSR_ASID, ASIDBITS, 0xa);
 }
 
+static void loongarch_la132_initfn(Object *obj)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(obj);
+CPULoongArchState *env = &cpu->env;
+
+int i;
+
+for (i = 0; i < 21; i++) {
+env->cpucfg[i] = 0x0;
+}
+
+cpu->dtb_compatible = "loongarch,Loongson-1C103";
+env->cpucfg[0] = 0x148042;  /* PRID */
+
+uint32_t data = 0;
+data = FIELD_DP32(data, CPUCFG1, ARCH, 1); /* LA32 */
+data = FIELD_DP32(data, CPUCFG1, PGMMU, 1);
+data = FIELD_DP32(data, CPUCFG1, IOCSR, 1);
+data = FIELD_DP32(data, CPUCFG1, PALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, VALEN, 0x1f); /* 32 bits */
+data = FIELD_DP32(data, CPUCFG1, UAL, 1);
+data = FIELD_DP32(data, CPUCFG1, RI, 0);
+data = FIELD_DP32(data, CPUCFG1, EP, 0);
+data = FIELD_DP32(data, CPUCFG1, RPLV, 0);
+data = FIELD_DP32(data, CPUCFG1, HP, 1);
+data = FIELD_DP32(data, CPUCFG1, IOCSR_BRD, 1);
+env->cpucfg[1] = data;
+}
+
 static void loongarch_cpu_list_entry(gpointer data, gpointer user_data)
 {
 const char *typename = object_class_get_name(OBJECT_CLASS(data));
@@ -787,6 +816,7 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .class_init = loongarch64_cpu_class_init,
 },
 DEFINE_LOONGARCH_CPU_TYPE(64, "la464", loongarch_la464_initfn),
+DEFINE_LOONGARCH_CPU_TYPE(32, "la132", loongarch_la132_initfn),
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
-- 
2.39.1
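The FIELD_DP32() calls in the patch above pack bit-fields into cpucfg[1]. A generic model of that macro may help readers follow the packing; the field offsets used below are purely illustrative (the real definitions live in target/loongarch/cpu-csr.h):

```python
# Generic model of QEMU's FIELD_DP32(reg, FIELD, val): deposit `val` into
# a bit-field of `old` located at `shift` with the given `width`.

def field_dp32(old: int, shift: int, width: int, val: int) -> int:
    mask = ((1 << width) - 1) << shift
    return (old & ~mask) | ((val << shift) & mask)

# Packing a few hypothetical CPUCFG1 fields (offsets are illustrative only):
data = 0
data = field_dp32(data, 0, 2, 1)      # e.g. ARCH = 1 (LA32)
data = field_dp32(data, 4, 1, 1)      # e.g. PGMMU = 1
data = field_dp32(data, 12, 8, 0x1f)  # e.g. PALEN = 0x1f (i.e. 32 bits)
print(hex(data))  # -> 0x1f011
```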




[PATCH v4 15/15] target/loongarch: Add avail_IOCSR to check iocsr instructions

2023-08-21 Thread Song Gao
Signed-off-by: Song Gao 
Reviewed-by: Richard Henderson 
---
 target/loongarch/translate.h |  2 +-
 .../loongarch/insn_trans/trans_privileged.c.inc  | 16 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index db46e9aa0f..89b49a859e 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -23,7 +23,7 @@
 #define avail_LSPW(C)  (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
 #define avail_LAM(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
 #define avail_LSX(C)   (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSX))
-
+#define avail_IOCSR(C) (FIELD_EX32((C)->cpucfg1, CPUCFG1, IOCSR))
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
diff --git a/target/loongarch/insn_trans/trans_privileged.c.inc 
b/target/loongarch/insn_trans/trans_privileged.c.inc
index 099cd871f0..4cb701b4b5 100644
--- a/target/loongarch/insn_trans/trans_privileged.c.inc
+++ b/target/loongarch/insn_trans/trans_privileged.c.inc
@@ -312,14 +312,14 @@ static bool gen_iocsrwr(DisasContext *ctx, arg_rr *a,
 return true;
 }
 
-TRANS(iocsrrd_b, ALL, gen_iocsrrd, gen_helper_iocsrrd_b)
-TRANS(iocsrrd_h, ALL, gen_iocsrrd, gen_helper_iocsrrd_h)
-TRANS(iocsrrd_w, ALL, gen_iocsrrd, gen_helper_iocsrrd_w)
-TRANS(iocsrrd_d, ALL, gen_iocsrrd, gen_helper_iocsrrd_d)
-TRANS(iocsrwr_b, ALL, gen_iocsrwr, gen_helper_iocsrwr_b)
-TRANS(iocsrwr_h, ALL, gen_iocsrwr, gen_helper_iocsrwr_h)
-TRANS(iocsrwr_w, ALL, gen_iocsrwr, gen_helper_iocsrwr_w)
-TRANS(iocsrwr_d, ALL, gen_iocsrwr, gen_helper_iocsrwr_d)
+TRANS(iocsrrd_b, IOCSR, gen_iocsrrd, gen_helper_iocsrrd_b)
+TRANS(iocsrrd_h, IOCSR, gen_iocsrrd, gen_helper_iocsrrd_h)
+TRANS(iocsrrd_w, IOCSR, gen_iocsrrd, gen_helper_iocsrrd_w)
+TRANS(iocsrrd_d, IOCSR, gen_iocsrrd, gen_helper_iocsrrd_d)
+TRANS(iocsrwr_b, IOCSR, gen_iocsrwr, gen_helper_iocsrwr_b)
+TRANS(iocsrwr_h, IOCSR, gen_iocsrwr, gen_helper_iocsrwr_h)
+TRANS(iocsrwr_w, IOCSR, gen_iocsrwr, gen_helper_iocsrwr_w)
+TRANS(iocsrwr_d, IOCSR, gen_iocsrwr, gen_helper_iocsrwr_d)
 
 static void check_mmu_idx(DisasContext *ctx)
 {
-- 
2.39.1




[PATCH v4 11/15] target/loongarch: Add avail_FP/FP_SP/FP_DP to check fpu instructions

2023-08-21 Thread Song Gao
Signed-off-by: Song Gao 
Acked-by: Richard Henderson 
---
 target/loongarch/translate.h  |  4 +
 target/loongarch/translate.c  |  1 +
 .../loongarch/insn_trans/trans_farith.c.inc   | 96 ---
 target/loongarch/insn_trans/trans_fcmp.c.inc  |  8 ++
 target/loongarch/insn_trans/trans_fcnv.c.inc  | 56 +--
 .../loongarch/insn_trans/trans_fmemory.c.inc  | 32 +++
 target/loongarch/insn_trans/trans_fmov.c.inc  | 48 --
 7 files changed, 159 insertions(+), 86 deletions(-)

diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 1342446242..0f244cd83b 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -17,6 +17,9 @@
 #define avail_ALL(C)   true
 #define avail_64(C)(FIELD_EX32((C)->cpucfg1, CPUCFG1, ARCH) == \
 CPUCFG1_ARCH_LA64)
+#define avail_FP(C)(FIELD_EX32((C)->cpucfg2, CPUCFG2, FP))
+#define avail_FP_SP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_SP))
+#define avail_FP_DP(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, FP_DP))
 
 /*
  * If an operation is being performed on less than TARGET_LONG_BITS,
@@ -40,6 +43,7 @@ typedef struct DisasContext {
 bool la64; /* LoongArch64 mode */
 bool va32; /* 32-bit virtual address */
 uint32_t cpucfg1;
+uint32_t cpucfg2;
 } DisasContext;
 
 void generate_exception(DisasContext *ctx, int excp);
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 6967e12fc3..fd393ed76d 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -129,6 +129,7 @@ static void 
loongarch_tr_init_disas_context(DisasContextBase *dcbase,
 ctx->zero = tcg_constant_tl(0);
 
 ctx->cpucfg1 = env->cpucfg[1];
+ctx->cpucfg2 = env->cpucfg[2];
 }
 
 static void loongarch_tr_tb_start(DisasContextBase *dcbase, CPUState *cs)
diff --git a/target/loongarch/insn_trans/trans_farith.c.inc 
b/target/loongarch/insn_trans/trans_farith.c.inc
index b1a1dc7b01..a7ced99fd3 100644
--- a/target/loongarch/insn_trans/trans_farith.c.inc
+++ b/target/loongarch/insn_trans/trans_farith.c.inc
@@ -67,6 +67,10 @@ static bool trans_fcopysign_s(DisasContext *ctx, 
arg_fcopysign_s *a)
 TCGv src1 = get_fpr(ctx, a->fk);
 TCGv src2 = get_fpr(ctx, a->fj);
 
+if (!avail_FP_SP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_deposit_i64(dest, src1, src2, 0, 31);
@@ -81,6 +85,10 @@ static bool trans_fcopysign_d(DisasContext *ctx, 
arg_fcopysign_d *a)
 TCGv src1 = get_fpr(ctx, a->fk);
 TCGv src2 = get_fpr(ctx, a->fj);
 
+if (!avail_FP_DP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_deposit_i64(dest, src1, src2, 0, 63);
@@ -94,6 +102,10 @@ static bool trans_fabs_s(DisasContext *ctx, arg_fabs_s *a)
 TCGv dest = get_fpr(ctx, a->fd);
 TCGv src = get_fpr(ctx, a->fj);
 
+if (!avail_FP_SP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 31));
@@ -108,6 +120,10 @@ static bool trans_fabs_d(DisasContext *ctx, arg_fabs_d *a)
 TCGv dest = get_fpr(ctx, a->fd);
 TCGv src = get_fpr(ctx, a->fj);
 
+if (!avail_FP_DP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_andi_i64(dest, src, MAKE_64BIT_MASK(0, 63));
@@ -121,6 +137,10 @@ static bool trans_fneg_s(DisasContext *ctx, arg_fneg_s *a)
 TCGv dest = get_fpr(ctx, a->fd);
 TCGv src = get_fpr(ctx, a->fj);
 
+if (!avail_FP_SP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_xori_i64(dest, src, 0x8000);
@@ -135,6 +155,10 @@ static bool trans_fneg_d(DisasContext *ctx, arg_fneg_d *a)
 TCGv dest = get_fpr(ctx, a->fd);
 TCGv src = get_fpr(ctx, a->fj);
 
+if (!avail_FP_DP(ctx)) {
+return false;
+}
+
 CHECK_FPE;
 
 tcg_gen_xori_i64(dest, src, 0x8000LL);
@@ -143,41 +167,41 @@ static bool trans_fneg_d(DisasContext *ctx, arg_fneg_d *a)
 return true;
 }
 
-TRANS(fadd_s, ALL, gen_fff, gen_helper_fadd_s)
-TRANS(fadd_d, ALL, gen_fff, gen_helper_fadd_d)
-TRANS(fsub_s, ALL, gen_fff, gen_helper_fsub_s)
-TRANS(fsub_d, ALL, gen_fff, gen_helper_fsub_d)
-TRANS(fmul_s, ALL, gen_fff, gen_helper_fmul_s)
-TRANS(fmul_d, ALL, gen_fff, gen_helper_fmul_d)
-TRANS(fdiv_s, ALL, gen_fff, gen_helper_fdiv_s)
-TRANS(fdiv_d, ALL, gen_fff, gen_helper_fdiv_d)
-TRANS(fmax_s, ALL, gen_fff, gen_helper_fmax_s)
-TRANS(fmax_d, ALL, gen_fff, gen_helper_fmax_d)
-TRANS(fmin_s, ALL, gen_fff, gen_helper_fmin_s)
-TRANS(fmin_d, ALL, gen_fff, gen_helper_fmin_d)
-TRANS(fmaxa_s, ALL, gen_fff, gen_helper_fmaxa_s)
-TRANS(fmaxa_d, ALL, gen_fff, gen_helper_fmaxa_d)
-TRANS(fmina_s, ALL, gen_fff, gen_helper_fmina_s)
-TRANS(fmina_d, ALL, gen_fff, gen_helper_fmina_d)
-TRANS(fscaleb_s, ALL, gen_fff, gen_helper_fscaleb_s)
-TRANS(fscaleb_d, ALL, gen_fff, gen_helper_fscaleb_d)
-TRANS(fsqrt_s, ALL, gen_ff, gen_helper_fsqrt_s)
-TRANS(fsqrt_d, ALL, gen_ff, gen_helper_fsqrt_d)
-TRANS(frecip_s, ALL, gen_ff, gen_helper_frecip_s)

Re: [PATCH] hw/loongarch: Fix ACPI processor id off-by-one error

2023-08-21 Thread bibo mao


在 2023/8/21 09:29, Jiajie Chen 写道:
> 
> On 2023/8/21 09:24, bibo mao wrote:
>> + Add xianglai
>>
>> Good catch.
>>
>> In theory, it is a logical id, and it can be different from the physical
>> id. However, it must be equal to the _UID in the CPU DSDT table, which is
>> missing now.
> 
> Yes, the logical id can be different from index. The spec says:
> 
> If the processor structure represents an actual processor, this field must 
> match the value of ACPI processor ID field in the processor’s entry in the 
> MADT. If the processor structure represents a group of associated processors, 
> the structure might match a processor container in the name space. In that 
> case this entry will match the value of the _UID method of the associated 
> processor container. Where there is a match it must be represented. The flags 
> field, described in /Processor Structure Flags/, includes a bit to describe 
> whether the ACPI processor ID is valid.
> 
> I believe PPTT, MADT and DSDT should all adhere to the same logical id 
> mapping.
Yes, you are right; my mistake.

The logical id in MADT/DSDT/PPTT should be the same, and the physical id is
already available as arch_ids->cpus[i].arch_id.

The get_arch_id hook in the CPU class (to return the physical id) and the
CPU DSDT table are still missing on the LoongArch platform, but that is a
separate issue.
 
Reviewed-by: Bibo Mao 

Regards
Bibo Mao
> 
>> Can the PPTT table parse error be fixed if the CPU DSDT table is added?
>>
>> Regards
>> Bibo Mao
>>
>>
>> 在 2023/8/20 18:56, Jiajie Chen 写道:
>>> In the hw/acpi/aml-build.c:build_pptt() function, the code assumes that
>>> the ACPI processor id equals the cpu index; for example, if we have 8
>>> cpus, then the ACPI processor id should be in the range 0-7.
>>>
>>> However, in the hw/loongarch/acpi-build.c:build_madt() function we broke
>>> the assumption. If we again have 8 cpus, the ACPI processor id in the
>>> MADT table would be in the range 1-8. This violates the following
>>> description taken from ACPI spec 6.4, table 5.138:
>>>
>>> If the processor structure represents an actual processor, this field
>>> must match the value of ACPI processor ID field in the processor’s entry
>>> in the MADT.
>>>
>>> It will break the latest Linux 6.5-rc6 with the
>>> following error message:
>>>
>>> ACPI PPTT: PPTT table found, but unable to locate core 7 (8)
>>> Invalid BIOS PPTT
>>>
>>> Here 7 is the last cpu index, and 8 is the ACPI processor id learned
>>> from the MADT.
>>>
>>> With this patch, Linux can properly detect SMT threads when "-smp
>>> 8,sockets=1,cores=4,threads=2" is passed:
>>>
>>> Thread(s) per core:  2
>>> Core(s) per socket:  2
>>> Socket(s):   2
>>>
>>> The detection of the number of sockets is still wrong, but that is out
>>> of the scope of this commit.
>>>
>>> Signed-off-by: Jiajie Chen 
>>> ---
>>>  hw/loongarch/acpi-build.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
>>> index 0b62c3a2f7..ae292fc543 100644
>>> --- a/hw/loongarch/acpi-build.c
>>> +++ b/hw/loongarch/acpi-build.c
>>> @@ -127,7 +127,7 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
>>> LoongArchMachineState *lams)
>>>  build_append_int_noprefix(table_data, 17, 1);/* Type */
>>>  build_append_int_noprefix(table_data, 15, 1);/* Length */
>>>  build_append_int_noprefix(table_data, 1, 1); /* Version */
>>> -build_append_int_noprefix(table_data, i + 1, 4); /* ACPI Processor 
>>> ID */
>>> +build_append_int_noprefix(table_data, i, 4); /* ACPI Processor 
>>> ID */
>>>  build_append_int_noprefix(table_data, arch_id, 4); /* Core ID */
>>>  build_append_int_noprefix(table_data, 1, 4); /* Flags */
>>>  }




Re: [PATCH 1/2] vhost-user: fix lost reconnect

2023-08-21 Thread Raphael Norwitz

> On Aug 17, 2023, at 2:40 AM, Li Feng  wrote:
> 
> 
>> 2023年8月14日 下午8:11,Raphael Norwitz  写道:
>> 
>> Why can’t we rather fix this by adding an “event_cb” param to
>> vhost_user_async_close and then calling qemu_chr_fe_set_handlers in
>> vhost_user_async_close_bh()?
>> 
>> Even if calling vhost_dev_cleanup() twice is safe today, I worry future
>> changes may easily stumble over the reconnect case and introduce crashes or
>> double frees.
>> 
> I think adding a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’
> has already been called in vhost_user_async_close, and will be called again
> in event->cb, so why do we need to add a new event_cb?
> 

I’m suggesting calling the data->event_cb instead of the data->cb if we hit the 
error case where vhost->vdev is NULL. Something like:

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..edf1dccd44 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -2648,6 +2648,10 @@ typedef struct {
 static void vhost_user_async_close_bh(void *opaque)
 {
 VhostAsyncCallback *data = opaque;
+
+VirtIODevice *vdev = VIRTIO_DEVICE(data->dev);
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+
 struct vhost_dev *vhost = data->vhost;
 
 /*
@@ -2657,6 +2661,9 @@ static void vhost_user_async_close_bh(void *opaque)
  */
 if (vhost->vdev) {
 data->cb(data->dev);
+} else if (data->event_cb) {
+qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, data->event_cb,
+ NULL, data->dev, NULL, true);
 }
 
 g_free(data);

data->event_cb would be vhost_user_blk_event().

I think that makes the error path a lot easier to reason about and more
future-proof.

> To avoid calling vhost_dev_cleanup() twice, add an ‘inited’ flag in struct
> vhost_dev to mark whether it has been initialized, like this:
> 

This is better than the original, but let me know what you think of my 
alternative.

> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index e2f6ffb446..edc80c0231 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
> goto fail_busyloop;
> }
> 
> +hdev->inited = true;
> return 0;
> 
> fail_busyloop:
> @@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
> {
> int i;
> 
> +if (!hdev->inited) {
> +return;
> +}
> +hdev->inited = false;
> trace_vhost_dev_cleanup(hdev);
> 
> for (i = 0; i < hdev->nvqs; ++i) {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index ca3131b1af..74b1aec960 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -123,6 +123,7 @@ struct vhost_dev {
> /* @started: is the vhost device started? */
> bool started;
> bool log_enabled;
> +bool inited;
> uint64_t log_size;
> Error *migration_blocker;
> const VhostOps *vhost_ops;
> 
> Thanks.
> 
>> 
>>> On Aug 4, 2023, at 1:29 AM, Li Feng  wrote:
>>> 
>>> When vhost-user is reconnecting to the backend, if it fails at
>>> get_features in vhost_dev_init(), the reconnect will fail and will never
>>> be retriggered.
>>> 
>>> The reason is:
>>> When vhost-user fails at get_features, vhost_dev_cleanup() will be called
>>> immediately.
>>> 
>>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>>> 
>>> The reconnect path is:
>>> vhost_user_blk_event
>>> vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>>>   qemu_chr_fe_set_handlers <- clear the notifier callback
>>> schedule vhost_user_async_close_bh
>>> 
>>> The vhost->vdev is NULL, so vhost_user_blk_disconnect() will not be
>>> called, and the event fd callback will not be reinstalled.
>>> 
>>> With this patch, the vhost_user_blk_disconnect will call the
>>> vhost_dev_cleanup() again, it's safe.
>>> 
>>> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>>> 
>>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>>> 
>>> Signed-off-by: Li Feng 
>>> ---
>>> hw/virtio/vhost-user.c | 10 +-
>>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>> 
>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>>> index 8dcf049d42..697b403fe2 100644
>>> --- a/hw/virtio/vhost-user.c
>>> +++ b/hw/virtio/vhost-user.c
>>> @@ -2648,16 +2648,8 @@ typedef struct {
>>> static void vhost_user_async_close_bh(void *opaque)
>>> {
>>>   VhostAsyncCallback *data = opaque;
>>> -struct vhost_dev *vhost = data->vhost;
>>> 
>>> -/*
>>> - * If the vhost_dev has been cleared in the meantime there is
>>> - * nothing left to do as some other path has completed the
>>> - * cleanup.
>>> - */
>>> -if (vhost->vdev) {
>>> -data->cb(data->dev);
>>> -}
>>> +data->cb(data->dev);
>>> 
>>>   g_free(data);
>>> }
>>> -- 
>>> 2.41.0
>>> 
>> 
> 



Re: [PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-21 Thread Gurchetan Singh
On Fri, Aug 18, 2023 at 11:13 PM Akihiko Odaki 
wrote:

> On 2023/08/19 10:17, Gurchetan Singh wrote:
> >
> >
> > On Fri, Aug 18, 2023 at 5:08 AM Akihiko Odaki wrote:
> >
> > On 2023/08/18 8:47, Gurchetan Singh wrote:
> >  >
> >  >
> >  > On Wed, Aug 16, 2023 at 10:28 PM Akihiko Odaki wrote:
> >  >
> >  > On 2023/08/17 11:23, Gurchetan Singh wrote:
> >  >  >  > From: Gurchetan Singh
> >  >  >
> >  >  > This adds basic documentation for virtio-gpu.
> >  >  >
> >  >  >  > Suggested-by: Akihiko Odaki
> >  >  >  > Signed-off-by: Gurchetan Singh
> >  >  >  > Tested-by: Alyssa Ross
> >  >  >  > Tested-by: Emmanouil Pitsidianakis
> >  >  >  > Reviewed-by: Emmanouil Pitsidianakis
> >  >  > ---
> >  >  > v2: - Incorporated suggestions by Akihiko Odaki
> >  >  >  - Listed the currently supported capset_names
> (Bernard)
> >  >  >
> >  >  > v3: - Incorporated suggestions by Akihiko Odaki and Alyssa
> > Ross
> >  >  >
> >  >  > v4: - Incorporated suggestions by Akihiko Odaki
> >  >  >
> >  >  > v5: - Removed pci suffix from examples
> >  >  >  - Verified that -device virtio-gpu-rutabaga works.
> > Strangely
> >  >  >enough, I don't remember changing anything, and I
> > remember
> >  >  >it not working.  I did rebase to top of tree though.
> >  >  >  - Fixed meson examples in crosvm docs
> >  >  >
> >  >  >   docs/system/device-emulation.rst   |   1 +
> >  >  >   docs/system/devices/virtio-gpu.rst | 113
> >  > +
> >  >  >   2 files changed, 114 insertions(+)
> >  >  >   create mode 100644 docs/system/devices/virtio-gpu.rst
> >  >  >
> >  >  > diff --git a/docs/system/device-emulation.rst
> >  > b/docs/system/device-emulation.rst
> >  >  > index 4491c4cbf7..1167f3a9f2 100644
> >  >  > --- a/docs/system/device-emulation.rst
> >  >  > +++ b/docs/system/device-emulation.rst
> >  >  > @@ -91,6 +91,7 @@ Emulated Devices
> >  >  >  devices/nvme.rst
> >  >  >  devices/usb.rst
> >  >  >  devices/vhost-user.rst
> >  >  > +   devices/virtio-gpu.rst
> >  >  >  devices/virtio-pmem.rst
> >  >  >  devices/vhost-user-rng.rst
> >  >  >  devices/canokey.rst
> >  >  > diff --git a/docs/system/devices/virtio-gpu.rst
> >  > b/docs/system/devices/virtio-gpu.rst
> >  >  > new file mode 100644
> >  >  > index 00..8c5c708272
> >  >  > --- /dev/null
> >  >  > +++ b/docs/system/devices/virtio-gpu.rst
> >  >  > @@ -0,0 +1,113 @@
> >  >  > +..
> >  >  > +   SPDX-License-Identifier: GPL-2.0
> >  >  > +
> >  >  > +virtio-gpu
> >  >  > +==
> >  >  > +
> >  >  > +This document explains the setup and usage of the
> > virtio-gpu device.
> >  >  > +The virtio-gpu device paravirtualizes the GPU and display
> >  > controller.
> >  >  > +
> >  >  > +Linux kernel support
> >  >  > +
> >  >  > +
> >  >  > +virtio-gpu requires a guest Linux kernel built with the
> >  >  > +``CONFIG_DRM_VIRTIO_GPU`` option.
> >  >  > +
> >  >  > +QEMU virtio-gpu variants
> >  >  > +
> >  >  > +
> >  >  > +QEMU virtio-gpu device variants come in the following
> form:
> >  >  > +
> >  >  > + * ``virtio-vga[-BACKEND]``
> >  >  > + * ``virtio-gpu[-BACKEND][-INTERFACE]``
> >  >  > + * ``vhost-user-vga``
> >  >  > + * ``vhost-user-pci``
> >  >  > +
> >  >  > +**Backends:** QEMU provides a 2D virtio-gpu backend, and
> two
> >  > accelerated
> >  >  > 

[PATCH v10 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-21 Thread Gurchetan Singh
This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or macOS environments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 to 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM, so
a simple guest-to-host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properly, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
  - Very simple initial display integration: just used Pixman
  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function calls

Next steps for Android VMs:
  - The next step would be improving display integration and UI interfaces
with the goal of the QEMU upstream graphics being in an emulator
release [d].

Next steps for Linux VMs for display virtualization:
  - For widespread distribution, someone needs to package Sommelier or the
wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
which allows disabling KMS hypercalls.  If anyone cares enough, it'll
probably be possible to build a custom VM variant that uses this display
virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
- Used error_report(..)
- Used g_autofree to fix leaks on error paths
- Removed unnecessary casts
- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorporated various suggestions by Akihiko Odaki, Marc-André Lureau and
Bernard Berschow:
- Parenthesis in CHECK macro
- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
- delay until g->parent_obj.enable = 1
- Additional cast fixes
- initialize directly in virtio_gpu_rutabaga_realize(..)
- add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
- Autodetect Wayland socket when not explicitly specified
- Fix map_blob error paths
- Add comment why we need both `res` and `resource` in create blob
- Cast and whitespace fixes
- Big endian check comes before virtio_gpu_rutabaga_init().
- VirtIOVGARUTABAGA --> VirtIOVGARutabaga

v4: Incorporated feedback from Akihiko Odaki and Alyssa Ross:

[PATCH v10 2/9] virtio-gpu: CONTEXT_INIT feature

2023-08-21 Thread Gurchetan Singh
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v10 1/9] virtio: Add shared memory capability

2023-08-21 Thread Gurchetan Singh
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Reviewed-by: Gurchetan Singh 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Acked-by: Huang Rui 
Tested-by: Huang Rui 
Reviewed-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue *vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v10 7/9] gfxstream + rutabaga: meson support

2023-08-21 Thread Gurchetan Singh
- Add meson detection of rutabaga_gfx
- Build virtio-gpu-rutabaga.c + associated vga/pci files when
  present

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Fix alignment issues (Akihiko)

 hw/display/meson.build| 22 ++
 meson.build   |  7 +++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/hw/display/meson.build b/hw/display/meson.build
index 413ba4ab24..e362d625dd 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -79,6 +79,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
 if_true: [files('virtio-gpu-gl.c', 'virtio-gpu-virgl.c'), pixman, virgl])
 hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss}
   endif
+
+  if rutabaga.found()
+virtio_gpu_rutabaga_ss = ss.source_set()
+virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga],
+   if_true: [files('virtio-gpu-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
@@ -95,6 +102,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
  if_true: [files('virtio-gpu-pci-gl.c'), pixman])
 hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss}
   endif
+  if rutabaga.found()
+virtio_gpu_pci_rutabaga_ss = ss.source_set()
+virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 'CONFIG_VIRTIO_PCI', rutabaga],
+   if_true: [files('virtio-gpu-pci-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-pci-rutabaga': virtio_gpu_pci_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
@@ -113,6 +126,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
   virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
 if_false: files('acpi-vga-stub.c'))
   hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss}
+
+  if rutabaga.found()
+virtio_vga_rutabaga_ss = ss.source_set()
+virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga],
+   if_true: [files('virtio-vga-rutabaga.c'), pixman])
+virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
+if_false: files('acpi-vga-stub.c'))
+hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss}
+  endif
 endif
 
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c'))
diff --git a/meson.build b/meson.build
index 98e68ef0b1..293f388e53 100644
--- a/meson.build
+++ b/meson.build
@@ -1069,6 +1069,12 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu
dependencies: virgl))
   endif
 endif
+rutabaga = not_found
+if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu
+  rutabaga = dependency('rutabaga_gfx_ffi',
+ method: 'pkg-config',
+ required: get_option('rutabaga_gfx'))
+endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
   blkio = dependency('blkio',
@@ -4272,6 +4278,7 @@ summary_info += {'libtasn1':  tasn1}
 summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
+summary_info += {'rutabaga support':  rutabaga}
 summary_info += {'blkio support': blkio}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
diff --git a/meson_options.txt b/meson_options.txt
index aaea5ddd77..dea3bf7d9c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -224,6 +224,8 @@ option('vmnet', type : 'feature', value : 'auto',
description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
+option('rutabaga_gfx', type : 'feature', value : 'auto',
+   description: 'rutabaga_gfx support')
 option('png', type : 'feature', value : 'auto',
description: 'PNG support with libpng')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9da3fe299b..9a95b4f782 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -154,6 +154,7 @@ meson_options_help() {
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
+  printf "%s\n" '  rutabaga-gfxrutabaga_gfx support'
   printf "%s\n" '  sdl SDL user interface'
   printf "%s\n" '  sdl-image

[PATCH v10 5/9] gfxstream + rutabaga prep: add needed definitions, fields, and options

2023-08-21 Thread Gurchetan Singh
This modifies the common virtio-gpu.h file to have the fields and
definitions needed by gfxstream/rutabaga, used by VirtIOGPURutabaga.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: void *rutabaga --> struct rutabaga *rutabaga (Akihiko)
have a separate rutabaga device instead of using GL device (Bernard)

v2: VirtioGpuRutabaga --> VirtIOGPURutabaga (Akihiko)
move MemoryRegionInfo into VirtIOGPURutabaga (Akihiko)
remove 'ctx' field (Akihiko)
remove 'rutabaga_active'

v6: remove command from commit message, refer to docs instead (Manos)

 include/hw/virtio/virtio-gpu.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 55973e112f..e2a07e68d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
 #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
 OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
 
+#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA)
+
 struct virtio_gpu_simple_resource {
 uint32_t resource_id;
 uint32_t width;
@@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
 
@@ -232,6 +238,28 @@ struct VhostUserGPU {
 bool backend_blocked;
 };
 
+#define MAX_SLOTS 4096
+
+struct MemoryRegionInfo {
+int used;
+MemoryRegion mr;
+uint32_t resource_id;
+};
+
+struct rutabaga;
+
+struct VirtIOGPURutabaga {
+struct VirtIOGPU parent_obj;
+
+struct MemoryRegionInfo memory_regions[MAX_SLOTS];
+char *capset_names;
+char *wayland_socket_path;
+char *wsi;
+bool headless;
+uint32_t num_capsets;
+struct rutabaga *rutabaga;
+};
+
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
 size_t s;   \
 s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v10 9/9] docs/system: add basic virtio-gpu documentation

2023-08-21 Thread Gurchetan Singh
This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v2: - Incorporated suggestions by Akihiko Odaki
- Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

v5: - Removed pci suffix from examples
- Verified that -device virtio-gpu-rutabaga works.  Strangely
  enough, I don't remember changing anything, and I remember
  it not working.  I did rebase to top of tree though.
- Fixed meson examples in crosvm docs

v8: - Remove different links for "rutabaga_gfx" and
  "gfxstream-enabled rutabaga" (Akihiko)

 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 112 +
 2 files changed, 113 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..2b3eb536f9
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,112 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage rutabaga_gfx to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. Fo

[PATCH v10 3/9] virtio-gpu: hostmem

2023-08-21 Thread Gurchetan Singh
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bbd5c6561a..48ef0d9fad 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1509,6 +1509,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v10 8/9] gfxstream + rutabaga: enable rutabaga

2023-08-21 Thread Gurchetan Singh
This change enables rutabaga to receive virtio-gpu-3d hypercalls
when it is active.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Whitespace fix (Akihiko)
v9: reorder virtio_gpu_have_udmabuf() after checking if rutabaga
is enabled to avoid spurious warnings (Akihiko)

 hw/display/virtio-gpu-base.c | 3 ++-
 hw/display/virtio-gpu.c  | 5 +++--
 softmmu/qdev-monitor.c   | 3 +++
 softmmu/vl.c | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..50c5373b65 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
 {
 VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
 
-if (virtio_gpu_virgl_enabled(g->conf)) {
+if (virtio_gpu_virgl_enabled(g->conf) ||
+virtio_gpu_rutabaga_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_VIRGL);
 }
 if (virtio_gpu_edid_enabled(g->conf)) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..fe094addef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1361,8 +1361,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-if (!virtio_gpu_have_udmabuf()) {
-error_setg(errp, "cannot enable blob resources without udmabuf");
+if (!virtio_gpu_rutabaga_enabled(g->parent_obj.conf) &&
+!virtio_gpu_have_udmabuf()) {
+error_setg(errp, "need rutabaga or udmabuf for blob resources");
 return;
 }
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..1b8005ae55 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = {
 { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI },
+{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga",
+  QEMU_ARCH_VIRTIO_MMIO },
+{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW },
 { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..2f98eefdf3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -216,6 +216,7 @@ static struct {
 { .driver = "ati-vga",  .flag = &default_vga   },
 { .driver = "vhost-user-vga",   .flag = &default_vga   },
 { .driver = "virtio-vga-gl",.flag = &default_vga   },
+{ .driver = "virtio-vga-rutabaga",  .flag = &default_vga   },
 };
 
 static QemuOptsList qemu_rtc_opts = {
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v10 4/9] virtio-gpu: blob prep

2023-08-21 Thread Gurchetan Singh
From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - track iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 18 ++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 48ef0d9fad..3e658f1fef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h b/include/hw/virtio/virtio-gpu-bswap.h
index 9124108485..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -63,10 +63,28 @@ virtio_gpu_create_blob_bswap(struct virtio_gpu_resource_create_blob *cblob)
 {
 virtio_gpu_ctrl_hdr_bswap(&cblob->hdr);
 le32_to_cpus(&cblob->resource_id);
+le32_to_cpus(&cblob->blob_mem);
 le32_to_cpus(&cblob->blob_flags);
+le32_to_cpus(&cblob->nr_entries);
+le64_to_cpus(&cblob->blob_id);
 le64_to_cpus(&cblob->size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(&mblob->hdr);
+le32_to_cpus(&mblob->resource_id);
+le64_to_cpus(&mblob->offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(&ublob->hdr);
+le32_to_cpus(&ublob->resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.42.0.rc1.204.g551eb34607-goog




Re: [PATCH v2 32/58] i386/tdx: Track RAM entries for TDX VM

2023-08-21 Thread Isaku Yamahata
On Fri, Aug 18, 2023 at 05:50:15AM -0400,
Xiaoyao Li  wrote:

> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> index e9d2888162ce..9b3c427766ef 100644
> --- a/target/i386/kvm/tdx.h
> +++ b/target/i386/kvm/tdx.h
> @@ -15,6 +15,17 @@ typedef struct TdxGuestClass {
>  ConfidentialGuestSupportClass parent_class;
>  } TdxGuestClass;
>  
> +enum TdxRamType{
> +TDX_RAM_UNACCEPTED,
> +TDX_RAM_ADDED,
> +};
> +
> +typedef struct TdxRamEntry {
> +uint64_t address;
> +uint64_t length;
> +uint32_t type;

Nitpick: use enum TdxRamType here, and in the related function arguments.

-- 
Isaku Yamahata 



Re: [PATCH v2 19/58] qom: implement property helper for sha384

2023-08-21 Thread Isaku Yamahata
On Mon, Aug 21, 2023 at 10:25:35AM +0100,
"Daniel P. Berrangé"  wrote:

> On Fri, Aug 18, 2023 at 05:50:02AM -0400, Xiaoyao Li wrote:
> > From: Isaku Yamahata 
> > 
> > Implement property_add_sha384() which converts hex string <-> uint8_t[48]
> > It will be used for TDX which uses sha384 for measurement.
> 
> I think it is likely a better idea to use base64 for the encoding
> the binary hash - we use base64 for all the sev-guest properties
> that were binary data.
> 
> At which points the property set/get logic is much simpler as it
> is just needing a call to  g_base64_encode / g_base64_decode and
> length validation for the decode case.

Hex strings are popular for showing hash values, aren't they?  Anyway, it's easy
for a human operator, shell scripts, libvirt or whatever to convert between those
representations with utility commands like base64 or xxd, or with a library call.
Either way would work.
-- 
Isaku Yamahata 



Re: [PATCH v2 08/58] i386/tdx: Adjust the supported CPUID based on TDX restrictions

2023-08-21 Thread Isaku Yamahata
On Fri, Aug 18, 2023 at 05:49:51AM -0400,
Xiaoyao Li  wrote:

> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 56cb826f6125..3198bc9fd5fb 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
...
> +static inline uint32_t host_cpuid_reg(uint32_t function,
> +  uint32_t index, int reg)
> +{
> +uint32_t eax, ebx, ecx, edx;
> +uint32_t ret = 0;
> +
> +host_cpuid(function, index, &eax, &ebx, &ecx, &edx);
> +
> +switch (reg) {
> +case R_EAX:
> +ret |= eax;
> +break;
> +case R_EBX:
> +ret |= ebx;
> +break;
> +case R_ECX:
> +ret |= ecx;
> +break;
> +case R_EDX:
> +ret |= edx;

Nitpick: "|" isn't needed as we initialize ret = 0 above. Just '='.
-- 
Isaku Yamahata 



virtio queue numbering and optional queues

2023-08-21 Thread Daniel Verkamp
Hello virtio folks,

I noticed a mismatch between the way the specification defines
device-specific virtqueue indexes and the way device and driver
implementers have interpreted the specification. As a practical example,
consider the traditional memory balloon device [1]. The first two queues
(indexes 0 and 1) are available as part of the baseline device, but the
rest of the queues are tied to feature bits.

Section 5.5.2, "Virtqueues", gives a list that appears to be a mapping from
queue index to queue name/function, defining queue index 3 as free_page_vq
and index 4 as reporting_vq, and declaring that "free_page_vq only exists
if VIRTIO_BALLOON_F_FREE_PAGE_HINT is set" and "reporting_vq only exists if
VIRTIO_BALLOON_F_PAGE_REPORTING is set." This wording is a bit vague, but I
assume "is set" means "is negotiated" (not just "advertised by the
device"). Also presumably "exists" means something like "may only be used
by the driver if the feature bit is negotiated" and "should be ignored by
the device if the feature bit is not negotiated", although it would be nice
to have a proper definition in the spec somewhere.

Section 5.5.3, "Feature bits", gives definitions of the feature bits, with
similar descriptions of the relationship between the feature bits and
virtqueue availability, although the wording is slightly different
("present" rather than "exists"). No dependency between feature bits is
defined, so it seems like it should be valid for a device or driver to
support or accept one of the higher-numbered features while not supporting
a lower-numbered one.


Notably, there is no mention of queue index assignments changing based on
negotiated features in either of these sections. Hence a reader can only
assume that the queue index assignments are fixed (i.e. stats_vq will
always be vq index 2 if F_STATS_VQ is negotiated, regardless of any other
feature bits).

Now consider a scenario where VIRTIO_BALLOON_F_STATS_VQ and
VIRTIO_BALLOON_F_PAGE_REPORTING are negotiated but
VIRTIO_BALLOON_F_FREE_PAGE_HINT is not (perhaps the device supports all of
the defined features but the driver only wants to use reporting_vq, not
free_page_vq). In this case, what queue index should be used by the driver
when enabling reporting_vq? My reading of the specification is that the
reporting_vq is always queue index 4, independent of whether
VIRTIO_BALLOON_F_STATS_VQ or VIRTIO_BALLOON_F_FREE_PAGE_HINT are
negotiated, but this contradicts existing device and driver
implementations, which will use queue index 3 (the next one after stats_vq
= 2) as reporting_vq in this case.

The qemu virtio-balloon device [2] assigns the next-highest unused queue
index when calling virtio_add_queue(), and in the scenario presented above,
free_page_vq will not be added since F_FREE_PAGE_HINT is not negotiated, so
reporting_vq will be assigned queue index 3, rather than 4. (Additionally,
qemu always adds the stats_vq regardless of negotiated features, but that's
irrelevant in this case since we are assuming the STATS_VQ feature is
negotiated.)

The Linux virtio driver code originally seemed to use the correct (by my
reading) indexes, but it was changed to match the layout used by qemu in a
2019 commit ("virtio_pci: use queue idx instead of array idx to set up the
vq") [3] - in other words, it will now also expect queue index 3 to be
reporting_vq in the scenario laid out above.

I'm not sure how to resolve the mismatch between the specification and
actual implementation behavior. The simplest change would probably be to
rewrite the specification to drop the explicit queue indexes in section
5.5.2 and add some wording about how queues are numbered based on
negotiated feature bits (this would need to be applied to other device
types that have specified queue indexes as well). However, this would also
technically be an incompatible change of the specification. On the other
hand, changing the device and driver implementations to match the
specification would be even more challenging, since it would be an
incompatible change in actual practice, not just a change of the spec to
match consensus implementation behavior.


Perhaps drivers could add a quirk to detect old versions of the qemu device
and use the old behavior, while enabling the correct behavior only for
other device vendors and newer qemu device revisions, and the qemu device
could add an opt-in feature to enable the correct behavior that users would
need to enable only when they know they have a sufficiently new driver with
the fix.


Or maybe there could be a new feature bit that would opt into following the
spec-defined queue indexes (VIRTIO_F_VERSION_2?) and some new wording to
require devices to use the old behavior when that bit is not negotiated,
but that also feels less than ideal to me.

Any thoughts on how to proceed with this situation? Is my reading of the
specification just wrong?

Thanks,

-- Daniel

[1]:
https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3160002

Re: [PATCH v2 45/58] i386/tdx: Limit the range size for MapGPA

2023-08-21 Thread Isaku Yamahata
On Fri, Aug 18, 2023 at 05:50:28AM -0400,
Xiaoyao Li  wrote:

> From: Isaku Yamahata 
> 
> If the range for TDG.VP.VMCALL is too large, process a limited
> size and return a retry error.  It's bad for the VMM to take too long,
> e.g. on the order of seconds, while blocking vcpu execution; that results
> in too many missed timer interrupts.

This patch requires the guest side patch. [1]
Unless the guest has a lot of memory, it's unlikely to hit the limit with
KVM/qemu, though.

[1] https://lore.kernel.org/all/20230811021246.821-1-de...@microsoft.com/

> 
> Signed-off-by: Isaku Yamahata 
> Signed-off-by: Xiaoyao Li 
> ---
>  target/i386/kvm/tdx.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 0c43c1f7759f..ced55be506d1 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -994,12 +994,16 @@ static hwaddr tdx_shared_bit(X86CPU *cpu)
>  return (cpu->phys_bits > 48) ? BIT_ULL(51) : BIT_ULL(47);
>  }
>  
> +/* 64MB at most in one call. What value is appropriate? */
> +#define TDX_MAP_GPA_MAX_LEN (64 * 1024 * 1024)
> +
>  static void tdx_handle_map_gpa(X86CPU *cpu, struct kvm_tdx_vmcall *vmcall)
>  {
>  hwaddr shared_bit = tdx_shared_bit(cpu);
>  hwaddr gpa = vmcall->in_r12 & ~shared_bit;
>  bool private = !(vmcall->in_r12 & shared_bit);
>  hwaddr size = vmcall->in_r13;
> +bool retry = false;
>  int ret = 0;
>  
>  vmcall->status_code = TDG_VP_VMCALL_INVALID_OPERAND;
> @@ -1018,12 +1022,25 @@ static void tdx_handle_map_gpa(X86CPU *cpu, struct kvm_tdx_vmcall *vmcall)
>  return;
>  }
>  
> +if (size > TDX_MAP_GPA_MAX_LEN) {
> +retry = true;
> +size = TDX_MAP_GPA_MAX_LEN;
> +}
> +
>  if (size > 0) {
>  ret = kvm_convert_memory(gpa, size, private);
>  }
>  
>  if (!ret) {
> -vmcall->status_code = TDG_VP_VMCALL_SUCCESS;
> +if (retry) {
> +vmcall->status_code = TDG_VP_VMCALL_RETRY;
> +vmcall->out_r11 = gpa + size;
> +if (!private) {
> +vmcall->out_r11 |= shared_bit;
> +}
> +} else {
> +vmcall->status_code = TDG_VP_VMCALL_SUCCESS;
> +}
>  }
>  }
>  
> -- 
> 2.34.1
> 
> 

-- 
Isaku Yamahata 



Re: [PATCH v6 00/12] Add VIRTIO sound card

2023-08-21 Thread Volker Rümelin

On 21.08.23 at 08:10, Manos Pitsidianakis wrote:

Hello Volker,

On Sun, 20 Aug 2023 14:46, Volker Rümelin  wrote:
I tested the virtio-sound-pci device. It seems the device works 
unreliably. Audio playback has a lot of dropouts. I can actually hear 
my mouse moving around. Audio recording with audacity doesn't work. 
Either recording stops with an error or the recorded stream is silent.


I'll see if I can change the code so audio playback works reliably. I 
don't think it makes sense to review the current code as it is. I 
will of course report any issues I find.


have you been having this bad performance with pulseaudio/pipewire? 
Are you using alsa for playback/recording in the guest?


I am asking because this was my setup and I was wondering if it 
affected the code I ended up with. For me I had normal playback, 
except for a short delay at first (maybe something to do with alsa 
buffer lengths, I am not familiar with ALSA much).


If you can share your guest and host setup you used for this I can try 
replicating it.




Hi Manos,

on the host I use pipewire. The audio device used for playback and 
recording is an Intel HDA device. I also tested recording from the 
playback monitor of the HDA device. The important command line arguments 
are: ./qemu-system-x86_64 -machine q35 -device virtio-vga-gl,xres=1280,yres=768,bus=pcie.0 -display gtk,zoom-to-fit=off,gl=on -machine pcspk-audiodev=audio0 -device virtio-sound-pci,bus=pcie.0,audiodev=audio0 -audiodev pipewire,out.frequency=48000,in.frequency=48000,id=audio0


The guest is a Linux openSUSE 15.5 system. The guest uses PulseAudio. This 
means audacity ALSA audio recording was routed through PulseAudio. 
Audacity doesn't really start recording but after a few seconds it 
reports 'Wait timed out' and 'Error opening sound device. Try changing 
the audio host, recording device and the project sample rate'.


When I start QEMU with -audiodev pipewire,out.mixing-engine=off,in.mixing-engine=off,id=audio0
audacity only records silence.


For playback I use Rhythmbox or Audacity. If you don't immediately hear 
dropouts try activating and deactivating the QEMU gtk window in quick 
succession. A slightly increased processor load like moving the mouse 
around also increases the dropout rate.


With best regards,
Volker




trace_exec_tb(tb, pc) does not have cpu index

2023-08-21 Thread Igor Lesik
Hi.

I am wondering why trace events like trace_exec_tb(tb, pc) do not include a cpu
index. How can one make sense of the trace in the case of multiple vCPUs?
I have changed it to trace_exec_tb(tb, pc, cpu->cpu_index) to read my trace,
and now I am wondering: should it not be there by default? Am I missing something?

I am using "simple" trace backend.

Thanks,
Igor



Re: [PATCH 0/2] block-backend: process I/O in the current AioContext

2023-08-21 Thread Stefan Hajnoczi
On Fri, Aug 18, 2023 at 05:24:22PM +0200, Kevin Wolf wrote:
> Am 15.08.2023 um 18:05 hat Stefan Hajnoczi geschrieben:
> > Switch blk_aio_*() APIs over to multi-queue by using
> > qemu_get_current_aio_context() instead of blk_get_aio_context(). This change
> > will allow devices to process I/O in multiple IOThreads in the future.
> 
> Both code paths still use blk_aio_em_aiocb_info, which is:
> 
> static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
> {
> BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);
> 
> return blk_get_aio_context(acb->rwco.blk);
> }
> 
> static const AIOCBInfo blk_aio_em_aiocb_info = {
> .aiocb_size = sizeof(BlkAioEmAIOCB),
> .get_aio_context= blk_aio_em_aiocb_get_aio_context,
> };
> 
> .get_aio_context() is called by bdrv_aio_cancel(), which already looks
> wrong before this patch because in theory it can end up polling the
> AioContext of a different thread. After this patch, .get_aio_context()
> doesn't even necessarily return the AioContext that runs the request any
> more.
> 
> The only thing that might save us is that I can't find any device that
> both supports iothreads and calls bdrv_aio_cancel(). But we shouldn't
> rely on that.
> 
> Maybe the solution is to just remove .get_aio_context altogether and use
> AIO_WAIT_WHILE(NULL, ...) in bdrv_aio_cancel().

I will remove AIOCBInfo.get_aio_context in v2.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2] target/riscv: Update CSR bits name for svadu extension

2023-08-21 Thread Alistair Francis
On Wed, Aug 16, 2023 at 10:20 AM Weiwei Li  wrote:
>
> The Svadu specification updated the name of the *envcfg bit from
> HADE to ADUE.
>
> Signed-off-by: Weiwei Li 
> Signed-off-by: Junqiang Wang 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>
> v2:
> * rename hade variable name to adue suggested by Daniel
>
>  target/riscv/cpu.c|  4 ++--
>  target/riscv/cpu_bits.h   |  8 
>  target/riscv/cpu_helper.c |  6 +++---
>  target/riscv/csr.c| 12 ++--
>  4 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 6b93b04453..f04a985d55 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -875,9 +875,9 @@ static void riscv_cpu_reset_hold(Object *obj)
>  env->two_stage_lookup = false;
>
>  env->menvcfg = (cpu->cfg.ext_svpbmt ? MENVCFG_PBMTE : 0) |
> -   (cpu->cfg.ext_svadu ? MENVCFG_HADE : 0);
> +   (cpu->cfg.ext_svadu ? MENVCFG_ADUE : 0);
>  env->henvcfg = (cpu->cfg.ext_svpbmt ? HENVCFG_PBMTE : 0) |
> -   (cpu->cfg.ext_svadu ? HENVCFG_HADE : 0);
> +   (cpu->cfg.ext_svadu ? HENVCFG_ADUE : 0);
>
>  /* Initialized default priorities of local interrupts. */
>  for (i = 0; i < ARRAY_SIZE(env->miprio); i++) {
> diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> index 59f0ffd9e1..1c2ffae883 100644
> --- a/target/riscv/cpu_bits.h
> +++ b/target/riscv/cpu_bits.h
> @@ -745,12 +745,12 @@ typedef enum RISCVException {
>  #define MENVCFG_CBIE   (3UL << 4)
>  #define MENVCFG_CBCFE  BIT(6)
>  #define MENVCFG_CBZE   BIT(7)
> -#define MENVCFG_HADE   (1ULL << 61)
> +#define MENVCFG_ADUE   (1ULL << 61)
>  #define MENVCFG_PBMTE  (1ULL << 62)
>  #define MENVCFG_STCE   (1ULL << 63)
>
>  /* For RV32 */
> -#define MENVCFGH_HADE  BIT(29)
> +#define MENVCFGH_ADUE  BIT(29)
>  #define MENVCFGH_PBMTE BIT(30)
>  #define MENVCFGH_STCE  BIT(31)
>
> @@ -763,12 +763,12 @@ typedef enum RISCVException {
>  #define HENVCFG_CBIE   MENVCFG_CBIE
>  #define HENVCFG_CBCFE  MENVCFG_CBCFE
>  #define HENVCFG_CBZE   MENVCFG_CBZE
> -#define HENVCFG_HADE   MENVCFG_HADE
> +#define HENVCFG_ADUE   MENVCFG_ADUE
>  #define HENVCFG_PBMTE  MENVCFG_PBMTE
>  #define HENVCFG_STCE   MENVCFG_STCE
>
>  /* For RV32 */
> -#define HENVCFGH_HADE   MENVCFGH_HADE
> +#define HENVCFGH_ADUE   MENVCFGH_ADUE
>  #define HENVCFGH_PBMTE  MENVCFGH_PBMTE
>  #define HENVCFGH_STCE   MENVCFGH_STCE
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 9f611d89bb..3a02079290 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -861,11 +861,11 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
>  }
>
>  bool pbmte = env->menvcfg & MENVCFG_PBMTE;
> -bool hade = env->menvcfg & MENVCFG_HADE;
> +bool adue = env->menvcfg & MENVCFG_ADUE;
>
>  if (first_stage && two_stage && env->virt_enabled) {
>  pbmte = pbmte && (env->henvcfg & HENVCFG_PBMTE);
> -hade = hade && (env->henvcfg & HENVCFG_HADE);
> +adue = adue && (env->henvcfg & HENVCFG_ADUE);
>  }
>
>  int ptshift = (levels - 1) * ptidxbits;
> @@ -1026,7 +1026,7 @@ restart:
>
>  /* Page table updates need to be atomic with MTTCG enabled */
>  if (updated_pte != pte && !is_debug) {
> -if (!hade) {
> +if (!adue) {
>  return TRANSLATE_FAIL;
>  }
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..b4c66dc8ca 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1951,7 +1951,7 @@ static RISCVException write_menvcfg(CPURISCVState *env, int csrno,
>  if (riscv_cpu_mxl(env) == MXL_RV64) {
>  mask |= (cfg->ext_svpbmt ? MENVCFG_PBMTE : 0) |
>  (cfg->ext_sstc ? MENVCFG_STCE : 0) |
> -(cfg->ext_svadu ? MENVCFG_HADE : 0);
> +(cfg->ext_svadu ? MENVCFG_ADUE : 0);
>  }
>  env->menvcfg = (env->menvcfg & ~mask) | (val & mask);
>
> @@ -1971,7 +1971,7 @@ static RISCVException write_menvcfgh(CPURISCVState *env, int csrno,
>  const RISCVCPUConfig *cfg = riscv_cpu_cfg(env);
>  uint64_t mask = (cfg->ext_svpbmt ? MENVCFG_PBMTE : 0) |
>  (cfg->ext_sstc ? MENVCFG_STCE : 0) |
> -(cfg->ext_svadu ? MENVCFG_HADE : 0);
> +(cfg->ext_svadu ? MENVCFG_ADUE : 0);
>  uint64_t valh = (uint64_t)val << 32;
>
>  env->menvcfg = (env->menvcfg & ~mask) | (valh & mask);
> @@ -2023,7 +2023,7 @@ static RIS

Re: [PATCH 2/3] hw/char: riscv_htif: replace exit(0) with proper shutdown

2023-08-21 Thread Alistair Francis
On Fri, Aug 18, 2023 at 5:03 AM Clément Chigot  wrote:
>
> This replaces the exit(0) call with a shutdown request, ensuring a proper
> cleanup of QEMU. Otherwise, some connections like gdb could be broken
> without being correctly flushed.
>
> Signed-off-by: Clément Chigot 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/char/riscv_htif.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/hw/char/riscv_htif.c b/hw/char/riscv_htif.c
> index 37d3ccc76b..c49d20a221 100644
> --- a/hw/char/riscv_htif.c
> +++ b/hw/char/riscv_htif.c
> @@ -31,6 +31,7 @@
>  #include "qemu/error-report.h"
>  #include "exec/address-spaces.h"
>  #include "sysemu/dma.h"
> +#include "sysemu/runstate.h"
>
>  #define RISCV_DEBUG_HTIF 0
>  #define HTIF_DEBUG(fmt, ...)  \
> @@ -205,7 +206,16 @@ static void htif_handle_tohost_write(HTIFState *s, 
> uint64_t val_written)
>  g_free(sig_data);
>  }
>
> -exit(exit_code);
> +/*
> + * Shutdown request is a clean way to stop the QEMU, compared
> + * to a direct call to exit(). But we can't pass the exit code
> + * through it so avoid doing that when it can matter.
> + */
> +if (exit_code) {
> +exit(exit_code);
> +} else {
> +qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> +}
>  } else {
>  uint64_t syscall[8];
>  cpu_physical_memory_read(payload, syscall, sizeof(syscall));
> --
> 2.25.1
>
>



Re: [PATCH 1/3] hw/misc/sifive_test.c: replace exit(0) with proper shutdown

2023-08-21 Thread Alistair Francis
On Fri, Aug 18, 2023 at 5:03 AM Clément Chigot  wrote:
>
> This replaces the exit(0) call with a shutdown request, ensuring a proper
> cleanup of QEMU. Otherwise, some connections like gdb could be broken
> without being correctly flushed.
>
> Signed-off-by: Clément Chigot 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/misc/sifive_test.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/misc/sifive_test.c b/hw/misc/sifive_test.c
> index 56df45bfe5..ab0674f8fe 100644
> --- a/hw/misc/sifive_test.c
> +++ b/hw/misc/sifive_test.c
> @@ -25,6 +25,7 @@
>  #include "qemu/module.h"
>  #include "sysemu/runstate.h"
>  #include "hw/misc/sifive_test.h"
> +#include "sysemu/sysemu.h"
>
>  static uint64_t sifive_test_read(void *opaque, hwaddr addr, unsigned int size)
>  {
> @@ -41,7 +42,8 @@ static void sifive_test_write(void *opaque, hwaddr addr,
>  case FINISHER_FAIL:
>  exit(code);
>  case FINISHER_PASS:
> -exit(0);
> +qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> +return;
>  case FINISHER_RESET:
>  qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>  return;
> --
> 2.25.1
>
>



Re: [PATCH v6 01/12] Add virtio-sound device stub

2023-08-21 Thread Volker Rümelin

On 21.08.23 at 08:06, Manos Pitsidianakis wrote:

Hello Volker!

On Sun, 20 Aug 2023 12:33, Volker Rümelin  wrote:
I think the virtio-snd.c code, the trace events and the Kconfig 
VIRTIO_SND should be moved to hw/audio. The code for nearly all audio 
devices is in this directory. This would be similar to other virtio 
devices. E.g. the virtio-scsi code is in hw/scsi and the virtio-net 
code is in hw/net.


This was where it was initially but in previous patchset versions it 
was recommended to move them to hw/virtio. I don't mind either 
approach though.


Hi Manos,

Ok, then don't change the directory. I guess I will have to discuss this 
with Alex first.


With best regards,
Volker

Re: [PATCH v3 00/19] crypto: Provide clmul.h and host accel

2023-08-21 Thread Richard Henderson

On 8/21/23 11:08, Ard Biesheuvel wrote:

OK, I did the OpenSSL benchmark this time, using a x86_64 cross build
on arm64/ThunderX2, and the speedup is 7x (\o/)


Excellent, thanks.


r~





Re: [PATCH v3 00/19] crypto: Provide clmul.h and host accel

2023-08-21 Thread Ard Biesheuvel
On Mon, 21 Aug 2023 at 18:18, Richard Henderson
 wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
>
> Changes for v3:
>   * Update target/i386 ops_sse.h.
>   * Apply r-b.
>
> Changes for v2:
>   * Only accelerate clmul_64; keep generic helpers for other sizes.
>   * Drop most of the Int128 interfaces, except for clmul_64.
>   * Use the same acceleration format as aes-round.h.
>
>
> r~
>
>
> [1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/
>
>
> Richard Henderson (19):
>   crypto: Add generic 8-bit carry-less multiply routines
>   target/arm: Use clmul_8* routines
>   target/s390x: Use clmul_8* routines
>   target/ppc: Use clmul_8* routines
>   crypto: Add generic 16-bit carry-less multiply routines
>   target/arm: Use clmul_16* routines
>   target/s390x: Use clmul_16* routines
>   target/ppc: Use clmul_16* routines
>   crypto: Add generic 32-bit carry-less multiply routines
>   target/arm: Use clmul_32* routines
>   target/s390x: Use clmul_32* routines
>   target/ppc: Use clmul_32* routines
>   crypto: Add generic 64-bit carry-less multiply routine
>   target/arm: Use clmul_64
>   target/i386: Use clmul_64
>   target/s390x: Use clmul_64
>   target/ppc: Use clmul_64
>   host/include/i386: Implement clmul.h
>   host/include/aarch64: Implement clmul.h
>

OK, I did the OpenSSL benchmark this time, using a x86_64 cross build
on arm64/ThunderX2, and the speedup is 7x (\o/)

Tested-by: Ard Biesheuvel 
Acked-by: Ard Biesheuvel 



Distro qemu (no acceleration):

$ qemu-x86_64 --version
qemu-x86_64 version 7.2.4 (Debian 1:7.2+dfsg-7+deb12u1)

$ apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM       8856.13k    13820.95k    17375.49k    16826.37k    16870.06k    17208.66k


QEMU built with this series applied onto latest master:

$ ~/build/qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfffa320b0fcbfffd:0x8041020c01dc47a9
The 'numbers' are in 1000s of bytes per second processed.
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM      14237.01k    34176.34k    70633.13k    97372.84k   119668.74k   122049.88k



Re: [PATCH 2/2] hw/intc: Make rtc variable names consistent

2023-08-21 Thread Alistair Francis
On Mon, Aug 21, 2023 at 12:15 PM Jason Chien  wrote:
>
> Ping.

This has been applied to the RISC-V tree. It will go in after the QEMU
release freeze is over (probably a week or two).

Alistair

>
> On Fri, Aug 11, 2023 at 2:25 AM Alistair Francis  wrote:
>>
>> On Fri, Jul 28, 2023 at 4:57 AM Jason Chien  wrote:
>> >
>> > The variables whose values are given by cpu_riscv_read_rtc() should be named
>> > "rtc". The variables whose values are given by cpu_riscv_read_rtc_raw()
>> > should be named "rtc_r".
>> >
>> > Signed-off-by: Jason Chien 
>>
>> Reviewed-by: Alistair Francis 
>>
>> Alistair
>>
>> > ---
>> >  hw/intc/riscv_aclint.c | 6 +++---
>> >  1 file changed, 3 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
>> > index bf77e29a70..25cf7a5d9d 100644
>> > --- a/hw/intc/riscv_aclint.c
>> > +++ b/hw/intc/riscv_aclint.c
>> > @@ -64,13 +64,13 @@ static void riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
>> >  uint64_t next;
>> >  uint64_t diff;
>> >
>> > -uint64_t rtc_r = cpu_riscv_read_rtc(mtimer);
>> > +uint64_t rtc = cpu_riscv_read_rtc(mtimer);
>> >
>> >  /* Compute the relative hartid w.r.t the socket */
>> >  hartid = hartid - mtimer->hartid_base;
>> >
>> >  mtimer->timecmp[hartid] = value;
>> > -if (mtimer->timecmp[hartid] <= rtc_r) {
>> > +if (mtimer->timecmp[hartid] <= rtc) {
>> >  /*
>> >   * If we're setting an MTIMECMP value in the "past",
>> >   * immediately raise the timer interrupt
>> > @@ -81,7 +81,7 @@ static void riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
>> >
>> >  /* otherwise, set up the future timer interrupt */
>> >  qemu_irq_lower(mtimer->timer_irqs[hartid]);
>> > -diff = mtimer->timecmp[hartid] - rtc_r;
>> > +diff = mtimer->timecmp[hartid] - rtc;
>> >  /* back to ns (note args switched in muldiv64) */
>> >  uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq);
>> >
>> > --
>> > 2.17.1
>> >
>> >



Re: [PATCH v3 19/19] host/include/aarch64: Implement clmul.h

2023-08-21 Thread Philippe Mathieu-Daudé

On 21/8/23 18:18, Richard Henderson wrote:

Detect PMULL in cpuinfo; implement the accel hook.

Signed-off-by: Richard Henderson 
---
  host/include/aarch64/host/cpuinfo.h  |  1 +
  host/include/aarch64/host/crypto/clmul.h | 41 
  util/cpuinfo-aarch64.c   |  4 ++-
  3 files changed, 45 insertions(+), 1 deletion(-)
  create mode 100644 host/include/aarch64/host/crypto/clmul.h


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 15/19] target/i386: Use clmul_64

2023-08-21 Thread Philippe Mathieu-Daudé

On 21/8/23 18:18, Richard Henderson wrote:

Use generic routine for 64-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
  target/i386/ops_sse.h | 40 +---
  1 file changed, 9 insertions(+), 31 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v5 4/5] qmp: Added new command to retrieve eBPF blob.

2023-08-21 Thread Markus Armbruster
Andrew Melnichenko  writes:

> Hi all,
> Thanks for the comments - I'll update and send new patches.
>
> On Sat, Aug 5, 2023 at 10:34 AM Markus Armbruster  wrote:
>>
>> Andrew Melnychenko  writes:
>>
>> > Now, the binary objects may be retrieved by id.
>> > It would require for future qmp commands that may require specific
>> > eBPF blob.
>> >
>> > Added command "request-ebpf". This command returns
>> > eBPF program encoded base64. The program taken from the
>> > skeleton and essentially is an ELF object that can be
>> > loaded in the future with libbpf.
>> >
>> > The reason to use the command to provide the eBPF object
>> > instead of a separate artifact was to avoid issues related
>> > to finding the eBPF itself. eBPF object is an ELF binary
>> > that contains the eBPF program and eBPF map description(BTF).
>> > Overall, eBPF object should contain the program and enough
>> > metadata to create/load eBPF with libbpf. As the eBPF
>> > maps/program should correspond to QEMU, the eBPF can't
>> > be used with a different QEMU build.
>> >
>> > The first solution was a helper that comes with QEMU
>> > and loads appropriate eBPF objects. And the issue is
>> > to find a proper helper if the system has several
>> > different QEMUs installed and/or built from the source,
>> > which helpers may not be compatible.
>> >
>> > Another issue is QEMU updating while there is a running
>> > QEMU instance. With an updated helper, it may not be
>> > possible to hotplug virtio-net device to the already
>> > running QEMU. Overall, requesting the eBPF object from
>> > QEMU itself solves possible failures with acceptable effort.
>> >
>> > Links:
>> > [PATCH 3/5] qmp: Added the helper stamp check.
>> > https://lore.kernel.org/all/20230219162100.174318-4-and...@daynix.com/
>> >
>> > Signed-off-by: Andrew Melnychenko 
>> > ---
>>
>> [...]
>>
>> > diff --git a/qapi/ebpf.json b/qapi/ebpf.json
>> > new file mode 100644
>> > index 000..40851f8c177
>> > --- /dev/null
>> > +++ b/qapi/ebpf.json

[...]

>> > +##
>> > +# @request-ebpf:
>> > +#
>> > +# Returns eBPF object that can be loaded with libbpf.
>> > +# Management applications (g.e. libvirt) may load it and pass file
>> > +# descriptors to QEMU. Which allows running QEMU without BPF capabilities.
>> > +# It's crucial that eBPF program/map is compatible with QEMU, so it's
>> > +# provided through QMP.
>> > +#
>> > +# Returns: RSS eBPF object encoded in base64.
>>
>> What does "RSS" mean?
>
> RSS - Receive-side Scaling.

Suggest to use something like "receive-side scaling (RSS)" the first
time.

You could also put a general introduction right below the header, like

  ##
  # = eBPF Objects
  #
  # Text goes here
  ##

This is not a demand.

[...]
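Since the object crosses QMP as base64, the management side only needs to decode the returned blob before handing the ELF to libbpf. A minimal sketch of that step — the `object` field name here is an assumption for illustration, not taken from the patch; the real key is defined by the QAPI schema under review:

```python
import base64

def extract_ebpf_object(qmp_response):
    """Decode the base64-encoded eBPF ELF from a request-ebpf reply.

    The 'object' key is illustrative only; the actual field name comes
    from the QAPI schema.
    """
    return base64.b64decode(qmp_response["return"]["object"])

# Round-trip check with a stand-in payload instead of a real ELF blob.
blob = b"\x7fELF...rss-ebpf"
reply = {"return": {"object": base64.b64encode(blob).decode("ascii")}}
assert extract_ebpf_object(reply) == blob
```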




Re: [PATCH 05/21] block: Introduce bdrv_schedule_unref()

2023-08-21 Thread Kevin Wolf
Am 18.08.2023 um 18:26 hat Eric Blake geschrieben:
> On Fri, Aug 18, 2023 at 11:24:00AM -0500, Eric Blake wrote:
> > > +++ b/block/graph-lock.c
> > > @@ -163,17 +163,26 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
> > >  void bdrv_graph_wrunlock(void)
> > >  {
> > >  GLOBAL_STATE_CODE();
> > > -QEMU_LOCK_GUARD(&aio_context_list_lock);
> > >  assert(qatomic_read(&has_writer));
> > >  
> > > +WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
> > > +/*
> > > + * No need for memory barriers, this works in pair with
> > > + * the slow path of rdlock() and both take the lock.
> > > + */
> > > +qatomic_store_release(&has_writer, 0);
> > > +
> > > +/* Wake up all coroutine that are waiting to read the graph */
> > > +qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > 
> > So if I understand coroutines correctly, this says all pending
> > coroutines are now scheduled to run (or maybe they do try to run here,
> > but then immediately return control back to this coroutine to await
> > the right lock conditions since we are still in the block guarded by
> > list_lock)...
> > 
> > > +}
> > > +
> > >  /*
> > > - * No need for memory barriers, this works in pair with
> > > - * the slow path of rdlock() and both take the lock.
> > > + * Run any BHs that were scheduled during the wrlock section and that
> > > + * callers might expect to have finished (e.g. bdrv_unref() calls). 
> > > Do this
> > > + * only after restarting coroutines so that nested event loops in 
> > > BHs don't
> > > + * deadlock if their condition relies on the coroutine making 
> > > progress.
> > >   */
> > > -qatomic_store_release(&has_writer, 0);
> > > -
> > > -/* Wake up all coroutine that are waiting to read the graph */
> > > -qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
> > > +aio_bh_poll(qemu_get_aio_context());
> > 
> > ...and as long as the other coroutines sharing this thread don't
> > actually get to make progress until the next point at which the
> > current coroutine yields, and as long as our aio_bh_poll() doesn't
> > also include a yield point, then we are ensured that the BH has
> > completed before the next yield point in our caller.
> > 
> > There are times (like today) where I'm still trying to wrap my mind
> > about the subtle differences between true multi-threading
> > vs. cooperative coroutines sharing a single thread via the use of
> > yield points.  coroutines are cool when they can get rid of some of
> > the races that you have to worry about in true multi-threading.
> 
> That said, once we introduce multi-queue, can we ever have a scenario
> where a different iothread might be trying to access the graph and
> perform a reopen in the time while this thread has not completed the
> BH close?  Or is that protected by some other mutual exclusion (where
> the only one we have to worry about is reopen by a coroutine in the
> same thread, because all other threads are locked out of graph
> modifications)?

We don't have to worry about that one because reopen (and taking the
writer lock in general) only happen in the main thread, which is exactly
the thread that this code runs in.

The thing that we need to take into consideration is that aio_bh_poll()
could call something that wants to take the writer lock and modify the
graph again. It's not really a problem, though, because we're already
done at that point. Any readers that we resumed above will just
synchronise with the writer in the usual way and one of them will have
to wait. But there is nothing that is still waiting to be resumed and
could deadlock.

Kevin
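The ordering Kevin describes — clear the writer flag and wake all queued readers under the lock, then poll scheduled BHs afterwards — can be sketched abstractly. This is plain Python with illustrative names, not QEMU's actual types:

```python
events = []

has_writer = [True]
reader_queue = [lambda: events.append("reader-0"),
                lambda: events.append("reader-1")]
scheduled_bhs = [lambda: events.append("bh-unref")]  # e.g. a scheduled unref

def wrunlock():
    # under aio_context_list_lock: release the writer flag, wake readers
    has_writer[0] = False
    while reader_queue:
        reader_queue.pop(0)()
    # after the lock guard: run BHs scheduled during the wrlock section,
    # only once the readers have been restarted
    while scheduled_bhs:
        scheduled_bhs.pop(0)()

wrunlock()
# readers are resumed before the scheduled BH runs
assert events == ["reader-0", "reader-1", "bh-unref"]
```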




Re: [PATCH v10 00/10] migration: Modify 'migrate' and 'migrate-incoming' QAPI commands for migration

2023-08-21 Thread Peter Xu
Het,

On Mon, Aug 21, 2023 at 11:43:02AM +0530, Het Gala wrote:
> Hi qemu-devel community,
> 
> A gentle reminder and request for all migration maintainers - Peter, Juan,
> Dr. Gilbert and others too for review of the patchset series. Received
> reviewed-by from Daniel on migration implementation patches but need final
> approval from migration maintainers before getting it merged. Also got
> acked-by tag from Markus on the QAPI patches. This is Part1 of the 4
> patchset series. Ultimate goal of the whole 4 series is to 'introduce
> multiple interface support on top of existing multifd capability'. Hope to
> get approval or comments from migration maintainers on the patches soon.

This all looks right to me at a high level.  I'd just trust Daniel's
in-depth review, considering that Juan will probably give it a final
look soon, anyway.

Juan was just busy in the past few weeks; I suppose he'll catch up very
soon.

Thanks,

-- 
Peter Xu




[PATCH v3 02/19] target/arm: Use clmul_8* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.
Remove our local version of pmull_h.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_internal.h |  5 
 target/arm/tcg/mve_helper.c   |  8 ++
 target/arm/tcg/vec_helper.c   | 53 ---
 3 files changed, 9 insertions(+), 57 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index 1f4ed80ff7..c4afba6d9f 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -219,11 +219,6 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, 
bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
-/*
- * 8 x 8 -> 16 vector polynomial multiply where the inputs are
- * in the low 8 bits of each 16-bit element
-*/
-uint64_t pmull_h(uint64_t op1, uint64_t op2);
 /*
  * 16 x 16 -> 32 vector polynomial multiply where the inputs are
  * in the low 16 bits of each 32-bit element
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 403b345ea3..96ddfb4b3a 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -26,6 +26,7 @@
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 #include "fpu/softfloat.h"
+#include "crypto/clmul.h"
 
 static uint16_t mve_eci_mask(CPUARMState *env)
 {
@@ -984,15 +985,12 @@ DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
  * Polynomial multiply. We can always do this generating 64 bits
  * of the result at a time, so we don't need to use DO_2OP_L.
  */
-#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
 #define VMULLPW_MASK 0x0000ffff0000ffffULL
-#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
-#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
 #define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
 #define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
 
-DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
-DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
+DO_2OP(vmullpbh, 8, uint64_t, clmul_8x4_even)
+DO_2OP(vmullpth, 8, uint64_t, clmul_8x4_odd)
 DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
 DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 6712a2c790..cd630ff905 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -23,6 +23,7 @@
 #include "tcg/tcg-gvec-desc.h"
 #include "fpu/softfloat.h"
 #include "qemu/int128.h"
+#include "crypto/clmul.h"
 #include "vec_internal.h"
 
 /*
@@ -1986,21 +1987,11 @@ void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, 
uint32_t desc)
  */
 void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
 {
-intptr_t i, j, opr_sz = simd_oprsz(desc);
+intptr_t i, opr_sz = simd_oprsz(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 
 for (i = 0; i < opr_sz / 8; ++i) {
-uint64_t nn = n[i];
-uint64_t mm = m[i];
-uint64_t rr = 0;
-
-for (j = 0; j < 8; ++j) {
-uint64_t mask = (nn & 0x0101010101010101ull) * 0xff;
-rr ^= mm & mask;
-mm = (mm << 1) & 0xfefefefefefefefeull;
-nn >>= 1;
-}
-d[i] = rr;
+d[i] = clmul_8x8_low(n[i], m[i]);
 }
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -2038,22 +2029,6 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, 
uint32_t desc)
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-/*
- * 8x8->16 polynomial multiply.
- *
- * The byte inputs are expanded to (or extracted from) half-words.
- * Note that neon and sve2 get the inputs from different positions.
- * This allows 4 bytes to be processed in parallel with uint64_t.
- */
-
-static uint64_t expand_byte_to_half(uint64_t x)
-{
-return  (x & 0x000000ff)
- | ((x & 0x0000ff00) << 8)
- | ((x & 0x00ff0000) << 16)
- | ((x & 0xff000000) << 24);
-}
-
 uint64_t pmull_w(uint64_t op1, uint64_t op2)
 {
 uint64_t result = 0;
@@ -2067,29 +2042,16 @@ uint64_t pmull_w(uint64_t op1, uint64_t op2)
 return result;
 }
 
-uint64_t pmull_h(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-for (i = 0; i < 8; ++i) {
-uint64_t mask = (op1 & 0x0001000100010001ull) * 0xffff;
-result ^= op2 & mask;
-op1 >>= 1;
-op2 <<= 1;
-}
-return result;
-}
-
 void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 int hi = simd_data(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 uint64_t nn = n[hi], mm = m[hi];
 
-d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+d[0] = clmul_8x4_packed(nn, mm);
 nn >>= 32;
 mm >>= 32;
-d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+d[1] = clmul_8x4_packed(nn, mm);
 
 clear_tail(d, 16, simd_maxsz(desc));
 }
@@ -2102,10 +2064,7 @@ void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, 
uint32_t

[PATCH v3 18/19] host/include/i386: Implement clmul.h

2023-08-21 Thread Richard Henderson
Detect PCLMUL in cpuinfo; implement the accel hook.

Signed-off-by: Richard Henderson 
---
 host/include/i386/host/cpuinfo.h|  1 +
 host/include/i386/host/crypto/clmul.h   | 29 +
 host/include/x86_64/host/crypto/clmul.h |  1 +
 include/qemu/cpuid.h|  3 +++
 util/cpuinfo-i386.c |  1 +
 5 files changed, 35 insertions(+)
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h

diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index 073d0a426f..7ae21568f7 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -27,6 +27,7 @@
 #define CPUINFO_ATOMIC_VMOVDQA  (1u << 16)
 #define CPUINFO_ATOMIC_VMOVDQU  (1u << 17)
 #define CPUINFO_AES (1u << 18)
+#define CPUINFO_PCLMUL  (1u << 19)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/i386/host/crypto/clmul.h 
b/host/include/i386/host/crypto/clmul.h
new file mode 100644
index 0000000000..dc3c814797
--- /dev/null
+++ b/host/include/i386/host/crypto/clmul.h
@@ -0,0 +1,29 @@
+/*
+ * x86 specific clmul acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef X86_HOST_CRYPTO_CLMUL_H
+#define X86_HOST_CRYPTO_CLMUL_H
+
+#include "host/cpuinfo.h"
+#include <immintrin.h>
+
+#if defined(__PCLMUL__)
+# define HAVE_CLMUL_ACCEL  true
+# define ATTR_CLMUL_ACCEL
+#else
+# define HAVE_CLMUL_ACCEL  likely(cpuinfo & CPUINFO_PCLMUL)
+# define ATTR_CLMUL_ACCEL  __attribute__((target("pclmul")))
+#endif
+
+static inline Int128 ATTR_CLMUL_ACCEL
+clmul_64_accel(uint64_t n, uint64_t m)
+{
+union { __m128i v; Int128 s; } u;
+
+u.v = _mm_clmulepi64_si128(_mm_set_epi64x(0, n), _mm_set_epi64x(0, m), 0);
+return u.s;
+}
+
+#endif /* X86_HOST_CRYPTO_CLMUL_H */
diff --git a/host/include/x86_64/host/crypto/clmul.h 
b/host/include/x86_64/host/crypto/clmul.h
new file mode 100644
index 0000000000..f25eced416
--- /dev/null
+++ b/host/include/x86_64/host/crypto/clmul.h
@@ -0,0 +1 @@
+#include "host/include/i386/host/crypto/clmul.h"
diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 35325f1995..b11161555b 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -25,6 +25,9 @@
 #endif
 
 /* Leaf 1, %ecx */
+#ifndef bit_PCLMUL
+#define bit_PCLMUL  (1 << 1)
+#endif
 #ifndef bit_SSE4_1
 #define bit_SSE4_1  (1 << 19)
 #endif
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index 3a7b7e0ad1..36783fd199 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -39,6 +39,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 info |= (c & bit_SSE4_1 ? CPUINFO_SSE4 : 0);
 info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
 info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
+info |= (c & bit_PCLMUL ? CPUINFO_PCLMUL : 0);
 
 /* Our AES support requires PSHUFB as well. */
 info |= ((c & bit_AES) && (c & bit_SSSE3) ? CPUINFO_AES : 0);
-- 
2.34.1




[PATCH v3 17/19] target/ppc: Use clmul_64

2023-08-21 Thread Richard Henderson
Use generic routine for 64-bit carry-less multiply.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ce793cf163..432834c7d5 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1456,20 +1456,9 @@ void helper_vpmsumw(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b)
 
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-int i, j;
-Int128 tmp, prod[2] = {int128_zero(), int128_zero()};
-
-for (j = 0; j < 64; j++) {
-for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
-if (a->VsrD(i) & (1ull << j)) {
-tmp = int128_make64(b->VsrD(i));
-tmp = int128_lshift(tmp, j);
-prod[i] = int128_xor(prod[i], tmp);
-}
-}
-}
-
-r->s128 = int128_xor(prod[0], prod[1]);
+Int128 e = clmul_64(a->u64[0], b->u64[0]);
+Int128 o = clmul_64(a->u64[1], b->u64[1]);
+r->s128 = int128_xor(e, o);
 }
 
 #if HOST_BIG_ENDIAN
-- 
2.34.1




[PATCH v3 12/19] target/ppc: Use clmul_32* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 26 ++
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 10e19d8c9b..ce793cf163 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1446,28 +1446,14 @@ void helper_vpmsumh(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b)
 }
 }
 
-#define PMSUM(name, srcfld, trgfld, trgtyp)   \
-void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
-{ \
-int i, j; \
-trgtyp prod[sizeof(ppc_avr_t) / sizeof(a->srcfld[0])];\
-  \
-VECTOR_FOR_INORDER_I(i, srcfld) { \
-prod[i] = 0;  \
-for (j = 0; j < sizeof(a->srcfld[0]) * 8; j++) {  \
-if (a->srcfld[i] & (1ull << j)) { \
-prod[i] ^= ((trgtyp)b->srcfld[i] << j);   \
-} \
-} \
-} \
-  \
-VECTOR_FOR_INORDER_I(i, trgfld) { \
-r->trgfld[i] = prod[2 * i] ^ prod[2 * i + 1]; \
-} \
+void helper_vpmsumw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_32(aa, bb) ^ clmul_32(aa >> 32, bb >> 32);
+}
 }
 
-PMSUM(vpmsumw, u32, u64, uint64_t)
-
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
 int i, j;
-- 
2.34.1




[PATCH v3 15/19] target/i386: Use clmul_64

2023-08-21 Thread Richard Henderson
Use generic routine for 64-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/i386/ops_sse.h | 40 +---
 1 file changed, 9 insertions(+), 31 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index a0e425733f..33908c0691 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -20,6 +20,7 @@
 
 #include "crypto/aes.h"
 #include "crypto/aes-round.h"
+#include "crypto/clmul.h"
 
 #if SHIFT == 0
 #define Reg MMXReg
@@ -2122,41 +2123,18 @@ target_ulong helper_crc32(uint32_t crc1, target_ulong 
msg, uint32_t len)
 
 #endif
 
-#if SHIFT == 1
-static void clmulq(uint64_t *dest_l, uint64_t *dest_h,
-  uint64_t a, uint64_t b)
-{
-uint64_t al, ah, resh, resl;
-
-ah = 0;
-al = a;
-resh = resl = 0;
-
-while (b) {
-if (b & 1) {
-resl ^= al;
-resh ^= ah;
-}
-ah = (ah << 1) | (al >> 63);
-al <<= 1;
-b >>= 1;
-}
-
-*dest_l = resl;
-*dest_h = resh;
-}
-#endif
-
 void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 uint32_t ctrl)
 {
-uint64_t a, b;
-int i;
+int a_idx = (ctrl & 1) != 0;
+int b_idx = (ctrl & 16) != 0;
 
-for (i = 0; i < 1 << SHIFT; i += 2) {
-a = v->Q(((ctrl & 1) != 0) + i);
-b = s->Q(((ctrl & 16) != 0) + i);
-clmulq(&d->Q(i), &d->Q(i + 1), a, b);
+for (int i = 0; i < SHIFT; i++) {
+uint64_t a = v->Q(2 * i + a_idx);
+uint64_t b = s->Q(2 * i + b_idx);
+Int128 *r = (Int128 *)&d->ZMM_X(i);
+
+*r = clmul_64(a, b);
 }
 }
 
-- 
2.34.1
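The immediate-byte handling in the rewritten helper can be modelled on its own: bit 0 of ctrl picks the quadword of the first operand and bit 4 (0x10) picks it for the second, for each 128-bit lane. A small sketch of those semantics in plain Python (not the SSE intrinsic itself):

```python
def clmul64(a, b):
    # bitwise 64x64 -> 128 carry-less multiply
    r = 0
    for i in range(64):
        if (a >> i) & 1:
            r ^= b << i
    return r

def pclmulqdq_lane(v, s, ctrl):
    # v, s: (low, high) 64-bit quadword pairs for one 128-bit lane
    a = v[(ctrl >> 0) & 1]
    b = s[(ctrl >> 4) & 1]
    return clmul64(a, b)

# ctrl 0x00 multiplies the low quadwords, 0x11 the high quadwords
assert pclmulqdq_lane((3, 0), (3, 7), 0x00) == 5  # (x+1)^2 = x^2+1 over GF(2)
assert pclmulqdq_lane((0, 1 << 63), (0, 1 << 63), 0x11) == 1 << 126
```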




[PATCH v3 16/19] target/s390x: Use clmul_64

2023-08-21 Thread Richard Henderson
Use the generic routine for 64-bit carry-less multiply.
Remove our local version of galois_multiply64.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 58 +++
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c 
b/target/s390x/tcg/vec_int_helper.c
index ba284b5379..b18d8a6d16 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -21,13 +21,6 @@ static bool s390_vec_is_zero(const S390Vector *v)
 return !v->doubleword[0] && !v->doubleword[1];
 }
 
-static void s390_vec_xor(S390Vector *res, const S390Vector *a,
- const S390Vector *b)
-{
-res->doubleword[0] = a->doubleword[0] ^ b->doubleword[0];
-res->doubleword[1] = a->doubleword[1] ^ b->doubleword[1];
-}
-
 static void s390_vec_and(S390Vector *res, const S390Vector *a,
  const S390Vector *b)
 {
@@ -166,26 +159,6 @@ DEF_VCTZ(16)
 
 /* like binary multiplication, but XOR instead of addition */
 
-static S390Vector galois_multiply64(uint64_t a, uint64_t b)
-{
-S390Vector res = {};
-S390Vector va = {
-.doubleword[1] = a,
-};
-S390Vector vb = {
-.doubleword[1] = b,
-};
-
-while (!s390_vec_is_zero(&vb)) {
-if (vb.doubleword[1] & 0x1) {
-s390_vec_xor(&res, &res, &va);
-}
-s390_vec_shl(&va, &va, 1);
-s390_vec_shr(&vb, &vb, 1);
-}
-return res;
-}
-
 /*
  * There is no carry across the two doublewords, so their order does
  * not matter.  Nor is there partial overlap between registers.
@@ -265,32 +238,25 @@ void HELPER(gvec_vgfma32)(void *v1, const void *v2, const 
void *v3,
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
  uint32_t desc)
 {
-S390Vector tmp1, tmp2;
-uint64_t a, b;
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+Int128 r;
 
-a = s390_vec_read_element64(v2, 0);
-b = s390_vec_read_element64(v3, 0);
-tmp1 = galois_multiply64(a, b);
-a = s390_vec_read_element64(v2, 1);
-b = s390_vec_read_element64(v3, 1);
-tmp2 = galois_multiply64(a, b);
-s390_vec_xor(v1, &tmp1, &tmp2);
+r = int128_xor(clmul_64(q2[0], q3[0]), clmul_64(q2[1], q3[1]));
+q1[0] = int128_gethi(r);
+q1[1] = int128_getlo(r);
 }
 
 void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
   const void *v4, uint32_t desc)
 {
-S390Vector tmp1, tmp2;
-uint64_t a, b;
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+Int128 r;
 
-a = s390_vec_read_element64(v2, 0);
-b = s390_vec_read_element64(v3, 0);
-tmp1 = galois_multiply64(a, b);
-a = s390_vec_read_element64(v2, 1);
-b = s390_vec_read_element64(v3, 1);
-tmp2 = galois_multiply64(a, b);
-s390_vec_xor(&tmp1, &tmp1, &tmp2);
-s390_vec_xor(v1, &tmp1, v4);
+r = int128_xor(clmul_64(q2[0], q3[0]), clmul_64(q2[1], q3[1]));
+q1[0] = q4[0] ^ int128_gethi(r);
+q1[1] = q4[1] ^ int128_getlo(r);
 }
 
 #define DEF_VMAL(BITS) 
\
-- 
2.34.1




[PATCH v3 19/19] host/include/aarch64: Implement clmul.h

2023-08-21 Thread Richard Henderson
Detect PMULL in cpuinfo; implement the accel hook.

Signed-off-by: Richard Henderson 
---
 host/include/aarch64/host/cpuinfo.h  |  1 +
 host/include/aarch64/host/crypto/clmul.h | 41 
 util/cpuinfo-aarch64.c   |  4 ++-
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h

diff --git a/host/include/aarch64/host/cpuinfo.h 
b/host/include/aarch64/host/cpuinfo.h
index 769626b098..fe8c3b3fd1 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -10,6 +10,7 @@
 #define CPUINFO_LSE (1u << 1)
 #define CPUINFO_LSE2(1u << 2)
 #define CPUINFO_AES (1u << 3)
+#define CPUINFO_PMULL   (1u << 4)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/aarch64/host/crypto/clmul.h 
b/host/include/aarch64/host/crypto/clmul.h
new file mode 100644
index 0000000000..bb516d8b2f
--- /dev/null
+++ b/host/include/aarch64/host/crypto/clmul.h
@@ -0,0 +1,41 @@
+/*
+ * AArch64 specific clmul acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef AARCH64_HOST_CRYPTO_CLMUL_H
+#define AARCH64_HOST_CRYPTO_CLMUL_H
+
+#include "host/cpuinfo.h"
+#include <arm_neon.h>
+
+/*
+ * 64x64->128 pmull is available with FEAT_PMULL.
+ * Both FEAT_AES and FEAT_PMULL are covered under the same macro.
+ */
+#ifdef __ARM_FEATURE_AES
+# define HAVE_CLMUL_ACCEL  true
+#else
+# define HAVE_CLMUL_ACCEL  likely(cpuinfo & CPUINFO_PMULL)
+#endif
+#if !defined(__ARM_FEATURE_AES) && defined(CONFIG_ARM_AES_BUILTIN)
+# define ATTR_CLMUL_ACCEL  __attribute__((target("+crypto")))
+#else
+# define ATTR_CLMUL_ACCEL
+#endif
+
+static inline Int128 ATTR_CLMUL_ACCEL
+clmul_64_accel(uint64_t n, uint64_t m)
+{
+union { poly128_t v; Int128 s; } u;
+
+#ifdef CONFIG_ARM_AES_BUILTIN
+u.v = vmull_p64((poly64_t)n, (poly64_t)m);
+#else
+asm(".arch_extension aes\n\t"
+"pmull %0.1q, %1.1d, %2.1d" : "=w"(u.v) : "w"(n), "w"(m));
+#endif
+return u.s;
+}
+
+#endif /* AARCH64_HOST_CRYPTO_CLMUL_H */
diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
index ababc39550..1d565b8420 100644
--- a/util/cpuinfo-aarch64.c
+++ b/util/cpuinfo-aarch64.c
@@ -56,12 +56,14 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 unsigned long hwcap = qemu_getauxval(AT_HWCAP);
 info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
 info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
-info |= (hwcap & HWCAP_AES ? CPUINFO_AES: 0);
+info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
+info |= (hwcap & HWCAP_PMULL ? CPUINFO_PMULL : 0);
 #endif
 #ifdef CONFIG_DARWIN
 info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
 info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2;
 info |= sysctl_for_bool("hw.optional.arm.FEAT_AES") * CPUINFO_AES;
+info |= sysctl_for_bool("hw.optional.arm.FEAT_PMULL") * CPUINFO_PMULL;
 #endif
 
 cpuinfo = info;
-- 
2.34.1




[PATCH v3 07/19] target/s390x: Use clmul_16* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.
Remove our local version of galois_multiply16.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c 
b/target/s390x/tcg/vec_int_helper.c
index edff4d6b2b..11477556e5 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -180,7 +180,6 @@ static uint##TBITS##_t 
galois_multiply##BITS(uint##TBITS##_t a,\
 }  
\
 return res;
\
 }
-DEF_GALOIS_MULTIPLY(16, 32)
 DEF_GALOIS_MULTIPLY(32, 64)
 
 static S390Vector galois_multiply64(uint64_t a, uint64_t b)
@@ -231,6 +230,30 @@ void HELPER(gvec_vgfma8)(void *v1, const void *v2, const 
void *v3,
 q1[1] = do_gfma8(q2[1], q3[1], q4[1]);
 }
 
+static inline uint64_t do_gfma16(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_16x2_even(n, m) ^ clmul_16x2_odd(n, m) ^ a;
+}
+
+void HELPER(gvec_vgfm16)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma16(q2[0], q3[0], 0);
+q1[1] = do_gfma16(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma16)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma16(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma16(q2[1], q3[1], q4[1]);
+}
+
 #define DEF_VGFM(BITS, TBITS)  
\
 void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, 
\
  uint32_t desc)
\
@@ -248,7 +271,6 @@ void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, 
const void *v3, \
 s390_vec_write_element##TBITS(v1, i, d);   
\
 }  
\
 }
-DEF_VGFM(16, 32)
 DEF_VGFM(32, 64)
 
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
@@ -284,7 +306,6 @@ void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, 
const void *v3,\
 s390_vec_write_element##TBITS(v1, i, d);   
\
 }  
\
 }
-DEF_VGFMA(16, 32)
 DEF_VGFMA(32, 64)
 
 void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
-- 
2.34.1




[PATCH v3 09/19] crypto: Add generic 32-bit carry-less multiply routines

2023-08-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h |  7 +++
 crypto/clmul.c | 13 +
 2 files changed, 20 insertions(+)

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index c7ad28aa85..0ea25a252c 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -54,4 +54,11 @@ uint64_t clmul_16x2_even(uint64_t, uint64_t);
  */
 uint64_t clmul_16x2_odd(uint64_t, uint64_t);
 
+/**
+ * clmul_32:
+ *
+ * Perform a 32x32->64 carry-less multiply.
+ */
+uint64_t clmul_32(uint32_t, uint32_t);
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 2c87cfbf8a..36ada1be9d 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -79,3 +79,16 @@ uint64_t clmul_16x2_odd(uint64_t n, uint64_t m)
 {
 return clmul_16x2_even(n >> 16, m >> 16);
 }
+
+uint64_t clmul_32(uint32_t n, uint32_t m32)
+{
+uint64_t r = 0;
+uint64_t m = m32;
+
+for (int i = 0; i < 32; ++i) {
+r ^= n & 1 ? m : 0;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
-- 
2.34.1
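A quick way to sanity-check the generic routine is against carry-less-multiply identities; a direct Python transcription of the shift-and-xor loop above:

```python
def clmul_32(n, m):
    # same shift-and-xor loop as the C routine, widened to 64 bits
    r, m = 0, m & 0xFFFFFFFF
    for _ in range(32):
        if n & 1:
            r ^= m
        n >>= 1
        m <<= 1
    return r & 0xFFFFFFFFFFFFFFFF

assert clmul_32(3, 3) == 5                       # (x+1)^2 = x^2 + 1 in GF(2)[x]
assert clmul_32(0x80000000, 0x80000000) == 1 << 62
assert clmul_32(6 ^ 9, 7) == clmul_32(6, 7) ^ clmul_32(9, 7)  # XOR-distributive
```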




[PATCH v3 05/19] crypto: Add generic 16-bit carry-less multiply routines

2023-08-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h | 16 
 crypto/clmul.c | 21 +
 2 files changed, 37 insertions(+)

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index 153b5e3057..c7ad28aa85 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -38,4 +38,20 @@ uint64_t clmul_8x4_odd(uint64_t, uint64_t);
  */
 uint64_t clmul_8x4_packed(uint32_t, uint32_t);
 
+/**
+ * clmul_16x2_even:
+ *
+ * Perform two 16x16->32 carry-less multiplies.
+ * The odd words of the inputs are ignored.
+ */
+uint64_t clmul_16x2_even(uint64_t, uint64_t);
+
+/**
+ * clmul_16x2_odd:
+ *
+ * Perform two 16x16->32 carry-less multiplies.
+ * The even words of the inputs are ignored.
+ */
+uint64_t clmul_16x2_odd(uint64_t, uint64_t);
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 82d873fee5..2c87cfbf8a 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -58,3 +58,24 @@ uint64_t clmul_8x4_packed(uint32_t n, uint32_t m)
 {
 return clmul_8x4_even_int(unpack_8_to_16(n), unpack_8_to_16(m));
 }
+
+uint64_t clmul_16x2_even(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+n &= 0x0000ffff0000ffffull;
+m &= 0x0000ffff0000ffffull;
+
+for (int i = 0; i < 16; ++i) {
+uint64_t mask = (n & 0x0000000100000001ull) * 0xffffffffull;
+r ^= m & mask;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
+
+uint64_t clmul_16x2_odd(uint64_t n, uint64_t m)
+{
+return clmul_16x2_even(n >> 16, m >> 16);
+}
-- 
2.34.1
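The even/odd pairing can be checked against a single-lane model: the even 16-bit words sit at bits [15:0] and [47:32], and each 16x16 product fills the corresponding 32-bit half of the result. A sketch of the intended semantics:

```python
def clmul16(a, b):
    # one 16x16 -> 32 carry-less multiply
    r = 0
    for i in range(16):
        if (a >> i) & 1:
            r ^= b << i
    return r

def clmul_16x2_even(n, m):
    lo = clmul16(n & 0xFFFF, m & 0xFFFF)
    hi = clmul16((n >> 32) & 0xFFFF, (m >> 32) & 0xFFFF)
    return lo | (hi << 32)

def clmul_16x2_odd(n, m):
    return clmul_16x2_even(n >> 16, m >> 16)

n = m = 0x0003_0000_0003_0000  # value 3 in both odd 16-bit lanes
assert clmul_16x2_odd(n, m) == 0x0000_0005_0000_0005
assert clmul_16x2_even(n >> 16, m >> 16) == clmul_16x2_odd(n, m)
```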




[PATCH v3 08/19] target/ppc: Use clmul_16* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 343874863a..10e19d8c9b 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1438,6 +1438,14 @@ void helper_vpmsumb(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b)
 }
 }
 
+void helper_vpmsumh(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_16x2_even(aa, bb) ^ clmul_16x2_odd(aa, bb);
+}
+}
+
 #define PMSUM(name, srcfld, trgfld, trgtyp)   \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 { \
@@ -1458,7 +1466,6 @@ void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t 
*b)  \
 } \
 }
 
-PMSUM(vpmsumh, u16, u32, uint32_t)
 PMSUM(vpmsumw, u32, u64, uint64_t)
 
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




[PATCH v3 13/19] crypto: Add generic 64-bit carry-less multiply routine

2023-08-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 host/include/generic/host/crypto/clmul.h | 15 +++
 include/crypto/clmul.h   | 19 +++
 crypto/clmul.c   | 18 ++
 3 files changed, 52 insertions(+)
 create mode 100644 host/include/generic/host/crypto/clmul.h

diff --git a/host/include/generic/host/crypto/clmul.h 
b/host/include/generic/host/crypto/clmul.h
new file mode 100644
index 0000000000..915bfb88d3
--- /dev/null
+++ b/host/include/generic/host/crypto/clmul.h
@@ -0,0 +1,15 @@
+/*
+ * No host specific carry-less multiply acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef GENERIC_HOST_CRYPTO_CLMUL_H
+#define GENERIC_HOST_CRYPTO_CLMUL_H
+
+#define HAVE_CLMUL_ACCEL  false
+#define ATTR_CLMUL_ACCEL
+
+Int128 clmul_64_accel(uint64_t, uint64_t)
+QEMU_ERROR("unsupported accel");
+
+#endif /* GENERIC_HOST_CRYPTO_CLMUL_H */
diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index 0ea25a252c..c82d2d7559 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -8,6 +8,9 @@
 #ifndef CRYPTO_CLMUL_H
 #define CRYPTO_CLMUL_H
 
+#include "qemu/int128.h"
+#include "host/crypto/clmul.h"
+
 /**
  * clmul_8x8_low:
  *
@@ -61,4 +64,20 @@ uint64_t clmul_16x2_odd(uint64_t, uint64_t);
  */
 uint64_t clmul_32(uint32_t, uint32_t);
 
+/**
+ * clmul_64:
+ *
+ * Perform a 64x64->128 carry-less multiply.
+ */
+Int128 clmul_64_gen(uint64_t, uint64_t);
+
+static inline Int128 clmul_64(uint64_t a, uint64_t b)
+{
+if (HAVE_CLMUL_ACCEL) {
+return clmul_64_accel(a, b);
+} else {
+return clmul_64_gen(a, b);
+}
+}
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 36ada1be9d..abf79cc49a 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -92,3 +92,21 @@ uint64_t clmul_32(uint32_t n, uint32_t m32)
 }
 return r;
 }
+
+Int128 clmul_64_gen(uint64_t n, uint64_t m)
+{
+uint64_t rl = 0, rh = 0;
+
+/* Bit 0 can only influence the low 64-bit result.  */
+if (n & 1) {
+rl = m;
+}
+
+for (int i = 1; i < 64; ++i) {
+n >>= 1;
+uint64_t mask = -(n & 1);
+rl ^= (m << i) & mask;
+rh ^= (m >> (64 - i)) & mask;
+}
+return int128_make128(rl, rh);
+}
-- 
2.34.1
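The rl/rh split mirrors the fact that bit i of n contributes m << i to a 128-bit accumulator, with anything above bit 63 landing in the high half. A plain Python model of the same operation, returning the two halves:

```python
MASK64 = (1 << 64) - 1

def clmul_64(n, m):
    """64x64 -> 128-bit carry-less multiply, returned as (lo, hi)."""
    acc = 0
    for i in range(64):
        if (n >> i) & 1:
            acc ^= m << i
    return acc & MASK64, acc >> 64  # int128_make128(rl, rh)

# top-bit square: bit 63 x bit 63 -> bit 126, i.e. bit 62 of the high half
assert clmul_64(1 << 63, 1 << 63) == (0, 1 << 62)
assert clmul_64(3, 3) == (5, 0)
# distributivity over XOR, the property the vector helpers rely on
a, b, c = 0xDEADBEEF, 0xC0FFEE, 0x123456789
al, ah = clmul_64(a ^ b, c)
xl, xh = clmul_64(a, c)
yl, yh = clmul_64(b, c)
assert (al, ah) == (xl ^ yl, xh ^ yh)
```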




[PATCH v3 06/19] target/arm: Use clmul_16* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.
Remove our local version of pmull_w.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_internal.h |  6 --
 target/arm/tcg/mve_helper.c   |  8 ++--
 target/arm/tcg/vec_helper.c   | 13 -
 3 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index c4afba6d9f..3ca1b94ccf 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -219,12 +219,6 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, 
bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
-/*
- * 16 x 16 -> 32 vector polynomial multiply where the inputs are
- * in the low 16 bits of each 32-bit element
- */
-uint64_t pmull_w(uint64_t op1, uint64_t op2);
-
 /**
  * bfdotadd:
  * @sum: addend
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 96ddfb4b3a..c666a96ba1 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -985,14 +985,10 @@ DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
  * Polynomial multiply. We can always do this generating 64 bits
  * of the result at a time, so we don't need to use DO_2OP_L.
  */
-#define VMULLPW_MASK 0xULL
-#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
-#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
-
 DO_2OP(vmullpbh, 8, uint64_t, clmul_8x4_even)
 DO_2OP(vmullpth, 8, uint64_t, clmul_8x4_odd)
-DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
-DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
+DO_2OP(vmullpbw, 8, uint64_t, clmul_16x2_even)
+DO_2OP(vmullptw, 8, uint64_t, clmul_16x2_odd)
 
 /*
  * Because the computation type is at least twice as large as required,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index cd630ff905..5def86b573 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2029,19 +2029,6 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, 
uint32_t desc)
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-uint64_t pmull_w(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-for (i = 0; i < 16; ++i) {
-uint64_t mask = (op1 & 0x00010001ull) * 0x;
-result ^= op2 & mask;
-op1 >>= 1;
-op2 <<= 1;
-}
-return result;
-}
-
 void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 int hi = simd_data(desc);
-- 
2.34.1




[PATCH v3 14/19] target/arm: Use clmul_64

2023-08-21 Thread Richard Henderson
Use generic routine for 64-bit carry-less multiply.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_helper.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index ffb4b44ce4..1f93510b85 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2003,28 +2003,14 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, 
uint32_t desc)
  */
 void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 {
-intptr_t i, j, opr_sz = simd_oprsz(desc);
+intptr_t i, opr_sz = simd_oprsz(desc);
 intptr_t hi = simd_data(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 
 for (i = 0; i < opr_sz / 8; i += 2) {
-uint64_t nn = n[i + hi];
-uint64_t mm = m[i + hi];
-uint64_t rhi = 0;
-uint64_t rlo = 0;
-
-/* Bit 0 can only influence the low 64-bit result.  */
-if (nn & 1) {
-rlo = mm;
-}
-
-for (j = 1; j < 64; ++j) {
-uint64_t mask = -((nn >> j) & 1);
-rlo ^= (mm << j) & mask;
-rhi ^= (mm >> (64 - j)) & mask;
-}
-d[i] = rlo;
-d[i + 1] = rhi;
+Int128 r = clmul_64(n[i + hi], m[i + hi]);
+d[i] = int128_getlo(r);
+d[i + 1] = int128_gethi(r);
 }
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




[PATCH v3 10/19] target/arm: Use clmul_32* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.
Remove our local version of pmull_d.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_helper.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 5def86b573..ffb4b44ce4 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2055,18 +2055,6 @@ void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, 
uint32_t desc)
 }
 }
 
-static uint64_t pmull_d(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-
-for (i = 0; i < 32; ++i) {
-uint64_t mask = -((op1 >> i) & 1);
-result ^= (op2 << i) & mask;
-}
-return result;
-}
-
 void HELPER(sve2_pmull_d)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 intptr_t sel = H4(simd_data(desc));
@@ -2075,7 +2063,7 @@ void HELPER(sve2_pmull_d)(void *vd, void *vn, void *vm, 
uint32_t desc)
 uint64_t *d = vd;
 
 for (i = 0; i < opr_sz / 8; ++i) {
-d[i] = pmull_d(n[2 * i + sel], m[2 * i + sel]);
+d[i] = clmul_32(n[2 * i + sel], m[2 * i + sel]);
 }
 }
 #endif
-- 
2.34.1




[PATCH v3 03/19] target/s390x: Use clmul_8* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.
Remove our local version of galois_multiply8.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 32 ---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c 
b/target/s390x/tcg/vec_int_helper.c
index 53ab5c5eb3..edff4d6b2b 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -14,6 +14,7 @@
 #include "vec.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
+#include "crypto/clmul.h"
 
 static bool s390_vec_is_zero(const S390Vector *v)
 {
@@ -179,7 +180,6 @@ static uint##TBITS##_t 
galois_multiply##BITS(uint##TBITS##_t a,\
 }  
\
 return res;
\
 }
-DEF_GALOIS_MULTIPLY(8, 16)
 DEF_GALOIS_MULTIPLY(16, 32)
 DEF_GALOIS_MULTIPLY(32, 64)
 
@@ -203,6 +203,34 @@ static S390Vector galois_multiply64(uint64_t a, uint64_t b)
 return res;
 }
 
+/*
+ * There is no carry across the two doublewords, so their order does
+ * not matter.  Nor is there partial overlap between registers.
+ */
+static inline uint64_t do_gfma8(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_8x4_even(n, m) ^ clmul_8x4_odd(n, m) ^ a;
+}
+
+void HELPER(gvec_vgfm8)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma8(q2[0], q3[0], 0);
+q1[1] = do_gfma8(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma8)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t desc)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma8(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma8(q2[1], q3[1], q4[1]);
+}
+
 #define DEF_VGFM(BITS, TBITS)  
\
 void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, 
\
  uint32_t desc)
\
@@ -220,7 +248,6 @@ void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, 
const void *v3, \
 s390_vec_write_element##TBITS(v1, i, d);   
\
 }  
\
 }
-DEF_VGFM(8, 16)
 DEF_VGFM(16, 32)
 DEF_VGFM(32, 64)
 
@@ -257,7 +284,6 @@ void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, 
const void *v3,\
 s390_vec_write_element##TBITS(v1, i, d);   
\
 }  
\
 }
-DEF_VGFMA(8, 16)
 DEF_VGFMA(16, 32)
 DEF_VGFMA(32, 64)
 
-- 
2.34.1




[PATCH v3 04/19] target/ppc: Use clmul_8* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 834da80fe3..343874863a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -26,6 +26,7 @@
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
 #include "crypto/aes-round.h"
+#include "crypto/clmul.h"
 #include "fpu/softfloat.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
@@ -1425,6 +1426,18 @@ void helper_vbpermq(ppc_avr_t *r, ppc_avr_t *a, 
ppc_avr_t *b)
 #undef VBPERMQ_INDEX
 #undef VBPERMQ_DW
 
+/*
+ * There is no carry across the two doublewords, so their order does
+ * not matter.  Nor is there partial overlap between registers.
+ */
+void helper_vpmsumb(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_8x4_even(aa, bb) ^ clmul_8x4_odd(aa, bb);
+}
+}
+
 #define PMSUM(name, srcfld, trgfld, trgtyp)   \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 { \
@@ -1445,7 +1458,6 @@ void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t 
*b)  \
 } \
 }
 
-PMSUM(vpmsumb, u8, u16, uint16_t)
 PMSUM(vpmsumh, u16, u32, uint32_t)
 PMSUM(vpmsumw, u32, u64, uint64_t)
 
-- 
2.34.1




[PATCH v3 11/19] target/s390x: Use clmul_32* routines

2023-08-21 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.
Remove our local version of galois_multiply32.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 75 +--
 1 file changed, 22 insertions(+), 53 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c 
b/target/s390x/tcg/vec_int_helper.c
index 11477556e5..ba284b5379 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -165,22 +165,6 @@ DEF_VCTZ(8)
 DEF_VCTZ(16)
 
 /* like binary multiplication, but XOR instead of addition */
-#define DEF_GALOIS_MULTIPLY(BITS, TBITS)   
\
-static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,
\
- uint##TBITS##_t b)
\
-{  
\
-uint##TBITS##_t res = 0;   
\
-   
\
-while (b) {
\
-if (b & 0x1) { 
\
-res = res ^ a; 
\
-}  
\
-a = a << 1;
\
-b = b >> 1;
\
-}  
\
-return res;
\
-}
-DEF_GALOIS_MULTIPLY(32, 64)
 
 static S390Vector galois_multiply64(uint64_t a, uint64_t b)
 {
@@ -254,24 +238,29 @@ void HELPER(gvec_vgfma16)(void *v1, const void *v2, const 
void *v3,
 q1[1] = do_gfma16(q2[1], q3[1], q4[1]);
 }
 
-#define DEF_VGFM(BITS, TBITS)  
\
-void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, 
\
- uint32_t desc)
\
-{  
\
-int i; 
\
-   
\
-for (i = 0; i < (128 / TBITS); i++) {  
\
-uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2); 
\
-uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2); 
\
-uint##TBITS##_t d = galois_multiply##BITS(a, b);   
\
-   
\
-a = s390_vec_read_element##BITS(v2, i * 2 + 1);
\
-b = s390_vec_read_element##BITS(v3, i * 2 + 1);
\
-d = d ^ galois_multiply32(a, b);   
\
-s390_vec_write_element##TBITS(v1, i, d);   
\
-}  
\
+static inline uint64_t do_gfma32(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_32(n, m) ^ clmul_32(n >> 32, m >> 32) ^ a;
+}
+
+void HELPER(gvec_vgfm32)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma32(q2[0], q3[0], 0);
+q1[1] = do_gfma32(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma32)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma32(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma32(q2[1], q3[1], q4[1]);
 }
-DEF_VGFM(32, 64)
 
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
  uint32_t desc)
@@ -288,26 +277,6 @@ void HELPER(gvec_vgfm64)(void *v1, const void *v2, const 
void *v3,
 s390_vec_xor(v1, &tmp1, &tmp2);
 }
 
-#define DEF_VGFMA(BITS, TBITS) 
\
-void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,
\
-  const void *v4, uint32_t desc)   
\
-{  
\
-int i; 
\
-   
\
-for (i = 0; i < (128 / TBITS); i++) {  
\
-uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2); 
\
-uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 

[PATCH v3 00/19] crypto: Provide clmul.h and host accel

2023-08-21 Thread Richard Henderson
Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
carry-less multiply under emulation.

Changes for v3:
  * Update target/i386 ops_sse.h.
  * Apply r-b.

Changes for v2:
  * Only accelerate clmul_64; keep generic helpers for other sizes.
  * Drop most of the Int128 interfaces, except for clmul_64.
  * Use the same acceleration format as aes-round.h.


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/
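For readers unfamiliar with the operation: carry-less multiplication is polynomial multiplication over GF(2), i.e. partial products are combined with XOR rather than addition, so no carries propagate between bit positions. This is not code from the series, just an illustrative reference model of the semantics the clmul_* routines implement:

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) multiply: partial products are
    XORed rather than added, so no carries propagate."""
    r = 0
    while b:
        if b & 1:
            r ^= a      # XOR instead of add
        a <<= 1
        b >>= 1
    return r

# (x + 1) * (x + 1) = x^2 + 1 in GF(2)[x]
assert clmul(0b11, 0b11) == 0b101
```

The routines in this series are fixed-width variants of the same operation (8x8->8, 8x8->16, ..., 64x64->128).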


Richard Henderson (19):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/i386: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h

 host/include/aarch64/host/cpuinfo.h  |   1 +
 host/include/aarch64/host/crypto/clmul.h |  41 +
 host/include/generic/host/crypto/clmul.h |  15 ++
 host/include/i386/host/cpuinfo.h |   1 +
 host/include/i386/host/crypto/clmul.h|  29 
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h   |  83 ++
 include/qemu/cpuid.h |   3 +
 target/arm/tcg/vec_internal.h|  11 --
 target/i386/ops_sse.h|  40 ++---
 crypto/clmul.c   | 112 ++
 target/arm/tcg/mve_helper.c  |  16 +-
 target/arm/tcg/vec_helper.c  | 102 ++---
 target/ppc/int_helper.c  |  64 
 target/s390x/tcg/vec_int_helper.c| 186 ++-
 util/cpuinfo-aarch64.c   |   4 +-
 util/cpuinfo-i386.c  |   1 +
 crypto/meson.build   |   9 +-
 18 files changed, 434 insertions(+), 285 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

-- 
2.34.1




[PATCH v3 01/19] crypto: Add generic 8-bit carry-less multiply routines

2023-08-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h | 41 +
 crypto/clmul.c | 60 ++
 crypto/meson.build |  9 ---
 3 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
new file mode 100644
index 00..153b5e3057
--- /dev/null
+++ b/include/crypto/clmul.h
@@ -0,0 +1,41 @@
+/*
+ * Carry-less multiply operations.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef CRYPTO_CLMUL_H
+#define CRYPTO_CLMUL_H
+
+/**
+ * clmul_8x8_low:
+ *
+ * Perform eight 8x8->8 carry-less multiplies.
+ */
+uint64_t clmul_8x8_low(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_even:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ * The odd bytes of the inputs are ignored.
+ */
+uint64_t clmul_8x4_even(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_odd:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ * The even bytes of the inputs are ignored.
+ */
+uint64_t clmul_8x4_odd(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_packed:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ */
+uint64_t clmul_8x4_packed(uint32_t, uint32_t);
+
+#endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
new file mode 100644
index 00..82d873fee5
--- /dev/null
+++ b/crypto/clmul.c
@@ -0,0 +1,60 @@
+/*
+ * Carry-less multiply operations.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#include "qemu/osdep.h"
+#include "crypto/clmul.h"
+
+uint64_t clmul_8x8_low(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+for (int i = 0; i < 8; ++i) {
+uint64_t mask = (n & 0x0101010101010101ull) * 0xff;
+r ^= m & mask;
+m = (m << 1) & 0xfefefefefefefefeull;
+n >>= 1;
+}
+return r;
+}
+
+static uint64_t clmul_8x4_even_int(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+for (int i = 0; i < 8; ++i) {
+uint64_t mask = (n & 0x0001000100010001ull) * 0x;
+r ^= m & mask;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
+
+uint64_t clmul_8x4_even(uint64_t n, uint64_t m)
+{
+n &= 0x00ff00ff00ff00ffull;
+m &= 0x00ff00ff00ff00ffull;
+return clmul_8x4_even_int(n, m);
+}
+
+uint64_t clmul_8x4_odd(uint64_t n, uint64_t m)
+{
+return clmul_8x4_even(n >> 8, m >> 8);
+}
+
+static uint64_t unpack_8_to_16(uint64_t x)
+{
+return  (x & 0x00ff)
+ | ((x & 0xff00) << 8)
+ | ((x & 0x00ff) << 16)
+ | ((x & 0xff00) << 24);
+}
+
+uint64_t clmul_8x4_packed(uint32_t n, uint32_t m)
+{
+return clmul_8x4_even_int(unpack_8_to_16(n), unpack_8_to_16(m));
+}
diff --git a/crypto/meson.build b/crypto/meson.build
index 5f03a30d34..9ac1a89802 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -48,9 +48,12 @@ if have_afalg
 endif
 crypto_ss.add(when: gnutls, if_true: files('tls-cipher-suites.c'))
 
-util_ss.add(files('sm4.c'))
-util_ss.add(files('aes.c'))
-util_ss.add(files('init.c'))
+util_ss.add(files(
+  'aes.c',
+  'clmul.c',
+  'init.c',
+  'sm4.c',
+))
 if gnutls.found()
   util_ss.add(gnutls)
 endif
-- 
2.34.1




Re: [PATCH 2/2] hw/intc: Make rtc variable names consistent

2023-08-21 Thread Jason Chien
Ping.

On Fri, Aug 11, 2023 at 2:25 AM Alistair Francis 
wrote:

> On Fri, Jul 28, 2023 at 4:57 AM Jason Chien 
> wrote:
> >
> > The variables whose values are given by cpu_riscv_read_rtc() should be
> > named "rtc". The variables whose values are given by
> > cpu_riscv_read_rtc_raw() should be named "rtc_r".
> >
> > Signed-off-by: Jason Chien 
>
> Reviewed-by: Alistair Francis 
>
> Alistair
>
> > ---
> >  hw/intc/riscv_aclint.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
> > index bf77e29a70..25cf7a5d9d 100644
> > --- a/hw/intc/riscv_aclint.c
> > +++ b/hw/intc/riscv_aclint.c
> > @@ -64,13 +64,13 @@ static void
> riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
> >  uint64_t next;
> >  uint64_t diff;
> >
> > -uint64_t rtc_r = cpu_riscv_read_rtc(mtimer);
> > +uint64_t rtc = cpu_riscv_read_rtc(mtimer);
> >
> >  /* Compute the relative hartid w.r.t the socket */
> >  hartid = hartid - mtimer->hartid_base;
> >
> >  mtimer->timecmp[hartid] = value;
> > -if (mtimer->timecmp[hartid] <= rtc_r) {
> > +if (mtimer->timecmp[hartid] <= rtc) {
> >  /*
> >   * If we're setting an MTIMECMP value in the "past",
> >   * immediately raise the timer interrupt
> > @@ -81,7 +81,7 @@ static void
> riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
> >
> >  /* otherwise, set up the future timer interrupt */
> >  qemu_irq_lower(mtimer->timer_irqs[hartid]);
> > -diff = mtimer->timecmp[hartid] - rtc_r;
> > +diff = mtimer->timecmp[hartid] - rtc;
> >  /* back to ns (note args switched in muldiv64) */
> >  uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND,
> timebase_freq);
> >
> > --
> > 2.17.1
> >
> >
>


Re: [8.1 regression] Re: [PULL 05/19] virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties

2023-08-21 Thread Alex Williamson
On Mon, 21 Aug 2023 14:20:38 +0400
Marc-André Lureau  wrote:

> Hi Alex
> 
> On Thu, Aug 17, 2023 at 1:25 AM Alex Williamson
>  wrote:
> >
> > On Wed, 16 Aug 2023 15:08:10 -0600
> > Alex Williamson  wrote:  
> > > > diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
> > > > index 8f9fbf583e..3d19dbe382 100644
> > > > --- a/ui/egl-helpers.c
> > > > +++ b/ui/egl-helpers.c
> > > > @@ -314,9 +314,9 @@ void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf)
> > > >  }
> > > >
> > > >  attrs[i++] = EGL_WIDTH;
> > > > -attrs[i++] = dmabuf->width;
> > > > +attrs[i++] = dmabuf->backing_width;
> > > >  attrs[i++] = EGL_HEIGHT;
> > > > -attrs[i++] = dmabuf->height;
> > > > +attrs[i++] = dmabuf->backing_height;
> > > >  attrs[i++] = EGL_LINUX_DRM_FOURCC_EXT;
> > > >  attrs[i++] = dmabuf->fourcc;
> > > >
> > > > diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
> > > > index 42db1bb6cf..eee821d73a 100644
> > > > --- a/ui/gtk-egl.c
> > > > +++ b/ui/gtk-egl.c
> > > > @@ -262,9 +262,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener 
> > > > *dcl,
> > > >  }
> > > >
> > > >  gd_egl_scanout_texture(dcl, dmabuf->texture,
> > > > -   dmabuf->y0_top, dmabuf->width, 
> > > > dmabuf->height,
> > > > -   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
> > > > -   dmabuf->scanout_height, NULL);
> > > > +   dmabuf->y0_top,
> > > > +   dmabuf->backing_width, 
> > > > dmabuf->backing_height,
> > > > +   dmabuf->x, dmabuf->y, dmabuf->width,
> > > > +   dmabuf->height, NULL);
> > > >
> > > >  if (dmabuf->allow_fences) {
> > > >  vc->gfx.guest_fb.dmabuf = dmabuf;
> > > > @@ -284,7 +285,8 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener 
> > > > *dcl,
> > > >  if (!dmabuf->texture) {
> > > >  return;
> > > >  }
> > > > -egl_fb_setup_for_tex(&vc->gfx.cursor_fb, dmabuf->width, 
> > > > dmabuf->height,
> > > > +egl_fb_setup_for_tex(&vc->gfx.cursor_fb,
> > > > + dmabuf->backing_width, 
> > > > dmabuf->backing_height,
> > > >   dmabuf->texture, false);
> > > >  } else {
> > > >  egl_fb_destroy(&vc->gfx.cursor_fb);
> > > > diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
> > > > index a9a7fdf50c..4513d3d059 100644
> > > > --- a/ui/gtk-gl-area.c
> > > > +++ b/ui/gtk-gl-area.c
> > > > @@ -301,9 +301,10 @@ void 
> > > > gd_gl_area_scanout_dmabuf(DisplayChangeListener *dcl,
> > > >  }
> > > >
> > > >  gd_gl_area_scanout_texture(dcl, dmabuf->texture,
> > > > -   dmabuf->y0_top, dmabuf->width, 
> > > > dmabuf->height,
> > > > -   dmabuf->x, dmabuf->y, 
> > > > dmabuf->scanout_width,
> > > > -   dmabuf->scanout_height, NULL);
> > > > +   dmabuf->y0_top,
> > > > +   dmabuf->backing_width, 
> > > > dmabuf->backing_height,
> > > > +   dmabuf->x, dmabuf->y, dmabuf->width,
> > > > +   dmabuf->height, NULL);
> > > >
> > > >  if (dmabuf->allow_fences) {
> > > >  vc->gfx.guest_fb.dmabuf = dmabuf;  
> > >  
> >
> > I suspect the issues is in these last few chunks where width and height
> > are replaced with backing_width and backing height, but
> > hw/vfio/display.c never sets backing_*.  It appears that the following
> > resolves the issue:
> >
> > diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> > index bec864f482f4..837d9e6a309e 100644
> > --- a/hw/vfio/display.c
> > +++ b/hw/vfio/display.c
> > @@ -243,6 +243,8 @@ static VFIODMABuf 
> > *vfio_display_get_dmabuf(VFIOPCIDevice *vdev,
> >  dmabuf->dmabuf_id  = plane.dmabuf_id;
> >  dmabuf->buf.width  = plane.width;
> >  dmabuf->buf.height = plane.height;
> > +dmabuf->buf.backing_width = plane.width;
> > +dmabuf->buf.backing_height = plane.height;
> >  dmabuf->buf.stride = plane.stride;
> >  dmabuf->buf.fourcc = plane.drm_format;
> >  dmabuf->buf.modifier = plane.drm_format_mod;
> >
> > I'll post that formally, but I really have no idea how dmabuf display
> > works, so confirmation would be appreciated.  Thanks,  
> 
> Looks correct to me. I wish Kim would chime in.
> 
> I am not familiar with vfio/display. Looking at the kernel side, it
> seems it doesn't have a concept of a scanout geometry that differs
> from the backing dmabuf/texture dimensions.
> 
> Should we make this a blocker for release? Are you sending the patch?

I did send a patch, Kim commented there:

https://lore.kernel.org/all/20230816215550.1723696-1-alex.william...@redhat.com/

Follow-up suggests vhost-user-gpu is also affected.  Empirically the
patch I sent works, I think it's correct, but Gerd is probably most
qualified to respond to the comments.  I don't know how a "scan

Re: Funny results with long double denorms on m68k

2023-08-21 Thread Keith Packard via

> When I developped the FPU emulation I compared the result of QEMU and a real 
> hardware using 
> https://github.com/vivier/m68k-testfloat and 
> https://github.com/vivier/m68k-softfloat

It looks like the second of those has similar issues with m68k denorms?

https://github.com/vivier/m68k-softfloat/blob/6ecdd5c9627d02c7502de4acaf54c5c5b0a43bdf/softfloat/bits64/softfloat.c#L640

-- 
-keith




Re: [PATCH v2 00/18] crypto: Provide clmul.h and host accel

2023-08-21 Thread Ard Biesheuvel
On Mon, 21 Aug 2023 at 17:15, Richard Henderson
 wrote:
>
> On 8/21/23 07:57, Ard Biesheuvel wrote:
> >> Richard Henderson (18):
> >>crypto: Add generic 8-bit carry-less multiply routines
> >>target/arm: Use clmul_8* routines
> >>target/s390x: Use clmul_8* routines
> >>target/ppc: Use clmul_8* routines
> >>crypto: Add generic 16-bit carry-less multiply routines
> >>target/arm: Use clmul_16* routines
> >>target/s390x: Use clmul_16* routines
> >>target/ppc: Use clmul_16* routines
> >>crypto: Add generic 32-bit carry-less multiply routines
> >>target/arm: Use clmul_32* routines
> >>target/s390x: Use clmul_32* routines
> >>target/ppc: Use clmul_32* routines
> >>crypto: Add generic 64-bit carry-less multiply routine
> >>target/arm: Use clmul_64
> >>target/s390x: Use clmul_64
> >>target/ppc: Use clmul_64
> >>host/include/i386: Implement clmul.h
> >>host/include/aarch64: Implement clmul.h
> >>
> >
> > I didn't re-run the OpenSSL benchmark, but the x86 Linux kernel still
> > passes all its crypto selftests when running under TCG emulation on a
> > TX2 arm64 host, so
> >
> > Tested-by: Ard Biesheuvel 
>
> Oh, whoops.  What's missing here?  Any target/i386 changes.
>

Ah yes - I hadn't spotted that. The below seems to do the trick.

--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2156,7 +2156,10 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State
*env, Reg *d, Reg *v, Reg *s,
 for (i = 0; i < 1 << SHIFT; i += 2) {
 a = v->Q(((ctrl & 1) != 0) + i);
 b = s->Q(((ctrl & 16) != 0) + i);
-clmulq(&d->Q(i), &d->Q(i + 1), a, b);
+
+Int128 r = clmul_64(a, b);
+d->Q(i) = int128_getlo(r);
+d->Q(i + 1) = int128_gethi(r);
 }
 }

[and the #include added and clmulq() dropped]

I did a quick RFC4106 benchmark with tcrypt (which doesn't speed up as
much as OpenSSL, but cross-rebuilding OpenSSL is a bit of a hassle)

no acceleration:

tcrypt: test 7 (160 bit key, 8192 byte blocks): 1547 operations in 1
seconds (12673024 bytes)

AES only:

tcrypt: test 7 (160 bit key, 8192 byte blocks): 1679 operations in 1
seconds (13754368 bytes)

AES and PMULL

tcrypt: test 7 (160 bit key, 8192 byte blocks): 3298 operations in 1
seconds (27017216 bytes)



Re: [PATCH] target/ppc: Fix LQ, STQ register-pair order for big-endian

2023-08-21 Thread Richard Henderson

On 8/21/23 08:30, Nicholas Piggin wrote:

LQ, STQ have the same register-pair ordering as LQARX/STQARX., in which
the even (lower-numbered) register contains the most significant bits.
This is not implemented correctly for big-endian.

do_ldst_quad() has variables low_addr_gpr and high_addr_gpr, which is
confusing because they are named as low and high addresses, whereas
LQARX/STQARX. and most such code uses lo/hi variables for the low and
high values.
The conversion to native 128-bit memory access functions missed this
strangeness.

Fix this by changing the if condition, and change the variable names to
hi/lo to match convention.

Cc:qemu-sta...@nongnu.org
Reported-by: Ivan Warren
Fixes: 57b38ffd0c6f ("target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, 
STQ")
Resolves:https://gitlab.com/qemu-project/qemu/-/issues/1836
Signed-off-by: Nicholas Piggin
---
Hi Ivan,

Thanks for your report. This gets AIX7.2 booting for me again with TCG,
if you would be able to confirm that it works there, it would be great.

Thanks,
Nick

  target/ppc/translate/fixedpoint-impl.c.inc | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)


Thanks for the catch.

Reviewed-by: Richard Henderson 


r~



[PATCH] target/ppc: Fix LQ, STQ register-pair order for big-endian

2023-08-21 Thread Nicholas Piggin
LQ, STQ have the same register-pair ordering as LQARX/STQARX., in which
the even (lower-numbered) register contains the most significant bits.
This is not implemented correctly for big-endian.

do_ldst_quad() has variables low_addr_gpr and high_addr_gpr, which is
confusing because they are named as low and high addresses, whereas
LQARX/STQARX. and most such code uses lo/hi variables for the low and
high values.
The conversion to native 128-bit memory access functions missed this
strangeness.

Fix this by changing the if condition, and change the variable names to
hi/lo to match convention.

Cc: qemu-sta...@nongnu.org
Reported-by: Ivan Warren 
Fixes: 57b38ffd0c6f ("target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, 
STQ")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1836
Signed-off-by: Nicholas Piggin 
---
Hi Ivan,

Thanks for your report. This gets AIX7.2 booting for me again with TCG,
if you would be able to confirm that it works there, it would be great.

Thanks,
Nick

 target/ppc/translate/fixedpoint-impl.c.inc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/ppc/translate/fixedpoint-impl.c.inc 
b/target/ppc/translate/fixedpoint-impl.c.inc
index f47f1a50e8..b423c09c26 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -71,7 +71,7 @@ static bool do_ldst_quad(DisasContext *ctx, arg_D *a, bool 
store, bool prefixed)
 {
 #if defined(TARGET_PPC64)
 TCGv ea;
-TCGv_i64 low_addr_gpr, high_addr_gpr;
+TCGv_i64 lo, hi;
 TCGv_i128 t16;
 
 REQUIRE_INSNS_FLAGS(ctx, 64BX);
@@ -94,21 +94,21 @@ static bool do_ldst_quad(DisasContext *ctx, arg_D *a, bool 
store, bool prefixed)
 gen_set_access_type(ctx, ACCESS_INT);
 ea = do_ea_calc(ctx, a->ra, tcg_constant_tl(a->si));
 
-if (prefixed || !ctx->le_mode) {
-low_addr_gpr = cpu_gpr[a->rt];
-high_addr_gpr = cpu_gpr[a->rt + 1];
+if (ctx->le_mode && prefixed) {
+lo = cpu_gpr[a->rt];
+hi = cpu_gpr[a->rt + 1];
 } else {
-low_addr_gpr = cpu_gpr[a->rt + 1];
-high_addr_gpr = cpu_gpr[a->rt];
+lo = cpu_gpr[a->rt + 1];
+hi = cpu_gpr[a->rt];
 }
 t16 = tcg_temp_new_i128();
 
 if (store) {
-tcg_gen_concat_i64_i128(t16, low_addr_gpr, high_addr_gpr);
+tcg_gen_concat_i64_i128(t16, lo, hi);
 tcg_gen_qemu_st_i128(t16, ea, ctx->mem_idx, DEF_MEMOP(MO_128));
 } else {
 tcg_gen_qemu_ld_i128(t16, ea, ctx->mem_idx, DEF_MEMOP(MO_128));
-tcg_gen_extr_i128_i64(low_addr_gpr, high_addr_gpr, t16);
+tcg_gen_extr_i128_i64(lo, hi, t16);
 }
 #else
 qemu_build_not_reached();
-- 
2.40.1




Re: [RFC PATCH] target/arm: properly document FEAT_CRC32

2023-08-21 Thread Peter Maydell
On Wed, 22 Feb 2023 at 11:01, Alex Bennée  wrote:
>
> This is a mandatory feature for Armv8.1 architectures but we don't
> state the feature clearly in our emulation list. While checking, verify
> our cortex-a76 model matches up with the current TRM by breaking out
> the long-form isar values into more modern, readable FIELD_DP code.
>
> Signed-off-by: Alex Bennée 
> ---
>  docs/system/arm/emulation.rst |  1 +
>  target/arm/cpu64.c| 29 ++---
>  target/arm/cpu_tcg.c  |  2 +-
>  3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
> index 2062d71261..2c4fde5eef 100644
> --- a/docs/system/arm/emulation.rst
> +++ b/docs/system/arm/emulation.rst
> @@ -14,6 +14,7 @@ the following architecture extensions:
>  - FEAT_BBM at level 2 (Translation table break-before-make levels)
>  - FEAT_BF16 (AArch64 BFloat16 instructions)
>  - FEAT_BTI (Branch Target Identification)
> +- FEAT_CRC32 (CRC32 instruction)
>  - FEAT_CSV2 (Cache speculation variant 2)
>  - FEAT_CSV2_1p1 (Cache speculation variant 2, version 1.1)
>  - FEAT_CSV2_1p2 (Cache speculation variant 2, version 1.2)

Would you mind resubmitting a version of this patch that just
fixes this documentation error and doesn't also do the other
stuff that caused this patch to not get through code review?

thanks
-- PMM



Re: [PATCH v2 00/18] crypto: Provide clmul.h and host accel

2023-08-21 Thread Richard Henderson

On 8/21/23 07:57, Ard Biesheuvel wrote:

Richard Henderson (18):
   crypto: Add generic 8-bit carry-less multiply routines
   target/arm: Use clmul_8* routines
   target/s390x: Use clmul_8* routines
   target/ppc: Use clmul_8* routines
   crypto: Add generic 16-bit carry-less multiply routines
   target/arm: Use clmul_16* routines
   target/s390x: Use clmul_16* routines
   target/ppc: Use clmul_16* routines
   crypto: Add generic 32-bit carry-less multiply routines
   target/arm: Use clmul_32* routines
   target/s390x: Use clmul_32* routines
   target/ppc: Use clmul_32* routines
   crypto: Add generic 64-bit carry-less multiply routine
   target/arm: Use clmul_64
   target/s390x: Use clmul_64
   target/ppc: Use clmul_64
   host/include/i386: Implement clmul.h
   host/include/aarch64: Implement clmul.h



I didn't re-run the OpenSSL benchmark, but the x86 Linux kernel still
passes all its crypto selftests when running under TCG emulation on a
TX2 arm64 host, so

Tested-by: Ard Biesheuvel 


Oh, whoops.  What's missing here?  Any target/i386 changes.


r~



