date:20230920

Re: [PATCH v2 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-20 Thread Markus Armbruster

Eric Blake  writes:

> On Wed, Sep 20, 2023 at 08:31:49PM +0200, Markus Armbruster wrote:
> ...
>> The only reliable way to prevent unintended variable name capture is
>> -Wshadow.
>> 
>> One blocker for enabling it is shadowing hiding in function-like
>> macros like
>> 
>>  qdict_put(dict, "name", qobject_ref(...))
>> 
>> qdict_put() wraps its last argument in QOBJECT(), and the last
>> argument here contains another QOBJECT().
>> 
>> Use dark preprocessor sorcery to make the macros that give us this
>> problem use different variable names on every call.
>> 
>> Signed-off-by: Markus Armbruster 
>> Reviewed-by: Eric Blake 
>
> It's changed (for the better) since v1, so I'm re-reviewing.
>
>> ---
>>  include/qapi/qmp/qobject.h | 11 +--
>>  include/qemu/atomic.h  | 17 -
>>  include/qemu/compiler.h|  3 +++
>>  include/qemu/osdep.h   | 31 +++
>>  4 files changed, 47 insertions(+), 15 deletions(-)
>> 
>> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
>> index 9003b71fd3..d36cc97805 100644
>> --- a/include/qapi/qmp/qobject.h
>> +++ b/include/qapi/qmp/qobject.h
>> @@ -45,10 +45,17 @@ struct QObject {
>>  struct QObjectBase_ base;
>>  };
>>  
>> -#define QOBJECT(obj) ({ \
>> +/*
>> + * Preprocessory sorcery ahead: use a different identifier for the
>
> s/Preprocessory/Preprocessor/ (multiple times in the patch)

Dang!  Will fix.

>> + * local variable in each expansion, so we can nest macro calls
>> + * without shadowing variables.
>> + */
>> +#define QOBJECT_INTERNAL(obj, _obj) ({  \
>>  typeof(obj) _obj = (obj);   \
>> -_obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>> +_obj\
>> +? container_of(&(_obj)->base, QObject, base) : NULL;\
>
> As pointed out before, you can write &_obj->base instead of
> &(_obj)->base, now that we know _obj is a single identifier rather
> than an arbitrary expression.  Not strictly necessary since the extra
> () doesn't change semantics...

Makes sense, I just forgot here.

>>  })
>> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), MAKE_IDENTFIER(_obj))
>>  
>>  /* Required for qobject_to() */
>>  #define QTYPE_CAST_TO_QNull QTYPE_QNULL
>> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
>> index d95612f7a0..d4cbd01909 100644
>> --- a/include/qemu/atomic.h
>> +++ b/include/qemu/atomic.h
>> @@ -157,13 +157,20 @@
>>  smp_read_barrier_depends();
>>  #endif
>>  
>> -#define qatomic_rcu_read(ptr)  \
>> -({ \
>> +/*
>> + * Preprocessory sorcery ahead: use a different identifier for the
>> + * local variable in each expansion, so we can nest macro calls
>> + * without shadowing variables.
>> + */
>> +#define qatomic_rcu_read_internal(ptr, _val)\
>> +({  \
>>  qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> -typeof_strip_qual(*ptr) _val;  \
>> -qatomic_rcu_read__nocheck(ptr, &_val); \
>> -_val;  \
>> +typeof_strip_qual(*ptr) _val;   \
>> +qatomic_rcu_read__nocheck(ptr, &_val);  \
>
> ...but it looks odd for the patch to not be consistent on that front.
>
>> +_val;   \
>>  })
>> +#define qatomic_rcu_read(ptr) \
>> +qatomic_rcu_read_internal((ptr), MAKE_IDENTFIER(_val))
>>  
>>  #define qatomic_rcu_set(ptr, i) do {   \
>>  qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
>> diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
>> index a309f90c76..03236d830c 100644
>> --- a/include/qemu/compiler.h
>> +++ b/include/qemu/compiler.h
>> @@ -37,6 +37,9 @@
>>  #define tostring(s) #s
>>  #endif
>>  
>> +/* Expands into an identifier stemN, where N is another number each time */
>> +#define MAKE_IDENTFIER(stem) glue(stem, __COUNTER__)
>
> I like how this turned out.
>
> With the spelling fix, and optionally with the redundant () dropped,
> you can keep my R-b.

Thanks!
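
For readers unfamiliar with the trick, the technique discussed in this patch can be sketched outside QEMU roughly as follows. This is only a minimal illustration, assuming GCC or Clang (statement expressions, __typeof__ and __COUNTER__ are extensions). GLUE, MAKE_IDENTIFIER as spelled here, CLAMP_NONNEG and nested_clamp are names made up for the sketch; they are not the macros from the patch.

```c
/* Two-level paste so that __COUNTER__ is expanded before ## is applied. */
#define XGLUE(a, b) a##b
#define GLUE(a, b) XGLUE(a, b)

/* Expands to stemN, with a different N on every expansion. */
#define MAKE_IDENTIFIER(stem) GLUE(stem, __COUNTER__)

/*
 * The internal macro receives the name of its local variable as an
 * argument, so each expansion declares a differently named local.
 */
#define CLAMP_NONNEG_INTERNAL(x, _x) ({ \
    __typeof__(x) _x = (x);             \
    _x < 0 ? 0 : _x;                    \
})
#define CLAMP_NONNEG(x) CLAMP_NONNEG_INTERNAL((x), MAKE_IDENTIFIER(_x))

/* Nested use: the inner and outer locals get distinct names. */
static int nested_clamp(int v)
{
    return CLAMP_NONNEG(CLAMP_NONNEG(v) - 5);
}
```

With a fixed local name, the nested call would declare the same identifier in both the inner and the outer statement expression, which is exactly what -Wshadow complains about.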

Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg

2023-09-20 Thread Jason Wang

On Tue, Sep 19, 2023 at 7:43 PM Jiqian Chen  wrote:
>
> When a guest VM enters S3, QEMU resets and clears some state of the virtio
> devices, but the guest is not aware of that, which may cause problems.
> For example, QEMU calls virtio_reset->virtio_gpu_gl_reset when the guest
> resumes, and that function destroys the render resources of virtio-gpu. As
> a result, after the guest resumes, the display does not come back and we
> only see a black screen. Because the guest cannot re-create all the
> resources, we need QEMU not to destroy them on S3.
>
> For this purpose, we need a mechanism that allows guests and QEMU to
> negotiate their reset behavior. So this patch adds a new parameter
> named freeze_mode to struct virtio_pci_common_cfg. When the guest
> suspends, it can set freeze_mode to FREEZE_S3, and the virtio
> devices can then change their reset behavior on the QEMU side according to
> freeze_mode. Moreover, freeze_mode can be used by all virtio
> devices to affect the behavior of QEMU, not just the virtio gpu device.

A simple question: why is this issue specific to PCI?

Thanks


>
> Signed-off-by: Jiqian Chen 
> ---
>  transport-pci.tex | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..2543536 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  le64 queue_desc;/* read-write */
>  le64 queue_driver;  /* read-write */
>  le64 queue_device;  /* read-write */
> +le16 freeze_mode;   /* read-write */
>  le16 queue_notif_config_data;   /* read-only for driver */
>  le16 queue_reset;   /* read-write */
>
> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  \item[\field{queue_device}]
>  The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>
> +\item[\field{freeze_mode}]
> +The driver writes this to set the freeze mode of virtio pci.
> +VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> +VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
> +Other values are reserved for future use, like S4, etc.
> +
>  \item[\field{queue_notif_config_data}]
>  This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>  The driver will use this value when driver sends available buffer
> --
> 2.34.1
>

RE: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Cédric Le Goater 
>Sent: Wednesday, September 20, 2023 9:02 PM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On 9/20/23 14:51, Jason Gunthorpe wrote:
>> On Wed, Sep 20, 2023 at 02:19:42PM +0200, Cédric Le Goater wrote:
>>> On 9/20/23 05:42, Duan, Zhenzhong wrote:


>>>>> -Original Message-
>>>>> From: Cédric Le Goater
>>>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>>>
>>>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>>>> iommufd support, enabled by default.
>>>>>
>>>>> Why would someone want to disable support at compile time ? It might
>>>>
>>>> For those users who only want to support legacy container feature?
>>>> Let me know if you still prefer to drop this patch, I'm fine with that.
>>>
>>> I think it is too early.
>>>
>>>>> have been useful for dev but now QEMU should self-adjust at runtime
>>>>> depending only on the host capabilities AFAIUI. Am I missing something ?
>>>>
>>>> IOMMUFD doesn't support all features of legacy container, so QEMU
>>>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>>>> We need to specify it explicitly to use IOMMUFD as below:
>>>>
>>>>   -object iommufd,id=iommufd0
>>>>   -device vfio-pci,host=:02:00.0,iommufd=iommufd0
>>>
>>> OK. I am not sure this is the correct interface yet. At first glance,
>>> I wouldn't introduce a new object for a simple backend depending on a
>>> kernel interface. I would tend to prefer a "iommu-something" property
>>> of the vfio-pci device with string values: "legacy", "iommufd", "default"
>>> and define the various interfaces (the ops you proposed) for each
>>> depending on the user preference and the capabilities of the host and
>>> possibly the device.
>>
>> I think the idea came from Alex? The major point is to be able to have
>> libvirt open /dev/iommufd and FD pass it into qemu
>
>ok.
>
>> and then share that single FD across all VFIOs.
>
>I will ask Alex to help me catch up on the topic.
>
>> qemu will typically not be able to
>> self-open /dev/iommufd as it is root-only.
>
>I don't understand, we open multiple fds to KVM devices. This is the same.

There are two slight differences:

1. Different group:
$ ll /dev/kvm
crw-rw+ 1 root kvm 10, 232  Sep 18 14:23 /dev/kvm
$ ll /dev/iommu
crw-rw 1 root root 10, 124  Sep 12 14:14 /dev/iommu

2. Default cgroup device allowed list:
#cgroup_device_acl = [
#"/dev/null", "/dev/full", "/dev/zero",
#"/dev/random", "/dev/urandom",
#"/dev/ptmx", "/dev/kvm"
#]

By default, libvirt runs the QEMU instance as user/group libvirt-qemu/kvm,
so QEMU has permission to open /dev/kvm, but not /dev/iommu.

When an ordinary user tries to open /dev/kvm, permission is denied:

duan@duan-server-S2600BT:~$ qemu-system-x86_64 -accel kvm
Could not access KVM kernel module: Permission denied
qemu-system-x86_64: -accel kvm: failed to initialize kvm: Permission denied

Thanks
Zhenzhong

[PATCH] eeprom_at24c: Model 8-bit data addressing for 16-bit devices

2023-09-20 Thread Andrew Jeffery

It appears some (many?) EEPROMs that implement 16-bit data addressing
will accept an 8-bit address and clock out non-uniform data for the
read. This behaviour is exploited by an EEPROM detection routine in part
of OpenBMC userspace with a reasonably broad user base:

https://github.com/openbmc/entity-manager/blob/0422a24bb6033605ce75479f675fedc76abb1167/src/fru_device.cpp#L197-L229

The diversity of the set of EEPROMs that it operates against is unclear,
but this code has been around for a while now.

Separately, the NVM Express Management Interface Specification dictates
the provided behaviour in section 8.2 Vital Product Data:

> If only one byte of the Command Offset is provided by the Management
> Controller, then the least significant byte of the internal offset
> shall be set to that value and the most-significant byte of the
> internal offset shall be cleared to 0h

https://nvmexpress.org/wp-content/uploads/NVM-Express-Management-Interface-Specification-1.2c-2022.10.06-Ratified.pdf

This change makes it possible to expose NVMe VPD in a manner that can be
dynamically detected by OpenBMC.

Signed-off-by: Andrew Jeffery 
---
 hw/nvram/eeprom_at24c.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/nvram/eeprom_at24c.c b/hw/nvram/eeprom_at24c.c
index 613c4929e327..64a61cc0e468 100644
--- a/hw/nvram/eeprom_at24c.c
+++ b/hw/nvram/eeprom_at24c.c
@@ -98,12 +98,20 @@ uint8_t at24c_eeprom_recv(I2CSlave *s)
 EEPROMState *ee = AT24C_EE(s);
 uint8_t ret;
 
-/*
- * If got the byte address but not completely with address size
- * will return the invalid value
- */
 if (ee->haveaddr > 0 && ee->haveaddr < ee->asize) {
-return 0xff;
+/*
+ * Provide behaviour that aligns with NVMe MI 1.2c, section 8.2.
+ *
+ * https://nvmexpress.org/wp-content/uploads/NVM-Express-Management-Interface-Specification-1.2c-2022.10.06-Ratified.pdf
+ *
+ * Otherwise, the clocked-out data is meaningless anyway, and so reading
+ * off memory is as good a behaviour as anything. This also happens to
+ * help the address-width detection heuristic in OpenBMC's userspace.
+ *
+ * https://github.com/openbmc/entity-manager/blob/0422a24bb6033605ce75479f675fedc76abb1167/src/fru_device.cpp#L197-L229
+ */
+ee->haveaddr = ee->asize;
+ee->cur %= ee->rsize;
 }
 
 ret = ee->mem[ee->cur];
-- 
2.39.2
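
The behaviour the patch models can be illustrated with a toy device, independent of QEMU's I2C layer. This is a simplified sketch, not the eeprom_at24c.c code: ToyEeprom and the toy_* functions are hypothetical names, and resetting the address counter at the start of a write is an assumption made to keep the example short.

```c
#include <stdint.h>

typedef struct ToyEeprom {
    uint16_t cur;      /* current memory offset */
    uint8_t haveaddr;  /* number of address bytes received so far */
    uint8_t asize;     /* address size in bytes (2 for 16-bit devices) */
    uint32_t rsize;    /* size of the backing memory in bytes */
    uint8_t *mem;      /* backing memory */
} ToyEeprom;

/* Start of a write transfer: begin collecting a fresh address. */
static void toy_start_write(ToyEeprom *ee)
{
    ee->cur = 0;
    ee->haveaddr = 0;
}

/* Shift one address byte into the offset, most significant byte first. */
static void toy_send_addr_byte(ToyEeprom *ee, uint8_t data)
{
    if (ee->haveaddr < ee->asize) {
        ee->cur = (uint16_t)((ee->cur << 8) | data);
        ee->haveaddr++;
        if (ee->haveaddr == ee->asize) {
            ee->cur = (uint16_t)(ee->cur % ee->rsize);
        }
    }
}

/*
 * Read one byte. If only part of the address was supplied, treat the
 * bytes received so far as the whole offset instead of returning 0xff.
 */
static uint8_t toy_recv(ToyEeprom *ee)
{
    uint8_t ret;

    if (ee->haveaddr > 0 && ee->haveaddr < ee->asize) {
        ee->haveaddr = ee->asize;
        ee->cur = (uint16_t)(ee->cur % ee->rsize);
    }
    ret = ee->mem[ee->cur];
    ee->cur = (uint16_t)((ee->cur + 1u) % ee->rsize);
    return ret;
}
```

In this model, sending the single byte 0x12 to a device with asize == 2 and then reading returns mem[0x12], i.e. the least significant byte of the offset is the supplied value and the most significant byte is zero.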

RE: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Duan, Zhenzhong




>-Original Message-
>From: Jason Gunthorpe 
>Sent: Thursday, September 21, 2023 2:20 AM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote:
>
>> > The iommufd design requires one open of the /dev/iommu to be shared
>> > across all the vfios.
>>
>> "requires"?  It's certainly of limited value to have multiple iommufd
>> instances rather than create multiple address spaces within a single
>> iommufd, but what exactly precludes an iommufd per device if QEMU, or
>> any other userspace so desired?  Thanks,
>
>From the kernel side requires is too strong I suppose
>
>Not sure about these qemu patches though?

I have tested this series with multiple IOMMUFDs, and with a mix of IOMMUFD
and legacy BEs linked to different VFIO devices; it all works fine.

Thanks
Zhenzhong

RE: [PATCH v1 13/22] vfio: Add base container

2023-09-20 Thread Duan, Zhenzhong

Hi Eric,

>-Original Message-
>From: Eric Auger 
>Sent: Thursday, September 21, 2023 1:31 AM
>Subject: Re: [PATCH v1 13/22] vfio: Add base container
>
>Hi Zhenzhong,
>
>On 9/19/23 19:23, Cédric Le Goater wrote:
>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>> From: Yi Liu 
>>>
>>> Abstract the VFIOContainer to be a base object. It is supposed to be
>>> embedded by legacy VFIO container and later on, into the new iommufd
>>> based container.
>>>
>>> The base container implements generic code such as code related to
>>> memory_listener and address space management. The VFIOContainerOps
>>> implements callbacks that depend on the kernel user space being used.
>>>
>>> 'common.c' and vfio device code only manipulates the base container with
>>> wrapper functions that calls the functions defined in
>>> VFIOContainerOpsClass.
>>> Existing 'container.c' code is converted to implement the legacy
>>> container
>>> ops functions.
>>>
>>> Below is the base container. It's named as VFIOContainer, old
>>> VFIOContainer
>>> is replaced with VFIOLegacyContainer.
>>
>> Usually, we introduce the new interface solely, port the current models
>> on top of the new interface, wire the new models in the current
>> implementation and remove the old implementation. Then, we can start
>> adding extensions to support other implementations.
>>
>> spapr should be taken care of separately following the principle above.
>> With my PPC hat, I would not even read such a massive change, too risky
>> for the subsystem. This patch will need (much) further splitting to be
>> understandable and acceptable.
>We might split this patch by
>1) introducing VFIOLegacyContainer encapsulating the base VFIOContainer,
>without using the ops in a first place:
> common.c would call vfio_container_* with hardcoded legacy
>implementation, ie. retrieving the legacy container with container_of.
>2) we would introduce the BE interface without using it.
>3) we would use the new BE interface
>
>Obviously this needs to be further tried out. If you wish I can try to
>split it that way ... Please let me know

Sure, thanks for your help. I'm glad to work with you on moving
this series forward.
I have just updated the branch, rebased onto the newest upstream, for you to pick up at
https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_cdev_v1_rebased

Thanks
Zhenzhong
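
The layering discussed in this thread (a base container embedded in a legacy container, an ops table, and container_of to get back to the legacy object) can be sketched as below. This is a minimal illustration only: BaseContainer, ContainerOps, LegacyContainer and the function names are hypothetical, not the types from the series, and the dma_map body just records the size instead of calling into the kernel.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BaseContainer BaseContainer;

/* Backend-specific callbacks, one table per backend. */
typedef struct ContainerOps {
    int (*dma_map)(BaseContainer *bc, unsigned long iova, unsigned long size);
} ContainerOps;

/* Generic part shared by all backends. */
struct BaseContainer {
    const ContainerOps *ops;
};

/* The legacy backend embeds the base container. */
typedef struct LegacyContainer {
    BaseContainer bcontainer;
    int fd;                      /* would hold the container fd */
    unsigned long mapped_bytes;  /* stand-in for real mapping state */
} LegacyContainer;

static int legacy_dma_map(BaseContainer *bc, unsigned long iova,
                          unsigned long size)
{
    LegacyContainer *lc = container_of(bc, LegacyContainer, bcontainer);

    (void)iova;
    lc->mapped_bytes += size;
    return 0;
}

static const ContainerOps legacy_ops = {
    .dma_map = legacy_dma_map,
};

/* Generic wrapper: callers only ever see the base container. */
static int container_dma_map(BaseContainer *bc, unsigned long iova,
                             unsigned long size)
{
    return bc->ops->dma_map(bc, iova, size);
}
```

Step 1 of the proposed split corresponds to calling legacy_dma_map() directly from the generic code, and step 3 to going through container_dma_map() and the ops table.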

RE: [PATCH v1 13/22] vfio: Add base container

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 9:54 PM
>Subject: Re: [PATCH v1 13/22] vfio: Add base container
>
>Hi Cedric,
>
>On 9/19/23 19:23, Cédric Le Goater wrote:
>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>> From: Yi Liu 
>>>
>>> Abstract the VFIOContainer to be a base object. It is supposed to be
>>> embedded by legacy VFIO container and later on, into the new iommufd
>>> based container.
>>>
>>> The base container implements generic code such as code related to
>>> memory_listener and address space management. The VFIOContainerOps
>>> implements callbacks that depend on the kernel user space being used.
>>>
>>> 'common.c' and vfio device code only manipulates the base container with
>>> wrapper functions that calls the functions defined in
>>> VFIOContainerOpsClass.
>>> Existing 'container.c' code is converted to implement the legacy
>>> container
>>> ops functions.
>>>
>>> Below is the base container. It's named as VFIOContainer, old
>>> VFIOContainer
>>> is replaced with VFIOLegacyContainer.
>>
>> Usually, we introduce the new interface solely, port the current models
>> on top of the new interface, wire the new models in the current
>> implementation and remove the old implementation. Then, we can start
>> adding extensions to support other implementations.
>> spapr should be taken care of separately following the principle above.
>> With my PPC hat, I would not even read such a massive change, too risky
>> for the subsystem. This patch will need (much) further splitting to be
>> understandable and acceptable.
>>
>> Also, please include the .h file first, it helps in reading. Have you
>> considered using an InterfaceClass ?
>in the transition from v1 -> v2, I removed the QOMification of the
>VFIOContainer, following David Gibson's advice. QOM objects are visible
>from the user interface and there was no interest in that. Does it
>answer your question?
>
>- remove the QOMification of the VFIOContainer and simply use standard ops
>(David)
>
>Unfortunately the coverletter log history has disappeared in this new version.
>Zhenzhong, I think it is useful to understand how the series moves on.

I had archived it behind a link,
https://lists.nongnu.org/archive/html/qemu-devel/2023-07/msg02529.html
to keep the cover letter cleaner, but it looks like that was the wrong call.
I'll restore the whole changelog in v2.

Thanks
Zhenzhong

RE: [PATCH v1 09/22] vfio/container: Introduce vfio_[attach/detach]_device

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 9:33 PM
>Subject: Re: [PATCH v1 09/22] vfio/container: Introduce
>vfio_[attach/detach]_device
>
>Hi Zhenzhong,
>
>In the commit title I would replace vfio/container by vfio/pci to match
>next patches

Makes sense, will do.

>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> From: Eric Auger 
>>
>> We want the VFIO devices to be able to use two different
>> IOMMU callbacks, the legacy VFIO one and the new iommufd one.
>s/callbacks/backends
>>
>> Introduce vfio_[attach/detach]_device which aim at hiding the
>> underlying IOMMU backend (IOCTLs, datatypes, ...).
>
..

>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index ee7509e68e..8016d9f0d2 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -37,7 +37,7 @@ vfio_pci_hot_reset_dep_devices(int domain, int bus, int
>slot, int function, int
>>  vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot
>reset: %s"
>>  vfio_populate_device_config(const char *name, unsigned long size, unsigned
>long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 
>0x%lx,
>flags: 0x%lx"
>>  vfio_populate_device_get_irq_info_failure(const char *errstr)
>"VFIO_DEVICE_GET_IRQ_INFO failure: %s"
>> -vfio_realize(const char *name, int group_id) " (%s) group %d"
>> +vfio_realize(const char *name) " (%s)"
>I am not sure this trace point is useful anymore, without the id. Some
>tracepoints shall be BE specific to keep their usefulness and should be
>called from container.c/iommufd.c instead of in the generic function.

Previously I used this trace event just to signal that vfio realize was starting.
I agree with you that making it BE specific would show more useful information.
I'll fix it in v2.

Thanks
Zhenzhong

RE: [PATCH v1 08/22] vfio/common: Move legacy VFIO backend code into separate container.c

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 9:12 PM
>Subject: Re: [PATCH v1 08/22] vfio/common: Move legacy VFIO backend code
>into separate container.c
>
>Hi,
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> From: Yi Liu 
>>
>> Move all the code really dependent on the legacy VFIO container/group
>> into a separate file: container.c. What does remain in common.c is
>> the code related to VFIOAddressSpace, MemoryListeners, migration and
>> all other general operations.
>>
>> Move struct VFIOBitmap declaration to vfio-common.h also for container.c
>> usage.
>note: this may be done in the 3d patch since vfio_bitmap_alloc could
>land in helpers.c

Good idea, will do.

Thanks
Zhenzhong

RE: [PATCH v1 07/22] vfio/common: Refactor vfio_viommu_preset() to be group agnostic

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 9:01 PM
>Subject: Re: [PATCH v1 07/22] vfio/common: Refactor vfio_viommu_preset() to
>be group agnostic
>
>
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> So that it doesn't need to be moved into container.c as done
>> in following patch.
>It is a bit weird to refer to container.c, which is not yet created. I
>would suggest just reusing the commit title as a commit msg + this will
>become easier to handle multiple IOMMU BEs

Will fix, thanks Eric.

BRs.
Zhenzhong

RE: [PATCH v1 13/22] vfio: Add base container

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Cédric Le Goater 
>Sent: Wednesday, September 20, 2023 8:58 PM
>Subject: Re: [PATCH v1 13/22] vfio: Add base container
>
>On 9/20/23 10:48, Duan, Zhenzhong wrote:
>>
>>
>>> -Original Message-
>>> From: Cédric Le Goater 
>>> Sent: Wednesday, September 20, 2023 1:24 AM
>>> Subject: Re: [PATCH v1 13/22] vfio: Add base container
>>>
>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>> From: Yi Liu
>>>>
>>>> Abstract the VFIOContainer to be a base object. It is supposed to be
>>>> embedded by legacy VFIO container and later on, into the new iommufd
>>>> based container.
>>>>
>>>> The base container implements generic code such as code related to
>>>> memory_listener and address space management. The VFIOContainerOps
>>>> implements callbacks that depend on the kernel user space being used.
>>>>
>>>> 'common.c' and vfio device code only manipulates the base container with
>>>> wrapper functions that calls the functions defined in
>>>> VFIOContainerOpsClass.
>>>> Existing 'container.c' code is converted to implement the legacy container
>>>> ops functions.
>>>>
>>>> Below is the base container. It's named as VFIOContainer, old VFIOContainer
>>>> is replaced with VFIOLegacyContainer.
>>>
>>> Usually, we introduce the new interface solely, port the current models
>>> on top of the new interface, wire the new models in the current
>>> implementation and remove the old implementation. Then, we can start
>>> adding extensions to support other implementations.
>>
>> Not sure if I understand your point correctly. Do you mean to introduce
>> a new type for the base container as below:
>>
>> static const TypeInfo vfio_container_info = {
>>  .parent = TYPE_OBJECT,
>>  .name   = TYPE_VFIO_CONTAINER,
>>  .class_size = sizeof(VFIOContainerClass),
>>  .instance_size  = sizeof(VFIOContainer),
>>  .abstract   = true,
>>  .interfaces = (InterfaceInfo[]) {
>>  { TYPE_VFIO_IOMMU_BACKEND_OPS },
>>  { }
>>  }
>> };
>>
>> and a new interface as below:
>>
>> static const TypeInfo nvram_info = {
>>  .name = TYPE_VFIO_IOMMU_BACKEND_OPS,
>>  .parent = TYPE_INTERFACE,
>>  .class_size = sizeof(VFIOIOMMUBackendOpsClass),
>> };
>>
>> struct VFIOIOMMUBackendOpsClass {
>>  InterfaceClass parent;
>>  VFIODevice *(*dev_iter_next)(VFIOContainer *container, VFIODevice 
>> *curr);
>>  int (*dma_map)(VFIOContainer *container,
>>  ..
>> };
>>
>> and legacy container on top of TYPE_VFIO_CONTAINER?
>>
>> static const TypeInfo vfio_legacy_container_info = {
>>  .parent = TYPE_VFIO_CONTAINER,
>>  .name = TYPE_VFIO_LEGACY_CONTAINER,
>>  .class_init = vfio_legacy_container_class_init,
>> };
>>
>> This object style is rejected early in RFCv1.
>> See https://lore.kernel.org/kvm/20220414104710.28534-8-yi.l@intel.com/
>
>ouch. this is long ago and I was not aware :/ Bear with me, I will
>probably ask the same questions. Nevertheless, we could improve the
>cover and the flow of changes in the patchset to help the reader.

Sure.

>
>>> spapr should be taken care of separately following the principle above.
>>> With my PPC hat, I would not even read such a massive change, too risky
>>> for the subsystem. This patch will need (much) further splitting to be
>>> understandable and acceptable.
>>
>> I'll dig into this and try to split it.
>
>I know I am asking for a lot of work. Thanks for that.

Np, all comments, suggestions, etc are appreciated 

>
>> Meanwhile, there are many changes
>> just renaming the parameter or function name for code readability.
>> For example:
>>
>> -int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
>> -   ram_addr_t size, IOMMUTLBEntry *iotlb)
>> +static int vfio_legacy_dma_unmap(VFIOContainer *bcontainer, hwaddr iova,
>> +  ram_addr_t size, IOMMUTLBEntry *iotlb)
>>
>> -ret = vfio_get_dirty_bitmap(container, iova, size,
>> +ret = vfio_get_dirty_bitmap(bcontainer, iova, size,
>>
>> Let me know if you think such changes are unnecessary which could reduce
>> this patch largely.
>
>Cleanups, renames, some code reshuffling, anything preparing ground for
>the new abstraction is good to have first and can be merged very quickly
>if there are no functional changes. It reduces the overall patchset and
>ease the coming reviews.
>
>You can send such series independently. That's fine.

Got it.

>
>>
>>>
>>> Also, please include the .h file first, it helps in reading.
>>
>> Do you mean to put struct declaration earlier in patch description?
>
>Just add to your .gitconfig :
>
>[diff]
>   orderFile = /path/to/qemu/scripts/git.orderfile
>
>It should be enough

Understood.

>
>>> Have you considered using an InterfaceClass ?
>>
>> See above, with object style rejected, it looks hard to use InterfaceClass.
>
>I am not convinced by the QOM approach. I will dig in the past arguments
>and let's see what we come

RE: [PATCH v1 17/22] util/char_dev: Add open_cdev()

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Daniel P. Berrangé 
>Sent: Wednesday, September 20, 2023 8:39 PM
>Subject: Re: [PATCH v1 17/22] util/char_dev: Add open_cdev()
>
>On Wed, Aug 30, 2023 at 06:37:49PM +0800, Zhenzhong Duan wrote:
>> From: Yi Liu 
>>
>> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
>> to open /dev/char/$major:$minor instead. Add helper function to abstract
>> the cdev open.
>>
>> Suggested-by: Jason Gunthorpe 
>> Signed-off-by: Yi Liu 
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  MAINTAINERS |  6 
>>  include/qemu/char_dev.h | 16 +++
>>  util/chardev_open.c | 61 +
>
>Using the same naming scheme for the .c and .h is strongly desired.

Got it.

>
>>  util/meson.build|  1 +
>>  4 files changed, 84 insertions(+)
>>  create mode 100644 include/qemu/char_dev.h
>>  create mode 100644 util/chardev_open.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 04663fbb6f..74d18593fe 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3372,6 +3372,12 @@ S: Maintained
>>  F: include/qemu/iova-tree.h
>>  F: util/iova-tree.c
>>
>> +cdev Open
>> +M: Yi Liu 
>> +S: Maintained
>> +F: include/qemu/char_dev.h
>> +F: util/chardev_open.c
>> +
>
>
>> diff --git a/util/chardev_open.c b/util/chardev_open.c
>> new file mode 100644
>> index 00..d03e415131
>> --- /dev/null
>> +++ b/util/chardev_open.c
>> @@ -0,0 +1,61 @@
>> +/*
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
>> + *
>> + * Authors: Yi Liu 
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Copied from
>> + * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
>> + *
>> + */
>
>Since this is GPL-2.0-only, IMHO it would be preferable to keep it
>out of the util/ directory, as we're aiming to not add further 2.0
>only code, except for specific subdirs. This only appears to be used
>by code under hw/vfio/, which is one of the dirs still permitting
>2.0-only code. So I think better to keep this file where it is used.

I'll copy the original license header to preserve the GPL OR BSD choice.
Since it is then no longer restricted to GPL-2.0-only, I plan to keep it in util/.
Let me know if you still prefer to move it to hw/vfio/.

>
>> +#ifndef _GNU_SOURCE
>> +#define _GNU_SOURCE
>> +#endif
>
>This is set globally for building all files in QEMU

Will remove it.

>
>> +#include "qemu/osdep.h"
>> +#include "qemu/char_dev.h"
>> +
>> +static int open_cdev_internal(const char *path, dev_t cdev)
>> +{
>> +struct stat st;
>> +int fd;
>> +
>> +fd = qemu_open_old(path, O_RDWR);
>> +if (fd == -1) {
>> +return -1;
>> +}
>> +if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
>> +(cdev != 0 && st.st_rdev != cdev)) {
>> +close(fd);
>> +return -1;
>> +}
>> +return fd;
>> +}
>> +
>> +static int open_cdev_robust(dev_t cdev)
>> +{
>> +char *devpath;
>
>g_autofree for this...

Will do.

>
>> +int ret;
>> +
>> +/*
>> + * This assumes that udev is being used and is creating the /dev/char/
>> + * symlinks.
>> + */
>> +devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
>> +ret = open_cdev_internal(devpath, cdev);
>> +g_free(devpath);
>
>...avoids the need for g_free, and also avoids the need for
>the intermediate 'ret' variable.

Yes.
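
For reference, the two steps in the quoted helper (build the /dev/char/MAJOR:MINOR path, then open it and verify that it really is the expected character device) can be sketched without QEMU or GLib as follows. This is an illustration under assumptions: it uses plain open(2) instead of qemu_open_old(), a caller-supplied buffer instead of g_strdup_printf()/g_autofree, and the names cdev_path and open_cdev_checked are made up for the sketch. It assumes Linux with glibc for <sys/sysmacros.h>.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Format "/dev/char/MAJOR:MINOR" into buf; returns 0, or -1 if it does not fit. */
static int cdev_path(char *buf, size_t len, dev_t cdev)
{
    int n = snprintf(buf, len, "/dev/char/%u:%u",
                     (unsigned)major(cdev), (unsigned)minor(cdev));

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Open path and check that it is a character device; if cdev is non-zero,
 * also check that its device number matches. Returns the fd, or -1.
 */
static int open_cdev_checked(const char *path, dev_t cdev)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
        (cdev != 0 && st.st_rdev != cdev)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the path lives in a caller-owned buffer here, there is nothing to free and no intermediate 'ret' is needed, which is the same simplification g_autofree gives the GLib version.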

Thanks
Zhenzhong

RE: [PATCH v1 06/22] vfio/common: Add a vfio device iterator

2023-09-20 Thread Duan, Zhenzhong

Hi Eric,

>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 8:26 PM
>Subject: Re: [PATCH v1 06/22] vfio/common: Add a vfio device iterator
>
>Hi Zhenzhong,
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> With a vfio device iterator added, we can make some migration and reset
>> related functions group agnostic.
>> E.x:
>> vfio_mig_active
>> vfio_migratable_device_num
>> vfio_devices_all_dirty_tracking
>> vfio_devices_all_device_dirty_tracking
>> vfio_devices_all_running_and_mig_active
>> vfio_devices_dma_logging_stop
>> vfio_devices_dma_logging_start
>> vfio_devices_query_dirty_bitmap
>> vfio_reset_handler
>>
>> Or else we need to add container specific callback variants for above
>> functions just because they iterate devices based on group.
>>
>> Move the reset handler registration/unregistration to a place that is not
>> group specific, saying first vfio address space created instead of the
>> first group.
>I would move the reset handler registration/unregistration changes in a
>separate patch.

Got it.

>besides,  I don't catch what you mean by
>"saying first vfio address space created instead of the first group."

Before the patch, the reset handler is registered when the first group is created;
after the patch, it is registered when the first address space is created.
The main purpose is to make this code group agnostic.

For the device iteration part of this patch, I plan to follow Alex's
suggestion to use vfio_device_list for both BEs. Thanks for your
time on this patch.
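
To make the iterator idea concrete, here is a small self-contained sketch of walking every device of a container through its groups. It uses hand-rolled singly linked lists rather than QEMU's QLIST macros, and the struct and function names are hypothetical. Unlike the helper in the patch, this version also skips groups that have no devices, which is a choice made for the sketch.

```c
#include <stddef.h>

struct group;

struct device {
    int id;
    struct group *group;    /* group this device belongs to */
    struct device *next;    /* next device in the same group */
};

struct group {
    struct device *devices; /* first device in this group */
    struct group *next;     /* next group in the same container */
};

struct container {
    struct group *groups;   /* first group in this container */
};

/*
 * Return the device after curr, or the first device when curr is NULL.
 * Returns NULL once every device of the container has been visited.
 */
static struct device *dev_iter_next(struct container *c, struct device *curr)
{
    struct group *g;

    if (curr) {
        if (curr->next) {
            return curr->next;
        }
        g = curr->group->next;
    } else {
        g = c->groups;
    }
    /* Skip groups that hold no devices. */
    while (g && !g->devices) {
        g = g->next;
    }
    return g ? g->devices : NULL;
}

/* Example caller: the group structure is invisible at this level. */
static int count_devices(struct container *c)
{
    int n = 0;

    for (struct device *d = dev_iter_next(c, NULL); d;
         d = dev_iter_next(c, d)) {
        n++;
    }
    return n;
}
```

Callers such as count_devices() never touch the group list, which is the property that lets the migration and reset helpers stay group agnostic.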

BRs.
Zhenzhong

RE: [PATCH v1 06/22] vfio/common: Add a vfio device iterator

2023-09-20 Thread Duan, Zhenzhong




>-Original Message-
>From: Alex Williamson 
>Subject: Re: [PATCH v1 06/22] vfio/common: Add a vfio device iterator
>
>On Wed, 30 Aug 2023 18:37:38 +0800
>Zhenzhong Duan  wrote:
>
>> With a vfio device iterator added, we can make some migration and reset
>> related functions group agnostic.
>> E.x:
>> vfio_mig_active
>> vfio_migratable_device_num
>> vfio_devices_all_dirty_tracking
>> vfio_devices_all_device_dirty_tracking
>> vfio_devices_all_running_and_mig_active
>> vfio_devices_dma_logging_stop
>> vfio_devices_dma_logging_start
>> vfio_devices_query_dirty_bitmap
>> vfio_reset_handler
>>
>> Or else we need to add container specific callback variants for above
>> functions just because they iterate devices based on group.
>>
>> Move the reset handler registration/unregistration to a place that is not
>> group specific, saying first vfio address space created instead of the
>> first group.
>>
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  hw/vfio/common.c | 224 ++-
>>  1 file changed, 122 insertions(+), 102 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 949ad6714a..51c6e7598e 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -84,6 +84,26 @@ static int vfio_ram_block_discard_disable(VFIOContainer
>*container, bool state)
>>  }
>>  }
>>
>> +static VFIODevice *vfio_container_dev_iter_next(VFIOContainer *container,
>> +VFIODevice *curr)
>> +{
>> +VFIOGroup *group;
>> +
>> +if (!curr) {
>> +group = QLIST_FIRST(&container->group_list);
>> +} else {
>> +if (curr->next.le_next) {
>> +return curr->next.le_next;
>> +}
>
>
>VFIODevice *device = QLIST_NEXT(curr, next);
>
>if (device) {
>return device;
>}
>
>> +group = curr->group->container_next.le_next;
>
>
>group = QLIST_NEXT(curr->group, container_next);
>
>> +}
>> +
>> +if (!group) {
>> +return NULL;
>> +}
>> +return QLIST_FIRST(&group->device_list);
>> +}
>> +
>>  /*
>>   * Device state interfaces
>>   */
>> @@ -112,17 +132,22 @@ static int vfio_get_dirty_bitmap(VFIOContainer
>*container, uint64_t iova,
>>
>>  bool vfio_mig_active(void)
>>  {
>> -VFIOGroup *group;
>> +VFIOAddressSpace *space;
>> +VFIOContainer *container;
>>  VFIODevice *vbasedev;
>>
>> -if (QLIST_EMPTY(&vfio_group_list)) {
>> +if (QLIST_EMPTY(&vfio_address_spaces)) {
>>  return false;
>>  }
>>
>> -QLIST_FOREACH(group, &vfio_group_list, next) {
>> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> -if (vbasedev->migration_blocker) {
>> -return false;
>> +QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> +QLIST_FOREACH(container, &space->containers, next) {
>> +vbasedev = NULL;
>> +while ((vbasedev = vfio_container_dev_iter_next(container,
>> +vbasedev))) {
>> +if (vbasedev->migration_blocker) {
>> +return false;
>> +}
>
>Appears easy to avoid setting vbasedev in the loop iterator and
>improving the scope of vbasedev:
>
>VFIODevice *vbasedev = vfio_container_dev_iter_next(container, NULL);
>
>while (vbasedev) {
>if (vbasedev->migration_blocker) {
>return false;
>}
>
>vbasedev = vfio_container_dev_iter_next(container, vbasedev);
>}
>
>>  }
>>  }
>>  }
>> @@ -133,14 +158,19 @@ static Error *multiple_devices_migration_blocker;
>>
>>  static unsigned int vfio_migratable_device_num(void)
>>  {
>> -VFIOGroup *group;
>> +VFIOAddressSpace *space;
>> +VFIOContainer *container;
>>  VFIODevice *vbasedev;
>>  unsigned int device_num = 0;
>>
>> -QLIST_FOREACH(group, &vfio_group_list, next) {
>> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
>> -if (vbasedev->migration) {
>> -device_num++;
>> +QLIST_FOREACH(space, &vfio_address_spaces, list) {
>> +QLIST_FOREACH(container, &space->containers, next) {
>> +vbasedev = NULL;
>> +while ((vbasedev = vfio_container_dev_iter_next(container,
>> +vbasedev))) {
>> +if (vbasedev->migration) {
>> +device_num++;
>> +}
>
>Same as above.
>
>>  }
>>  }
>>  }
>> @@ -207,8 +237,7 @@ static void vfio_set_migration_error(int err)
>>
>>  static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>  {
>> -VFIOGroup *group;
>> -VFIODevice *vbasedev;
>> +VFIODevice *vbasedev = NULL;
>>  MigrationState *ms = migrate_get_current();
>>
>>  if (ms->state != MIGRATION_STATUS_ACTIVE &&
>> @@ -216,19 +245,17 @@ static bool
>vfio_devices_all_dirty_tracking(VFIOContainer *container)
>>  return false;
>>  }
>>
>> -QLIST_FOREACH(group, &container->group_list, container_next) {
>> -

RE: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Cédric Le Goater 
>Sent: Wednesday, September 20, 2023 8:20 PM
>Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>
>On 9/20/23 05:42, Duan, Zhenzhong wrote:
>>
>>
>>> -Original Message-
>>> From: Cédric Le Goater 
>>> Sent: Wednesday, September 20, 2023 1:08 AM
>>> Subject: Re: [PATCH v1 15/22] Add iommufd configure option
>>>
>>> On 8/30/23 12:37, Zhenzhong Duan wrote:
>>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>>> iommufd support, enabled by default.
>>>
>>> Why would someone want to disable support at compile time ? It might
>>
>> For those users who only want to support legacy container feature?
>> Let me know if you still prefer to drop this patch, I'm fine with that.
>
>I think it is too early.
>
>>> have been useful for dev but now QEMU should self-adjust at runtime
>>> depending only on the host capabilities AFAIUI. Am I missing something ?
>>
>> IOMMUFD doesn't support all features of legacy container, so QEMU
>> doesn't self-adjust at runtime by checking if host supports IOMMUFD.
>> We need to specify it explicitly to use IOMMUFD as below:
>>
>>  -object iommufd,id=iommufd0
>>  -device vfio-pci,host=:02:00.0,iommufd=iommufd0
>
>OK. I am not sure this is the correct interface yet. At first glance,
>I wouldn't introduce a new object for a simple backend depending on a
>kernel interface. I would tend to prefer a "iommu-something" property
>of the vfio-pci device with string values: "legacy", "iommufd", "default"
>and define the various interfaces (the ops you proposed) for each
>depending on the user preference and the capabilities of the host and
>possibly the device.
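
To check that I read the suggestion correctly, here is a rough sketch of
such a string property. Everything in it is hypothetical (the enum, the
function name, and especially the policy chosen for "default"); it is not
an existing QEMU interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical backend identifiers for illustration only. */
typedef enum {
    BACKEND_INVALID,
    BACKEND_LEGACY,
    BACKEND_IOMMUFD,
} Backend;

/*
 * Map the property string to a backend, taking host capability into
 * account. Assumption: "default" prefers iommufd when the host has it
 * and falls back to legacy otherwise; the real policy is still open.
 */
static Backend select_backend(const char *value, bool host_has_iommufd)
{
    assert(value != NULL);

    if (strcmp(value, "legacy") == 0) {
        return BACKEND_LEGACY;
    }
    if (strcmp(value, "iommufd") == 0) {
        /* An explicit request fails if the host cannot satisfy it. */
        return host_has_iommufd ? BACKEND_IOMMUFD : BACKEND_INVALID;
    }
    if (strcmp(value, "default") == 0) {
        return host_has_iommufd ? BACKEND_IOMMUFD : BACKEND_LEGACY;
    }
    return BACKEND_INVALID;
}
```

Since IOMMUFD doesn't cover all legacy container features yet, "default"
might need to stay on legacy for now; the sketch only shows the shape of the
interface.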
>
>I might be wrong and this might have been discussed before. If so, it
>should go in the cover letter with other things : what is this patchset
>providing to VFIO (multiple iommu backends), how it is reaching that
>goal, how is it organized, how do we deal with the special case (spapr),
>what's the user interface, etc.

Got it, I'll add the "how is it organized, how do we deal with the special
case (spapr)" part. The other parts seem to be in the cover letter already;
there is a diagram showing the architecture of VFIO/legacy BE/IOMMUFD BE, etc.

Thanks
Zhenzhong

RE: [PATCH v1 05/22] vfio/common: Extract out vfio_kvm_device_[add/del]_fd

2023-09-20 Thread Duan, Zhenzhong



>-Original Message-
>From: Eric Auger 
>Sent: Wednesday, September 20, 2023 7:49 PM
>Subject: Re: [PATCH v1 05/22] vfio/common: Extract out
>vfio_kvm_device_[add/del]_fd
>
>Hi Zhenzhong,
>
>On 8/30/23 12:37, Zhenzhong Duan wrote:
>> ...which will be used by both legacy and iommufd backend.
>I prefer genuine sentences in the commit msg. Also you explain what you
>do but not why.
>
>suggestion: Introduce two new helpers, vfio_kvm_device_[add/del]_fd
>which take as input a file descriptor which can be either a group fd or
>a cdev fd. This uses the new KVM_DEV_VFIO_FILE VFIO KVM device group,
>which aliases to the legacy KVM_DEV_VFIO_GROUP.
>
>vfio_kvm_device_add/del_group then call those new helpers.

Thanks, will update in v2.

>
>
>
>>
>> Signed-off-by: Yi Liu 
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  hw/vfio/common.c  | 44 +++
>>  include/hw/vfio/vfio-common.h |  3 +++
>>  2 files changed, 32 insertions(+), 15 deletions(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 67150e4575..949ad6714a 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1759,17 +1759,17 @@ void vfio_reset_handler(void *opaque)
>>  }
>>  }
>>
>> -static void vfio_kvm_device_add_group(VFIOGroup *group)
>> +int vfio_kvm_device_add_fd(int fd)
>>  {
>>  #ifdef CONFIG_KVM
>>  struct kvm_device_attr attr = {
>> -.group = KVM_DEV_VFIO_GROUP,
>> -.attr = KVM_DEV_VFIO_GROUP_ADD,
>> -.addr = (uint64_t)(unsigned long)&group->fd,
>> +.group = KVM_DEV_VFIO_FILE,
>> +.attr = KVM_DEV_VFIO_FILE_ADD,
>> +.addr = (uint64_t)(unsigned long)&fd,
>>  };
>>
>>  if (!kvm_enabled()) {
>> -return;
>> +return 0;
>>  }
>>
>>  if (vfio_kvm_device_fd < 0) {
>> @@ -1779,37 +1779,51 @@ static void
>vfio_kvm_device_add_group(VFIOGroup *group)
>>
>>  if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
>>  error_report("Failed to create KVM VFIO device: %m");
>> -return;
>> +return -ENODEV;
>can't you return -errno?
Will fix.

>>  }
>>
>>  vfio_kvm_device_fd = cd.fd;
>>  }
>>
>>  if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> -error_report("Failed to add group %d to KVM VFIO device: %m",
>> - group->groupid);
>> +error_report("Failed to add fd %d to KVM VFIO device: %m",
>> + fd);
>> +return -errno;
>>  }
>>  #endif
>> +return 0;
>>  }
>>
>> -static void vfio_kvm_device_del_group(VFIOGroup *group)
>> +static void vfio_kvm_device_add_group(VFIOGroup *group)
>> +{
>> +vfio_kvm_device_add_fd(group->fd);
>Since vfio_kvm_device_add_fd now returns an error value, it's a pity not
>to use it and propagate it. Also you could fill an errp with the error
>msg and use it in vfio_connect_container(). But this is a new error
>handling there.

What about having vfio_kvm_device_add_fd return void, like
vfio_kvm_device_add_group? I just realized vfio_connect_container()
doesn't see any failure from vfio_kvm_device_add_group, so propagating the
error to vfio_connect_container() would only serve to print it there, which
is already done in vfio_kvm_device_add_fd.

>> +}
>> +
>> +int vfio_kvm_device_del_fd(int fd)
>not sure we want this to return an error. But if we do, I think it would
>be nicer to propagate the error up.

Same question as above.

>>  {
>>  #ifdef CONFIG_KVM
>>  struct kvm_device_attr attr = {
>> -.group = KVM_DEV_VFIO_GROUP,
>> -.attr = KVM_DEV_VFIO_GROUP_DEL,
>> -.addr = (uint64_t)(unsigned long)&group->fd,
>> +.group = KVM_DEV_VFIO_FILE,
>> +.attr = KVM_DEV_VFIO_FILE_DEL,
>> +.addr = (uint64_t)(unsigned long)&fd,
>>  };
>>
>>  if (vfio_kvm_device_fd < 0) {
>> -return;
>> +return -EINVAL;
>>  }
>>
>>  if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
>> -error_report("Failed to remove group %d from KVM VFIO device: %m",
>> - group->groupid);
>> +error_report("Failed to remove fd %d from KVM VFIO device: %m",
>> + fd);
>> +return -EBADF;
>-errno?
Sure.
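
One detail for the -errno returns: a call made between the failing syscall
and the return (such as the error print) may change errno, so it is safer to
save it first. A minimal sketch with a made-up helper, close_fd_checked(),
which is not part of this patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): close @fd and return 0 on
 * success or a negative errno value on failure.
 */
static int close_fd_checked(int fd)
{
    if (close(fd) < 0) {
        int saved_errno = errno;   /* capture before calling anything else */

        fprintf(stderr, "Failed to close fd %d: %s\n",
                fd, strerror(saved_errno));
        assert(saved_errno > 0);
        return -saved_errno;       /* unaffected by what fprintf() did */
    }
    return 0;
}
```

The same pattern would apply around error_report() in the add/del helpers.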

Thanks
Zhenzhong

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-20 Thread Zhijian Li (Fujitsu)

Sorry all, I forgot to update my email address to lizhij...@fujitsu.com.

Corrected it.


On 20/09/2023 17:04, Li Zhijian wrote:
> From: Li Zhijian 
> 
> Destination will fail with:
> qemu-system-x86_64: rdma: Too many requests in this message 
> (3638950032).Bailing.
> 
> migrate with RDMA is different from tcp. RDMA has its own control
> message, and all traffic between RDMA_CONTROL_REGISTER_REQUEST and
> RDMA_CONTROL_REGISTER_FINISHED should not be disturbed.
> 
> find_dirty_block() will be called during RDMA_CONTROL_REGISTER_REQUEST
> and RDMA_CONTROL_REGISTER_FINISHED, it will send a extra traffic to
> destination and cause migration to fail.
> 
> Since there's no existing subroutine to indicate whether it's migrated
> by RDMA or not, and RDMA is not compatible with multifd, we use
> migrate_multifd() here.
> 
> Fixes: 294e5a4034 ("multifd: Only flush once each full round of memory")
> Signed-off-by: Li Zhijian 
> ---
>   migration/ram.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 9040d66e61..89ae28e21a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1399,7 +1399,8 @@ static int find_dirty_block(RAMState *rs, 
> PageSearchStatus *pss)
>   pss->page = 0;
>   pss->block = QLIST_NEXT_RCU(pss->block, next);
>   if (!pss->block) {
> -if (!migrate_multifd_flush_after_each_section()) {
> +if (migrate_multifd() &&
> +!migrate_multifd_flush_after_each_section()) {
>   QEMUFile *f = rs->pss[RAM_CHANNEL_PRECOPY].pss_channel;
>   int ret = multifd_send_sync_main(f);
>   if (ret < 0) {

Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Zhijian Li (Fujitsu)

On 20/09/2023 21:01, Fabiano Rosas wrote:
> Li Zhijian  writes:
> 
>> From: Li Zhijian 
>>
>> Previously, we got a confusion error that complains
>> the RDMAControlHeader.repeat:
>> qemu-system-x86_64: rdma: Too many requests in this message 
>> (3638950032).Bailing.
>>
>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>> After this patch, error will become:
>> qemu-system-x86_64: Unknown control message QEMU FILE
>>
>> Signed-off-by: Li Zhijian 
>> ---
>>   migration/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index a2a3db35b1..3073d9953c 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>   size_t remaining = iov[i].iov_len;
>>   uint8_t * data = (void *)iov[i].iov_base;
>>   while (remaining) {
>> -RDMAControlHeader head;
>> +RDMAControlHeader head = {};
>>   
>>   len = MIN(remaining, RDMA_SEND_INCREMENT);
>>   remaining -= len;
> 

    RDMAControlHeader head = {};

    len = MIN(remaining, RDMA_SEND_INCREMENT);
    remaining -= len;

    head.len = len;
    head.type = RDMA_CONTROL_QEMU_FILE;

    ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);

> I'm struggling to see how head is used before we set the type a couple
> of lines below. Could you expand on it?

IIUC, head is used for both the common migration control path and the
RDMA-specific control path.

hook_stage(RAM_SAVE_FLAG_HOOK) {
    rdma_hook_process(qemu_rdma_registration_handle) {
        do {
            // This is RDMA's own control block; it should not be disturbed
            // by the common migration control path.
            // head will be extracted and processed here.
            // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE,
            // which is an unexpected message for this block.
            // head.repeat will be examined before the type, so an
            // uninitialized repeat will confuse us here.
        } while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
    }
}

When qio_channel_rdma_writev() is used for the common migration control path,
repeat is unused and will not be examined.

With this patch, we can quickly know the cause.

> 
> Also, a smoke test could have caught both issues early on. Is there any
> reason for not having any?

I have no idea yet :)

Thanks
Zhijian

[PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error

2023-09-20 Thread William Roche

From: William Roche 

A Qemu VM can survive a memory error, as qemu can relay the error to the
VM kernel which could also deal with it -- poisoning/off-lining the impacted
page.
This situation creates a hole in the VM memory address space that the VM kernel
knows about (an unreadable page or set of pages).

But the migration of this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash Qemu when
it sequentially reads the memory address space and stumbles on the
existing hole.

In order to correct this problem, I suggest treating the poisoned pages as if
they were zero-pages for the migration copy.
This fix also works with underlying large pages, taking into account the
RAMBlock segment "page-size".
This fix is scripts/checkpatch.pl clean.
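
The idea can be sketched outside of QEMU as follows. This is a simplified
illustration, not the patch itself: the function names are invented and a
plain array stands in for the list of poisoned addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if @addr falls inside a poisoned page. Each entry of
 * @poisoned is the start of a poisoned page of @page_size bytes, so the
 * check covers large pages as well as 4k ones.
 */
static bool addr_is_poisoned(uint64_t addr, const uint64_t *poisoned,
                             size_t n, uint64_t page_size)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= poisoned[i] && addr - poisoned[i] < page_size) {
            return true;
        }
    }
    return false;
}

/*
 * Treat a poisoned page as a zero page without ever reading it; only
 * healthy memory is scanned for non-zero bytes.
 */
static bool page_is_zero_or_poisoned(const uint8_t *host, uint64_t offset,
                                     size_t len, const uint64_t *poisoned,
                                     size_t n, uint64_t page_size)
{
    assert(host != NULL);

    if (addr_is_poisoned(offset, poisoned, n, page_size)) {
        return true;
    }
    for (size_t i = 0; i < len; i++) {
        if (host[offset + i] != 0) {
            return false;
        }
    }
    return true;
}
```

The important property is that the poison check comes before any access to
the page contents, since the read itself is what raises SIGBUS.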

v2:
  - adding compressed transfer handling of poisoned pages
 
Testing: I could verify that migration now works with a poisoned page
through standard and compressed migration with 4k and large (2M) pages.

v3:
  - Included the Reviewed-by and Tested-by information
  - added a TODO comment above control_save_page()
mentioning Zhijian's feedback about migration failure.


William Roche (1):
  migration: skip poisoned memory pages on "ram saving" phase

 accel/kvm/kvm-all.c  | 14 ++
 accel/stubs/kvm-stub.c   |  5 +
 include/sysemu/kvm.h | 10 ++
 migration/ram-compress.c |  3 ++-
 migration/ram.c  | 24 ++--
 migration/ram.h  |  2 ++
 6 files changed, 55 insertions(+), 3 deletions(-)

-- 
2.39.3

[PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-20 Thread William Roche

From: William Roche 

A memory page poisoned from the hypervisor level is no longer readable.
Thus, it is now treated as a zero-page for the ram saving migration phase.

The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page_to_file
#5  save_zero_page
#6  ram_save_target_page_legacy
#7  ram_save_host_page
#8  ram_find_and_save_block
#9  ram_save_iterate
#10 qemu_savevm_state_iterate
#11 migration_iteration_run
#12 migration_thread
#13 qemu_thread_start

Fix it by considering poisoned pages as if they were zero-pages for
the migration copy. This fix also works with underlying large pages,
taking into account the RAMBlock segment "page-size".

Standard migration and compressed transfers are handled by this code.
RDMA transfer isn't touched.

Reviewed-by: Peter Xu 
Tested-by: Li Zhijian  # RDMA
Signed-off-by: William Roche 
---
 accel/kvm/kvm-all.c  | 14 ++
 accel/stubs/kvm-stub.c   |  5 +
 include/sysemu/kvm.h | 10 ++
 migration/ram-compress.c |  3 ++-
 migration/ram.c  | 24 ++--
 migration/ram.h  |  2 ++
 6 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index ff1578bb32..7fb13c8a56 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1152,6 +1152,20 @@ static void kvm_unpoison_all(void *param)
 }
 }
 
+bool kvm_hwpoisoned_page(RAMBlock *block, void *offset)
+{
+HWPoisonPage *pg;
+ram_addr_t ram_addr = (ram_addr_t) offset;
+
+QLIST_FOREACH(pg, &hwpoison_page_list, list) {
+if ((ram_addr >= pg->ram_addr) &&
+(ram_addr - pg->ram_addr < block->page_size)) {
+return true;
+}
+}
+return false;
+}
+
 void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 {
 HWPoisonPage *page;
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 235dc661bc..c0a31611df 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -133,3 +133,8 @@ uint32_t kvm_dirty_ring_size(void)
 {
 return 0;
 }
+
+bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr)
+{
+return false;
+}
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index ee9025f8e9..858688227a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -570,4 +570,14 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+/**
+ * kvm_hwpoisoned_page - indicate if the given page is poisoned
+ * @block: memory block of the given page
+ * @ram_addr: offset of the page
+ *
+ * Returns: true: page is poisoned
+ *  false: page not yet poisoned
+ */
+bool kvm_hwpoisoned_page(RAMBlock *block, void *ram_addr);
 #endif
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 06254d8c69..1916ce709d 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -34,6 +34,7 @@
 #include "qemu/error-report.h"
 #include "migration.h"
 #include "options.h"
+#include "ram.h"
 #include "io/channel-null.h"
 #include "exec/target_page.h"
 #include "exec/ramblock.h"
@@ -198,7 +199,7 @@ static CompressResult do_compress_ram_page(QEMUFile *f, z_stream *stream,
 
 assert(qemu_file_buffer_empty(f));
 
-if (buffer_is_zero(p, page_size)) {
+if (migration_buffer_is_zero(block, offset, page_size)) {
 return RES_ZEROPAGE;
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..21357666dc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1129,6 +1129,26 @@ void ram_release_page(const char *rbname, uint64_t offset)
 ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
 }
 
+/**
+ * migration_buffer_is_zero: indicate if the page at the given
+ * location is entirely filled with zero, or is a poisoned page.
+ *
+ * @block: block that contains the page
+ * @offset: offset inside the block for the page
+ * @len: size to consider
+ */
+bool migration_buffer_is_zero(RAMBlock *block, ram_addr_t offset,
+ size_t len)
+{
+uint8_t *p = block->host + offset;
+
+if (kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) {
+return true;
+}
+
+return buffer_is_zero(p, len);
+}
+
 /**
  * save_zero_page_to_file: send the zero page to the file
  *
@@ -1142,10 +1162,9 @@ void ram_release_page(const char *rbname, uint64_t offset)
 static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
   RAMBlock *block, ram_addr_t offset)
 {
-uint8_t *p = block->host + offset;
 int len = 0;
 
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+if (migration_buffer_is_zero(block, offset, TARGET_PAGE_SIZE)) {
 len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);

Re: [PATCH v1 07/22] vfio/common: Refactor vfio_viommu_preset() to be group agnostic

2023-09-20 Thread Alex Williamson

On Wed, 30 Aug 2023 18:37:39 +0800
Zhenzhong Duan  wrote:

> So that it doesn't need to be moved into container.c as done
> in following patch.
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>  hw/vfio/common.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 51c6e7598e..fda5fc87b9 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -219,7 +219,22 @@ void vfio_unblock_multiple_devices_migration(void)
>  
>  bool vfio_viommu_preset(VFIODevice *vbasedev)
>  {
> -return vbasedev->group->container->space->as != &address_space_memory;
> +VFIOAddressSpace *space;
> +VFIOContainer *container;
> +VFIODevice *tmp_dev;
> +
> +QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +QLIST_FOREACH(container, &space->containers, next) {
> +tmp_dev = NULL;
> +while ((tmp_dev = vfio_container_dev_iter_next(container,
> +   tmp_dev))) {
> +if (vbasedev == tmp_dev) {
> +return space->as != &address_space_memory;
> +}
> +}
> +}
> +}
> +g_assert_not_reached();

Should the VFIODevice just have a pointer to the VFIOAddressSpace?
Thanks,

Alex


>  }
>  
>  static void vfio_set_migration_error(int err)

[PATCH v2 09/11] hw/net: GMAC Rx Implementation

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

- Implementation of Receive function for packets
- Implementation for reading and writing from and to descriptors in
  memory for Rx

NOTE: At this point in development we believe this function is working
as intended, and the kernel supports these findings, but we need the
Transmit function to work before we upload

Signed-off-by: Nabih Estefan Diaz 

hw/net: npcm_gmac Flush queued packets when starting RX

When RX starts, we need to flush the queued packets so that they
can be received by the GMAC device. Without this it won't work
with TAP NIC device.

Signed-off-by: Hao Wu 

hw/net: Handle RX desc full in NPCM GMAC

When the RX descriptor list is full, it returns a DMA_STATUS for software to
handle. But there's no way to indicate that software has handled all RX
descriptors, so the whole pipeline stalls.

We do something similar to NPCM7XX EMC to handle this case.

1. Return packet size when RX descriptor is full, effectively dropping these 
packets in such a case.
2. When software clears RX descriptor full bit, continue receiving further 
packets by flushing QEMU packet queue.
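
The two steps above can be sketched with a toy descriptor ring. This is only
an illustration under simplified assumptions: the RxRing type, its fields and
the helper names are invented and do not correspond to the GMAC model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define RX_RING_SIZE 4

/* Hypothetical descriptor: either owned by the DMA engine or by software. */
typedef struct {
    bool owned_by_dma;
    size_t length;
} RxDesc;

/* Hypothetical ring state with a "descriptor full" status flag. */
typedef struct {
    RxDesc ring[RX_RING_SIZE];
    size_t next;
    bool rx_full;
    unsigned dropped;
} RxRing;

/*
 * Receive one packet of @len bytes. When the next descriptor is not
 * available, flag the condition and return @len anyway, so the caller
 * treats the packet as consumed (dropped) instead of queueing it forever.
 */
static ssize_t rx_receive(RxRing *r, size_t len)
{
    RxDesc *d;

    assert(r != NULL);
    d = &r->ring[r->next];
    if (!d->owned_by_dma) {
        r->rx_full = true;
        r->dropped++;
        return (ssize_t)len;
    }
    d->length = len;
    d->owned_by_dma = false;          /* hand the descriptor to software */
    r->next = (r->next + 1) % RX_RING_SIZE;
    return (ssize_t)len;
}

/* Software returns descriptor @idx and clears the full flag. */
static void rx_release(RxRing *r, size_t idx)
{
    assert(r != NULL && idx < RX_RING_SIZE);
    r->ring[idx].owned_by_dma = true;
    r->rx_full = false;
}
```

In the device model, the point where the full bit is cleared is also where
the queued packets get flushed.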

Signed-off-by: Hao Wu 

hw/net: Receive and drop packets when descriptors are full in GMAC

Effectively this allows QEMU to receive and drop incoming packets when
RX descriptors are full. Similar to EMC, this lets the GMAC drop packets
faster, especially during the bootup sequence.

Signed-off-by: Hao Wu 
---
 hw/net/npcm_gmac.c | 353 ++---
 include/hw/net/npcm_gmac.h |  28 +--
 2 files changed, 339 insertions(+), 42 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 6f8109e0ee..67f123e3c4 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -23,7 +23,11 @@
 #include "hw/registerfields.h"
 #include "hw/net/mii.h"
 #include "hw/net/npcm_gmac.h"
+#include "linux/if_ether.h"
 #include "migration/vmstate.h"
+#include "net/checksum.h"
+#include "net/net.h"
+#include "qemu/cutils.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
 #include "sysemu/dma.h"
@@ -91,7 +95,6 @@ REG32(NPCM_GMAC_PTP_TTSR, 0x71c)
 #define NPCM_DMA_BUS_MODE_SWR   BIT(0)
 
 static const uint32_t npcm_gmac_cold_reset_values[NPCM_GMAC_NR_REGS] = {
-/* Reduce version to 3.2 so that the kernel can enable interrupt. */
 [R_NPCM_GMAC_VERSION] = 0x1032,
 [R_NPCM_GMAC_TIMER_CTRL]  = 0x03e8,
 [R_NPCM_GMAC_MAC0_ADDR_HI]= 0x8000,
@@ -146,6 +149,17 @@ static void gmac_phy_set_link(NPCMGMACState *s, bool 
active)
 
 static bool gmac_can_receive(NetClientState *nc)
 {
+NPCMGMACState *gmac = NPCM_GMAC(qemu_get_nic_opaque(nc));
+
+/* If GMAC receive is disabled. */
+if (!(gmac->regs[R_NPCM_GMAC_MAC_CONFIG] & NPCM_GMAC_MAC_CONFIG_RX_EN)) {
+return false;
+}
+
+/* If GMAC DMA RX is stopped. */
+if (!(gmac->regs[R_NPCM_DMA_CONTROL] & NPCM_DMA_CONTROL_START_STOP_RX)) {
+return false;
+}
 return true;
 }
 
@@ -191,11 +205,285 @@ static void gmac_update_irq(NPCMGMACState *gmac)
 qemu_set_irq(gmac->irq, level);
 }
 
-static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+static int gmac_read_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
 {
-/* Placeholder */
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->rdes0 = le32_to_cpu(desc->rdes0);
+desc->rdes1 = le32_to_cpu(desc->rdes1);
+desc->rdes2 = le32_to_cpu(desc->rdes2);
+desc->rdes3 = le32_to_cpu(desc->rdes3);
+return 0;
+}
+
+static int gmac_write_rx_desc(dma_addr_t addr, struct NPCMGMACRxDesc *desc)
+{
+struct NPCMGMACRxDesc le_desc;
+le_desc.rdes0 = cpu_to_le32(desc->rdes0);
+le_desc.rdes1 = cpu_to_le32(desc->rdes1);
+le_desc.rdes2 = cpu_to_le32(desc->rdes2);
+le_desc.rdes3 = cpu_to_le32(desc->rdes3);
+if (dma_memory_write(&address_space_memory, addr, &le_desc,
+sizeof(le_desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to write descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+return 0;
+}
+
+static int gmac_read_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc *desc)
+{
+if (dma_memory_read(&address_space_memory, addr, desc,
+sizeof(*desc), MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read descriptor @ 0x%"
+  HWADDR_PRIx "\n", __func__, addr);
+return -1;
+}
+desc->tdes0 = le32_to_cpu(desc->tdes0);
+desc->tdes1 = le32_to_cpu(desc->tdes1);
+desc->tdes2 = le32_to_cpu(desc->tdes2);
+desc->tdes3 = le32_to_cpu(desc->tdes3);
+return 0;
+}
+
+static int gmac_write_tx_desc(dma_addr_t addr, struct NPCMGMACTxDesc

[PATCH v2 10/11] hw/net: GMAC Tx Implementation

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

- Implementation of Transmit function for packets
- Implementation for reading and writing from and to descriptors in
  memory for Tx

NOTE: This function implements the steps detailed in the datasheet for
transmitting messages from the GMAC.

Signed-off-by: Nabih Estefan Diaz 
---
 hw/net/npcm_gmac.c | 152 +
 1 file changed, 152 insertions(+)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 67f123e3c4..678c30dfba 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -266,6 +266,7 @@ static int gmac_write_tx_desc(dma_addr_t addr, struct 
NPCMGMACTxDesc *desc)
 }
 return 0;
 }
+
 static int gmac_rx_transfer_frame_to_buffer(uint32_t rx_buf_len,
 uint32_t *left_frame,
 uint32_t rx_buf_addr,
@@ -484,6 +485,157 @@ static ssize_t gmac_receive(NetClientState *nc, const 
uint8_t *buf, size_t len)
 gmac->regs[R_NPCM_DMA_HOST_RX_DESC] = desc_addr;
 return len;
 }
+
+static int gmac_tx_get_csum(uint32_t tdes1)
+{
+uint32_t mask = TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(tdes1);
+int csum = 0;
+
+if (likely(mask > 0)) {
+csum |= CSUM_IP;
+}
+if (likely(mask > 1)) {
+csum |= CSUM_TCP | CSUM_UDP;
+}
+
+return csum;
+}
+
+static void gmac_try_send_next_packet(NPCMGMACState *gmac)
+{
+/*
+ * Comments about steps refer to steps for
+ * transmitting in page 384 of datasheet
+ */
+uint16_t tx_buffer_size = 2048;
+g_autofree uint8_t *tx_send_buffer = g_malloc(tx_buffer_size);
+uint32_t desc_addr;
+struct NPCMGMACTxDesc tx_desc;
+uint32_t tx_buf_addr, tx_buf_len;
+uint16_t length = 0;
+uint8_t *buf = tx_send_buffer;
+uint32_t prev_buf_size = 0;
+int csum = 0;
+
+/* steps 1&2 */
+if (!gmac->regs[R_NPCM_DMA_HOST_TX_DESC]) {
+gmac->regs[R_NPCM_DMA_HOST_TX_DESC] =
+NPCM_DMA_HOST_TX_DESC_MASK(gmac->regs[R_NPCM_DMA_TX_BASE_ADDR]);
+}
+desc_addr = gmac->regs[R_NPCM_DMA_HOST_TX_DESC];
+
+while (true) {
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_FETCHING_STATE);
+trace_npcm_gmac_packet_transmit(DEVICE(gmac)->canonical_path, length);
+if (gmac_read_tx_desc(desc_addr, &tx_desc)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x can't be read\n",
+  desc_addr);
+return;
+}
+/* step 3 */
+
+trace_npcm_gmac_packet_desc_read(DEVICE(gmac)->canonical_path,
+desc_addr);
+trace_npcm_gmac_debug_desc_data(DEVICE(gmac)->canonical_path, &tx_desc,
+tx_desc.tdes0, tx_desc.tdes1, tx_desc.tdes2, tx_desc.tdes3);
+
+/* 1 = DMA Owned, 0 = Software Owned */
+if (!(tx_desc.tdes0 & TX_DESC_TDES0_OWN)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "TX Descriptor @ 0x%x is owned by software\n",
+  desc_addr);
+gmac->regs[R_NPCM_DMA_STATUS] |= NPCM_DMA_STATUS_TU;
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_SUSPENDED_STATE);
+gmac_update_irq(gmac);
+return;
+}
+
+gmac_dma_set_state(gmac, NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT,
+NPCM_DMA_STATUS_TX_RUNNING_READ_STATE);
+/* Give the descriptor back regardless of what happens. */
+tx_desc.tdes0 &= ~TX_DESC_TDES0_OWN;
+
+if (tx_desc.tdes1 & TX_DESC_TDES1_FIRST_SEG_MASK) {
+csum = gmac_tx_get_csum(tx_desc.tdes1);
+}
+
+/* step 4 */
+tx_buf_addr = tx_desc.tdes2;
+gmac->regs[R_NPCM_DMA_CUR_TX_BUF_ADDR] = tx_buf_addr;
+tx_buf_len = TX_DESC_TDES1_BFFR1_SZ_MASK(tx_desc.tdes1);
+buf = &tx_send_buffer[prev_buf_size];
+
+if ((prev_buf_size + tx_buf_len) > sizeof(buf)) {
+tx_buffer_size = prev_buf_size + tx_buf_len;
+tx_send_buffer = g_realloc(tx_send_buffer, tx_buffer_size);
+buf = &tx_send_buffer[prev_buf_size];
+}
+
+/* step 5 */
+if (dma_memory_read(&address_space_memory, tx_buf_addr, buf,
+tx_buf_len, MEMTXATTRS_UNSPECIFIED)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Failed to read packet @ 0x%x\n",
+__func__, tx_buf_addr);
+return;
+}
+length += tx_buf_len;
+prev_buf_size += tx_buf_len;
+
+/* If not chained we'll have a second buffer. */
+if (!(tx_desc.tdes1 & TX_DESC_TDES1_SEC_ADDR_CHND_MASK)) {
+tx_buf_addr = tx_desc.tdes3;
+gmac->regs[R_NPCM_DMA_CUR_TX_BUF_ADDR] = tx_buf_addr;
+tx_buf_len = TX_DESC_TDES1_BFFR2_SZ_MASK(tx_desc.tdes1);
+buf = &tx_send_buffer[prev_buf_size];
+
+if

[PATCH v2 11/11] tests/qtest: Adding PCS Module test to GMAC Qtest

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

 - Add PCS Register check to npcm_gmac-test

Signed-off-by: Nabih Estefan Diaz 
---
 tests/qtest/npcm_gmac-test.c | 134 ++-
 1 file changed, 133 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
index 84511fd915..1f0ad664f4 100644
--- a/tests/qtest/npcm_gmac-test.c
+++ b/tests/qtest/npcm_gmac-test.c
@@ -20,6 +20,10 @@
 /* Name of the GMAC Device */
 #define TYPE_NPCM_GMAC "npcm-gmac"
 
+/* Address of the PCS Module */
+#define PCS_BASE_ADDRESS 0xf078
+#define NPCM_PCS_IND_AC_BA 0x1fe
+
 typedef struct GMACModule {
 int irq;
 uint64_t base_addr;
@@ -111,6 +115,62 @@ typedef enum NPCMRegister {
 NPCM_GMAC_PTP_STNSUR = 0x714,
 NPCM_GMAC_PTP_TAR = 0x718,
 NPCM_GMAC_PTP_TTSR = 0x71c,
+
+/* PCS Registers */
+NPCM_PCS_SR_CTL_ID1 = 0x3c0008,
+NPCM_PCS_SR_CTL_ID2 = 0x3c000a,
+NPCM_PCS_SR_CTL_STS = 0x3c0010,
+
+NPCM_PCS_SR_MII_CTRL = 0x3e,
+NPCM_PCS_SR_MII_STS = 0x3e0002,
+NPCM_PCS_SR_MII_DEV_ID1 = 0x3e0004,
+NPCM_PCS_SR_MII_DEV_ID2 = 0x3e0006,
+NPCM_PCS_SR_MII_AN_ADV = 0x3e0008,
+NPCM_PCS_SR_MII_LP_BABL = 0x3e000a,
+NPCM_PCS_SR_MII_AN_EXPN = 0x3e000c,
+NPCM_PCS_SR_MII_EXT_STS = 0x3e001e,
+
+NPCM_PCS_SR_TIM_SYNC_ABL = 0x3e0e10,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR = 0x3e0e12,
+NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR = 0x3e0e14,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR = 0x3e0e16,
+NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR = 0x3e0e18,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR = 0x3e0e1a,
+NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR = 0x3e0e1c,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR = 0x3e0e1e,
+NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR = 0x3e0e20,
+
+NPCM_PCS_VR_MII_MMD_DIG_CTRL1 = 0x3f,
+NPCM_PCS_VR_MII_AN_CTRL = 0x3f0002,
+NPCM_PCS_VR_MII_AN_INTR_STS = 0x3f0004,
+NPCM_PCS_VR_MII_TC = 0x3f0006,
+NPCM_PCS_VR_MII_DBG_CTRL = 0x3f000a,
+NPCM_PCS_VR_MII_EEE_MCTRL0 = 0x3f000c,
+NPCM_PCS_VR_MII_EEE_TXTIMER = 0x3f0010,
+NPCM_PCS_VR_MII_EEE_RXTIMER = 0x3f0012,
+NPCM_PCS_VR_MII_LINK_TIMER_CTRL = 0x3f0014,
+NPCM_PCS_VR_MII_EEE_MCTRL1 = 0x3f0016,
+NPCM_PCS_VR_MII_DIG_STS = 0x3f0020,
+NPCM_PCS_VR_MII_ICG_ERRCNT1 = 0x3f0022,
+NPCM_PCS_VR_MII_MISC_STS = 0x3f0030,
+NPCM_PCS_VR_MII_RX_LSTS = 0x3f0040,
+NPCM_PCS_VR_MII_MP_TX_BSTCTRL0 = 0x3f0070,
+NPCM_PCS_VR_MII_MP_TX_LVLCTRL0 = 0x3f0074,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL0 = 0x3f007a,
+NPCM_PCS_VR_MII_MP_TX_GENCTRL1 = 0x3f007c,
+NPCM_PCS_VR_MII_MP_TX_STS = 0x3f0090,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL0 = 0x3f00b0,
+NPCM_PCS_VR_MII_MP_RX_GENCTRL1 = 0x3f00b2,
+NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0 = 0x3f00ba,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL0 = 0x3f00f0,
+NPCM_PCS_VR_MII_MP_MPLL_CTRL1 = 0x3f00f2,
+NPCM_PCS_VR_MII_MP_MPLL_STS = 0x3f0110,
+NPCM_PCS_VR_MII_MP_MISC_CTRL2 = 0x3f0126,
+NPCM_PCS_VR_MII_MP_LVL_CTRL = 0x3f0130,
+NPCM_PCS_VR_MII_MP_MISC_CTRL0 = 0x3f0132,
+NPCM_PCS_VR_MII_MP_MISC_CTRL1 = 0x3f0134,
+NPCM_PCS_VR_MII_DIG_CTRL2 = 0x3f01c2,
+NPCM_PCS_VR_MII_DIG_ERRCNT_SEL = 0x3f01c4,
 } NPCMRegister;
 
 static uint32_t gmac_read(QTestState *qts, const GMACModule *mod,
@@ -119,6 +179,15 @@ static uint32_t gmac_read(QTestState *qts, const 
GMACModule *mod,
 return qtest_readl(qts, mod->base_addr + regno);
 }
 
+static uint16_t pcs_read(QTestState *qts, const GMACModule *mod,
+  NPCMRegister regno)
+{
+uint32_t write_value = (regno & 0x3ffe00) >> 9;
+qtest_writel(qts, PCS_BASE_ADDRESS + NPCM_PCS_IND_AC_BA, write_value);
+uint32_t read_offset = regno & 0x1ff;
+return qtest_readl(qts, PCS_BASE_ADDRESS + read_offset);
+}
+
 /* Check that GMAC registers are reset to default value */
 static void test_init(gconstpointer test_data)
 {
@@ -129,7 +198,12 @@ static void test_init(gconstpointer test_data)
 #define CHECK_REG32(regno, value) \
 do { \
 g_assert_cmphex(gmac_read(qts, mod, (regno)), ==, (value)); \
-} while (0)
+} while (0) ;
+
+#define CHECK_REG_PCS(regno, value) \
+do { \
+g_assert_cmphex(pcs_read(qts, mod, (regno)), ==, (value)); \
+} while (0) ;
 
 CHECK_REG32(NPCM_DMA_BUS_MODE, 0x00020100);
 CHECK_REG32(NPCM_DMA_XMT_POLL_DEMAND, 0);
@@ -180,6 +254,64 @@ static void test_init(gconstpointer test_data)
 CHECK_REG32(NPCM_GMAC_PTP_TAR, 0);
 CHECK_REG32(NPCM_GMAC_PTP_TTSR, 0);
 
+/* TODO Add registers PCS */
+if (mod->base_addr == 0xf0802000) {
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID1, 0x699e)
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_ID2, 0)
+CHECK_REG_PCS(NPCM_PCS_SR_CTL_STS, 0x8000)
+
+CHECK_REG_PCS(NPCM_PCS_SR_MII_CTRL, 0x1140)
+CHECK_REG_PCS(NPCM_PCS_SR_MII_STS, 0x0109)
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID1, 0x699e)
+CHECK_REG_PCS(NPCM_PCS_SR_MII_DEV_ID2, 0x0ced0)
+CHECK_REG_PCS(NPCM_PCS_SR_MII_AN_ADV, 0x0020)
+

[PATCH v2 08/11] hw/net: General GMAC Implementation

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

- General GMAC Register handling
- GMAC IRQ Handling
- Added traces in some methods for debugging
- Many declarations for accessing information on GMAC Descriptors
  (npcm_gmac.h file)

NOTE: With the code in this state, the GMAC boots up properly and shows up
in the output of the ifconfig command on the BMC.

Google-Rebase-Count: 1
Signed-off-by: Nabih Estefan Diaz 
Google-Bug-Id: 237557100
Change-Id: I3a4332ee5bab31b919782031a77c5b943f45ca2f
---
 include/hw/net/npcm_gmac.h | 198 ++---
 1 file changed, 184 insertions(+), 14 deletions(-)

diff --git a/include/hw/net/npcm_gmac.h b/include/hw/net/npcm_gmac.h
index e5729e83ea..c97eb6fe6e 100644
--- a/include/hw/net/npcm_gmac.h
+++ b/include/hw/net/npcm_gmac.h
@@ -34,13 +34,15 @@ struct NPCMGMACRxDesc {
 };
 
 /* NPCMGMACRxDesc.flags values */
-/* RDES2 and RDES3 are buffer address pointers */
-/* Owner: 0 = software, 1 = gmac */
-#define RX_DESC_RDES0_OWNER_MASK BIT(31)
+/* RDES2 and RDES3 are buffer addresses */
+/* Owner: 0 = software, 1 = dma */
+#define RX_DESC_RDES0_OWN BIT(31)
 /* Destination Address Filter Fail */
-#define RX_DESC_RDES0_DEST_ADDR_FILT_FAIL_MASK BIT(30)
-/* Frame length*/
-#define RX_DESC_RDES0_FRAME_LEN_MASK(word) extract32(word, 16, 29)
+#define RX_DESC_RDES0_DEST_ADDR_FILT_FAIL BIT(30)
+/* Frame length */
+#define RX_DESC_RDES0_FRAME_LEN_MASK(word) extract32(word, 16, 14)
+/* Frame length Shift*/
+#define RX_DESC_RDES0_FRAME_LEN_SHIFT 16
 /* Error Summary */
 #define RX_DESC_RDES0_ERR_SUMM_MASK BIT(15)
 /* Descriptor Error */
@@ -83,9 +85,9 @@ struct NPCMGMACRxDesc {
 /* Receive Buffer 2 Size */
 #define RX_DESC_RDES1_BFFR2_SZ_SHIFT 11
 #define RX_DESC_RDES1_BFFR2_SZ_MASK(word) extract32(word, \
-RX_DESC_RDES1_BFFR2_SZ_SHIFT, 10 + RX_DESC_RDES1_BFFR2_SZ_SHIFT)
+RX_DESC_RDES1_BFFR2_SZ_SHIFT, 11)
 /* Receive Buffer 1 Size */
-#define RX_DESC_RDES1_BFFR1_SZ_MASK(word) extract32(word, 0, 10)
+#define RX_DESC_RDES1_BFFR1_SZ_MASK(word) extract32(word, 0, 11)
 
 
 struct NPCMGMACTxDesc {
@@ -96,9 +98,9 @@ struct NPCMGMACTxDesc {
 };
 
 /* NPCMGMACTxDesc.flags values */
-/* TDES2 and TDES3 are buffer address pointers */
+/* TDES2 and TDES3 are buffer addresses */
 /* Owner: 0 = software, 1 = gmac */
-#define TX_DESC_TDES0_OWNER_MASK BIT(31)
+#define TX_DESC_TDES0_OWN BIT(31)
 /* Tx Time Stamp Status */
 #define TX_DESC_TDES0_TTSS_MASK BIT(17)
 /* IP Header Error */
@@ -122,7 +124,7 @@ struct NPCMGMACTxDesc {
 /* VLAN Frame */
 #define TX_DESC_TDES0_VLAN_FRM_MASK BIT(7)
 /* Collision Count */
-#define TX_DESC_TDES0_COLL_CNT_MASK(word) extract32(word, 3, 6)
+#define TX_DESC_TDES0_COLL_CNT_MASK(word) extract32(word, 3, 4)
 /* Excessive Deferral */
 #define TX_DESC_TDES0_EXCS_DEF_MASK BIT(2)
 /* Underflow Error */
@@ -137,7 +139,7 @@ struct NPCMGMACTxDesc {
 /* Last Segment */
 #define TX_DESC_TDES1_FIRST_SEG_MASK BIT(29)
 /* Checksum Insertion Control */
-#define TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(word) extract32(word, 27, 28)
+#define TX_DESC_TDES1_CHKSM_INS_CTRL_MASK(word) extract32(word, 27, 2)
 /* Disable Cyclic Redundancy Check */
 #define TX_DESC_TDES1_DIS_CDC_MASK BIT(26)
 /* Transmit End of Ring */
@@ -145,9 +147,9 @@ struct NPCMGMACTxDesc {
 /* Secondary Address Chained */
 #define TX_DESC_TDES1_SEC_ADDR_CHND_MASK BIT(24)
 /* Transmit Buffer 2 Size */
-#define TX_DESC_TDES1_BFFR2_SZ_MASK(word) extract32(word, 11, 21)
+#define TX_DESC_TDES1_BFFR2_SZ_MASK(word) extract32(word, 11, 11)
 /* Transmit Buffer 1 Size */
-#define TX_DESC_TDES1_BFFR1_SZ_MASK(word) extract32(word, 0, 10)
+#define TX_DESC_TDES1_BFFR1_SZ_MASK(word) extract32(word, 0, 11)
 
 typedef struct NPCMGMACState {
 SysBusDevice parent;
@@ -165,4 +167,172 @@ typedef struct NPCMGMACState {
 #define TYPE_NPCM_GMAC "npcm-gmac"
 OBJECT_DECLARE_SIMPLE_TYPE(NPCMGMACState, NPCM_GMAC)
 
+/* Mask for RO bits in Status */
+#define NPCM_DMA_STATUS_RO_MASK(word) (word & 0xfffe)
+/* Mask for W1C bits in Status */
+#define NPCM_DMA_STATUS_W1C_MASK(word) (word & 0x1e7ff)
+
+/* Transmit Process State */
+#define NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT 20
+/* Transmit States */
+#define NPCM_DMA_STATUS_TX_STOPPED_STATE \
+(0b000 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+#define NPCM_DMA_STATUS_TX_RUNNING_FETCHING_STATE \
+(0b001 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+#define NPCM_DMA_STATUS_TX_RUNNING_WAITING_STATE \
+(0b010 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+#define NPCM_DMA_STATUS_TX_RUNNING_READ_STATE \
+(0b011 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+#define NPCM_DMA_STATUS_TX_SUSPENDED_STATE \
+(0b110 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+#define NPCM_DMA_STATUS_TX_RUNNING_CLOSING_STATE \
+(0b111 << NPCM_DMA_STATUS_TX_PROCESS_STATE_SHIFT)
+/* Receive Process State */
+#define NPCM_DMA_STATUS_RX_PROCESS_STATE_SHIFT 17
+/* Receive States */
+#define NPCM_DMA_STATUS_RX_STOPPED_STATE \
+(0b000 << NPCM_DMA_STATUS_RX_PROCESS_STATE_SHIFT)
+#define

[PATCH v2 05/11] hw/arm: Add GMAC devices to NPCM7XX SoC

2023-09-20 Thread Nabih Estefan

From: Hao Wu 

Signed-off-by: Hao Wu 
---
 hw/arm/npcm7xx.c | 36 ++--
 include/hw/arm/npcm7xx.h |  2 ++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index c9e87162cb..12e11250e1 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -91,6 +91,7 @@ enum NPCM7xxInterrupt {
 NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
+NPCM7XX_GMAC2_IRQ,
 NPCM7XX_MMC_IRQ = 26,
 NPCM7XX_PSPI2_IRQ   = 28,
 NPCM7XX_PSPI1_IRQ   = 31,
@@ -234,6 +235,12 @@ static const hwaddr npcm7xx_pspi_addr[] = {
 0xf0201000,
 };
 
+/* Register base address for each GMAC Module */
+static const hwaddr npcm7xx_gmac_addr[] = {
+0xf0802000,
+0xf0804000,
+};
+
 static const struct {
 hwaddr regs_addr;
 uint32_t unconnected_pins;
@@ -462,6 +469,10 @@ static void npcm7xx_init(Object *obj)
 object_initialize_child(obj, "pspi[*]", &s->pspi[i], TYPE_NPCM_PSPI);
 }
 
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+object_initialize_child(obj, "gmac[*]", &s->gmac[i], TYPE_NPCM_GMAC);
+}
+
 object_initialize_child(obj, "pci-mbox", &s->pci_mbox,
 TYPE_NPCM7XX_PCI_MBOX);
 object_initialize_child(obj, "mmc", &s->mmc, TYPE_NPCM7XX_SDHCI);
@@ -695,6 +706,29 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(sbd, 1, npcm7xx_irq(s, rx_irq));
 }
 
+/*
+ * GMAC Modules. Cannot fail.
+ */
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(npcm7xx_gmac_addr) != ARRAY_SIZE(s->gmac));
+QEMU_BUILD_BUG_ON(ARRAY_SIZE(s->gmac) != 2);
+for (i = 0; i < ARRAY_SIZE(s->gmac); i++) {
+SysBusDevice *sbd = SYS_BUS_DEVICE(&s->gmac[i]);
+
+/*
+ * The device exists regardless of whether it's connected to a QEMU
+ * netdev backend. So always instantiate it even if there is no
+ * backend.
+ */
+sysbus_realize(sbd, &error_abort);
+sysbus_mmio_map(sbd, 0, npcm7xx_gmac_addr[i]);
+int irq = i == 0 ? NPCM7XX_GMAC1_IRQ : NPCM7XX_GMAC2_IRQ;
+/*
+ * N.B. The values for the second argument sysbus_connect_irq are
+ * chosen to match the registration order in npcm7xx_emc_realize.
+ */
+sysbus_connect_irq(sbd, 0, npcm7xx_irq(s, irq));
+}
+
 /*
  * Flash Interface Unit (FIU). Can fail if incorrect number of chip selects
  * specified, but this is a programming error.
@@ -765,8 +799,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.siox[2]",  0xf0102000,   4 * KiB);
 create_unimplemented_device("npcm7xx.ahbpci",   0xf040,   1 * MiB);
 create_unimplemented_device("npcm7xx.mcphy",0xf05f,  64 * KiB);
-create_unimplemented_device("npcm7xx.gmac1",0xf0802000,   8 * KiB);
-create_unimplemented_device("npcm7xx.gmac2",0xf0804000,   8 * KiB);
 create_unimplemented_device("npcm7xx.vcd",  0xf081,  64 * KiB);
 create_unimplemented_device("npcm7xx.ece",  0xf082,   8 * KiB);
 create_unimplemented_device("npcm7xx.vdma", 0xf0822000,   8 * KiB);
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index cec3792a2e..9e5cf639a2 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -30,6 +30,7 @@
 #include "hw/misc/npcm7xx_pwm.h"
 #include "hw/misc/npcm7xx_rng.h"
 #include "hw/net/npcm7xx_emc.h"
+#include "hw/net/npcm_gmac.h"
 #include "hw/nvram/npcm7xx_otp.h"
 #include "hw/timer/npcm7xx_timer.h"
 #include "hw/ssi/npcm7xx_fiu.h"
@@ -105,6 +106,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCMGMACState   gmac[2];
 NPCM7xxPCIMBoxState pci_mbox;
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
-- 
2.42.0.459.ge4e396fd5e-goog

[PATCH v2 06/11] tests/qtest: Creating qtest for GMAC Module

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

 - Created qtest to check initialization of registers in GMAC Module.
 - Implemented test into Build File.

Signed-off-by: Nabih Estefan Diaz 
---
 tests/qtest/meson.build  |  11 +-
 tests/qtest/npcm_gmac-test.c | 209 +++
 2 files changed, 215 insertions(+), 5 deletions(-)
 create mode 100644 tests/qtest/npcm_gmac-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 849394515d..7c9622d8e1 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -191,6 +191,8 @@ qtests_npcm7xx = \
'npcm7xx_timer-test',
'npcm7xx_watchdog_timer-test'] + \
(slirp.found() ? ['npcm7xx_emc-test'] : [])
+qtests_npcm8xx = \
+  ['npcm_gmac-test']
 qtests_aspeed = \
   ['aspeed_hace-test',
'aspeed_smc-test',
@@ -205,9 +207,7 @@ qtests_arm = \
   (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed : []) + \
   (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
   (config_all_devices.has_key('CONFIG_GENERIC_LOADER') ? ['hexloader-test'] : 
[]) + \
-  (config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : 
[]) + \
-  (config_all_devices.has_key('CONFIG_VEXPRESS') ? ['test-arm-mptimer'] : []) 
+ \
-  (config_all_devices.has_key('CONFIG_MICROBIT') ? ['microbit-test'] : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM8XX') ? qtests_npcm8xx : []) + \
   ['arm-cpu-features',
'boot-serial-test']
 
@@ -219,8 +219,9 @@ qtests_aarch64 = \
   (config_all_devices.has_key('CONFIG_XLNX_ZYNQMP_ARM') ? ['xlnx-can-test', 
'fuzz-xlnx-dp-test'] : []) + \
   (config_all_devices.has_key('CONFIG_XLNX_VERSAL') ? ['xlnx-canfd-test'] : 
[]) + \
   (config_all_devices.has_key('CONFIG_RASPI') ? ['bcm2835-dma-test'] : []) +  \
-  (config_all.has_key('CONFIG_TCG') and
\
-   config_all_devices.has_key('CONFIG_TPM_TIS_I2C') ? ['tpm-tis-i2c-test'] : 
[]) + \
+  (config_all_devices.has_key('CONFIG_ASPEED_SOC') ? qtests_aspeed : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM7XX') ? qtests_npcm7xx : []) + \
+  (config_all_devices.has_key('CONFIG_NPCM8XX') ? qtests_npcm8xx : []) + \
   ['arm-cpu-features',
'numa-test',
'boot-serial-test',
diff --git a/tests/qtest/npcm_gmac-test.c b/tests/qtest/npcm_gmac-test.c
new file mode 100644
index 00..30d27e8dcc
--- /dev/null
+++ b/tests/qtest/npcm_gmac-test.c
@@ -0,0 +1,209 @@
+/*
+ * QTests for Nuvoton NPCM7xx/8xx GMAC Modules.
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "libqos/libqos.h"
+
+/* Name of the GMAC Device */
+#define TYPE_NPCM_GMAC "npcm-gmac"
+
+typedef struct GMACModule {
+int irq;
+uint64_t base_addr;
+} GMACModule;
+
+typedef struct TestData {
+const GMACModule *module;
+} TestData;
+
+/* Values extracted from hw/arm/npcm8xx.c */
+static const GMACModule gmac_module_list[] = {
+{
+.irq= 14,
+.base_addr  = 0xf0802000
+},
+{
+.irq= 15,
+.base_addr  = 0xf0804000
+},
+{
+.irq= 16,
+.base_addr  = 0xf0806000
+},
+{
+.irq= 17,
+.base_addr  = 0xf0808000
+}
+};
+
+/* Returns the index of the GMAC module. */
+static int gmac_module_index(const GMACModule *mod)
+{
+ptrdiff_t diff = mod - gmac_module_list;
+
+g_assert_true(diff >= 0 && diff < ARRAY_SIZE(gmac_module_list));
+
+return diff;
+}
+
+/* 32-bit register indices. Taken from npcm_gmac.c */
+typedef enum NPCMRegister {
+/* DMA Registers */
+NPCM_DMA_BUS_MODE = 0x1000,
+NPCM_DMA_XMT_POLL_DEMAND = 0x1004,
+NPCM_DMA_RCV_POLL_DEMAND = 0x1008,
+NPCM_DMA_RCV_BASE_ADDR = 0x100c,
+NPCM_DMA_TX_BASE_ADDR = 0x1010,
+NPCM_DMA_STATUS = 0x1014,
+NPCM_DMA_CONTROL = 0x1018,
+NPCM_DMA_INTR_ENA = 0x101c,
+NPCM_DMA_MISSED_FRAME_CTR = 0x1020,
+NPCM_DMA_HOST_TX_DESC = 0x1048,
+NPCM_DMA_HOST_RX_DESC = 0x104c,
+NPCM_DMA_CUR_TX_BUF_ADDR = 0x1050,
+NPCM_DMA_CUR_RX_BUF_ADDR = 0x1054,
+NPCM_DMA_HW_FEATURE = 0x1058,
+
+/* GMAC Registers */
+NPCM_GMAC_MAC_CONFIG = 0x0,
+NPCM_GMAC_FRAME_FILTER = 0x4,
+NPCM_GMAC_HASH_HIGH = 0x8,
+NPCM_GMAC_HASH_LOW = 0xc,
+NPCM_GMAC_MII_ADDR = 0x10,
+NPCM_GMAC_MII_DATA = 0x14,
+NPCM_GMAC_FLOW_CTRL = 0x18,
+NPCM_GMAC_VLAN_FLAG = 0x1c,
+NPCM_GMAC_VERSION = 0x20,
+NPCM_GMAC_WAKEUP_FILTER = 0x28,
+NPCM_GMAC_PMT = 0x2c,
+

[PATCH v2 04/11] hw/net: Add NPCMXXX GMAC device

2023-09-20 Thread Nabih Estefan

From: Hao Wu 

This patch implements the basic registers of GMAC device. Actual network
communications are not supported yet.

Signed-off-by: Hao Wu 

include/hw: Fix type problem in NPCMGMACState

- Fix type problem in NPCMGMACState
- Fix Register Initialization which was breaking boot-up in driver
- Added trace for NPCM_GMAC reset
- Added nd_table to npcm8xx.c for GMAC bootup

Signed-off-by: Nabih Estefan Diaz 

hw/net: Add BCM54612E PHY regs for GMAC

This patch adds default values for PHYs to make the driver happy.
The device is derived from an actual Izumi machine.

Signed-off-by: Hao Wu 

hw/net: change GMAC PHY regs to indicate link is up

This change makes the NPCM GMAC module use the BCM54612E unconditionally
and fakes some PHY registers so that the kernel driver thinks
the link partner is up.

Tested:
The following message shows up with the change:
Broadcom BCM54612E stmmac-0:00: attached PHY driver [Broadcom BCM54612E] 
(mii_bus:phy_addr=stmmac-0:00, irq=POLL)
stmmaceth f0802000.eth eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Signed-off-by: Hao Wu 
---
 hw/net/meson.build |   2 +-
 hw/net/npcm_gmac.c | 395 +
 hw/net/trace-events|  11 ++
 include/hw/net/npcm_gmac.h | 170 
 4 files changed, 577 insertions(+), 1 deletion(-)
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/net/npcm_gmac.h

diff --git a/hw/net/meson.build b/hw/net/meson.build
index 2632634df3..8389a134d5 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -38,7 +38,7 @@ system_ss.add(when: 'CONFIG_I82596_COMMON', if_true: 
files('i82596.c'))
 system_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c'))
 system_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c'))
 system_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c'))
-system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c'))
+system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c', 
'npcm_gmac.c'))
 
 system_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c'))
 system_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c'))
diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
new file mode 100644
index 00..5ce632858d
--- /dev/null
+++ b/hw/net/npcm_gmac.c
@@ -0,0 +1,395 @@
+/*
+ * Nuvoton NPCM7xx/8xx GMAC Module
+ *
+ * Copyright 2022 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * Unsupported/unimplemented features:
+ * - MII is not implemented, MII_ADDR.BUSY and MII_DATA always return zero
+ * - Precision timestamp (PTP) is not implemented.
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/registerfields.h"
+#include "hw/net/mii.h"
+#include "hw/net/npcm_gmac.h"
+#include "migration/vmstate.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "sysemu/dma.h"
+#include "trace.h"
+
+REG32(NPCM_DMA_BUS_MODE, 0x1000)
+REG32(NPCM_DMA_XMT_POLL_DEMAND, 0x1004)
+REG32(NPCM_DMA_RCV_POLL_DEMAND, 0x1008)
+REG32(NPCM_DMA_RCV_BASE_ADDR, 0x100c)
+REG32(NPCM_DMA_TX_BASE_ADDR, 0x1010)
+REG32(NPCM_DMA_STATUS, 0x1014)
+REG32(NPCM_DMA_CONTROL, 0x1018)
+REG32(NPCM_DMA_INTR_ENA, 0x101c)
+REG32(NPCM_DMA_MISSED_FRAME_CTR, 0x1020)
+REG32(NPCM_DMA_HOST_TX_DESC, 0x1048)
+REG32(NPCM_DMA_HOST_RX_DESC, 0x104c)
+REG32(NPCM_DMA_CUR_TX_BUF_ADDR, 0x1050)
+REG32(NPCM_DMA_CUR_RX_BUF_ADDR, 0x1054)
+REG32(NPCM_DMA_HW_FEATURE, 0x1058)
+
+REG32(NPCM_GMAC_MAC_CONFIG, 0x0)
+REG32(NPCM_GMAC_FRAME_FILTER, 0x4)
+REG32(NPCM_GMAC_HASH_HIGH, 0x8)
+REG32(NPCM_GMAC_HASH_LOW, 0xc)
+REG32(NPCM_GMAC_MII_ADDR, 0x10)
+REG32(NPCM_GMAC_MII_DATA, 0x14)
+REG32(NPCM_GMAC_FLOW_CTRL, 0x18)
+REG32(NPCM_GMAC_VLAN_FLAG, 0x1c)
+REG32(NPCM_GMAC_VERSION, 0x20)
+REG32(NPCM_GMAC_WAKEUP_FILTER, 0x28)
+REG32(NPCM_GMAC_PMT, 0x2c)
+REG32(NPCM_GMAC_LPI_CTRL, 0x30)
+REG32(NPCM_GMAC_TIMER_CTRL, 0x34)
+REG32(NPCM_GMAC_INT_STATUS, 0x38)
+REG32(NPCM_GMAC_INT_MASK, 0x3c)
+REG32(NPCM_GMAC_MAC0_ADDR_HI, 0x40)
+REG32(NPCM_GMAC_MAC0_ADDR_LO, 0x44)
+REG32(NPCM_GMAC_MAC1_ADDR_HI, 0x48)
+REG32(NPCM_GMAC_MAC1_ADDR_LO, 0x4c)
+REG32(NPCM_GMAC_MAC2_ADDR_HI, 0x50)
+REG32(NPCM_GMAC_MAC2_ADDR_LO, 0x54)
+REG32(NPCM_GMAC_MAC3_ADDR_HI, 0x58)
+REG32(NPCM_GMAC_MAC3_ADDR_LO, 0x5c)
+REG32(NPCM_GMAC_RGMII_STATUS, 0xd8)
+REG32(NPCM_GMAC_WATCHDOG, 0xdc)
+REG32(NPCM_GMAC_PTP_TCR, 0x700)
+REG32(NPCM_GMAC_PTP_SSIR, 0x704)
+REG32(NPCM_GMAC_PTP_STSR, 0x708)
+REG32(NPCM_GMAC_PTP_STNSR, 0x70c)
+REG32(NPCM_GMAC_PTP_STSUR, 0x710)
+REG32(NPCM_GMAC_PTP_STNSUR, 0x714)
+REG32(NPCM_GMAC_PTP_TAR, 0x718)
+REG32(NPCM_GMAC_PTP_TTSR,

[PATCH v2 01/11] hw/misc: Add Nuvoton's PCI Mailbox Module

2023-09-20 Thread Nabih Estefan

From: Hao Wu 

The PCI Mailbox Module is a high-bandwidth communication module
between a Nuvoton BMC and the core CPU. It features 16KB of RAM that is
accessible by both the BMC and the core CPU, and supports interrupts
for both sides.

This patch implements the BMC side of the PCI mailbox module.
Communication with the core CPU is emulated via a chardev and
will be in a follow-up patch.

Signed-off-by: Hao Wu 
---
 hw/arm/npcm7xx.c   |  16 +-
 hw/misc/meson.build|   1 +
 hw/misc/npcm7xx_pci_mbox.c | 324 +
 hw/misc/trace-events   |   5 +
 include/hw/arm/npcm7xx.h   |   1 +
 include/hw/misc/npcm7xx_pci_mbox.h |  81 
 6 files changed, 427 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/npcm7xx_pci_mbox.c
 create mode 100644 include/hw/misc/npcm7xx_pci_mbox.h

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index 15ff21d047..c69e936669 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -53,6 +53,9 @@
 /* ADC Module */
 #define NPCM7XX_ADC_BA  (0xf000c000)
 
+/* PCI Mailbox Module */
+#define NPCM7XX_PCI_MBOX_BA (0xf0848000)
+
 /* Internal AHB SRAM */
 #define NPCM7XX_RAM3_BA (0xc0008000)
 #define NPCM7XX_RAM3_SZ (4 * KiB)
@@ -83,6 +86,10 @@ enum NPCM7xxInterrupt {
 NPCM7XX_UART1_IRQ,
 NPCM7XX_UART2_IRQ,
 NPCM7XX_UART3_IRQ,
+NPCM7XX_PECI_IRQ= 6,
+NPCM7XX_PCI_MBOX_IRQ= 8,
+NPCM7XX_KCS_HIB_IRQ = 9,
+NPCM7XX_GMAC1_IRQ   = 14,
 NPCM7XX_EMC1RX_IRQ  = 15,
 NPCM7XX_EMC1TX_IRQ,
 NPCM7XX_MMC_IRQ = 26,
@@ -706,6 +713,14 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 }
 }
 
+/* PCI Mailbox. Cannot fail */
+sysbus_realize(SYS_BUS_DEVICE(&s->pci_mbox), &error_abort);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->pci_mbox), 0, NPCM7XX_PCI_MBOX_BA);
+sysbus_mmio_map(SYS_BUS_DEVICE(&s->pci_mbox), 1,
+NPCM7XX_PCI_MBOX_BA + NPCM7XX_PCI_MBOX_RAM_SIZE);
+sysbus_connect_irq(SYS_BUS_DEVICE(&s->pci_mbox), 0,
+   npcm7xx_irq(s, NPCM7XX_PCI_MBOX_IRQ));
+
 /* RAM2 (SRAM) */
 memory_region_init_ram(&s->sram, OBJECT(dev), "ram2",
NPCM7XX_RAM2_SZ, &error_abort);
@@ -765,7 +780,6 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
 create_unimplemented_device("npcm7xx.usbd[8]",  0xf0838000,   4 * KiB);
 create_unimplemented_device("npcm7xx.usbd[9]",  0xf0839000,   4 * KiB);
 create_unimplemented_device("npcm7xx.sd",   0xf084,   8 * KiB);
-create_unimplemented_device("npcm7xx.pcimbx",   0xf0848000, 512 * KiB);
 create_unimplemented_device("npcm7xx.aes",  0xf0858000,   4 * KiB);
 create_unimplemented_device("npcm7xx.des",  0xf0859000,   4 * KiB);
 create_unimplemented_device("npcm7xx.sha",  0xf085a000,   4 * KiB);
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 88ecab8392..c7858422f3 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -71,6 +71,7 @@ system_ss.add(when: 'CONFIG_NPCM7XX', if_true: files(
   'npcm7xx_clk.c',
   'npcm7xx_gcr.c',
   'npcm7xx_mft.c',
+  'npcm7xx_pci_mbox.c',
   'npcm7xx_pwm.c',
   'npcm7xx_rng.c',
 ))
diff --git a/hw/misc/npcm7xx_pci_mbox.c b/hw/misc/npcm7xx_pci_mbox.c
new file mode 100644
index 00..c770ad6fcf
--- /dev/null
+++ b/hw/misc/npcm7xx_pci_mbox.c
@@ -0,0 +1,324 @@
+/*
+ * Nuvoton NPCM7xx PCI Mailbox Module
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "chardev/char-fe.h"
+#include "hw/irq.h"
+#include "hw/qdev-clock.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/misc/npcm7xx_pci_mbox.h"
+#include "hw/registerfields.h"
+#include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/visitor.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/module.h"
+#include "qemu/timer.h"
+#include "qemu/units.h"
+#include "trace.h"
+
+REG32(NPCM7XX_PCI_MBOX_BMBXSTAT, 0x00);
+REG32(NPCM7XX_PCI_MBOX_BMBXCTL, 0x04);
+REG32(NPCM7XX_PCI_MBOX_BMBXCMD, 0x08);
+
+enum NPCM7xxPCIMBoxOperation {
+NPCM7XX_PCI_MBOX_OP_READ = 1,
+NPCM7XX_PCI_MBOX_OP_WRITE,
+};
+
+#define NPCM7XX_PCI_MBOX_OFFSET_BYTES 8
+
+/* Response code */
+#define NPCM7XX_PCI_MBOX_OK 0
+#define NPCM7XX_PCI_MBOX_INVALID_OP 0xa0
+#define NPCM7XX_PCI_MBOX_INVALID_SIZE 0xa1
+#define NPCM7XX_PCI_MBOX_UNSPECIFIED_ERROR 0xff
+
+#define

[PATCH v2 07/11] include/hw/net: Implemented Classes and Masks for GMAC Descriptors

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

 - Implemented classes for GMAC Receive and Transmit Descriptors
 - Implemented Masks for said descriptors

Signed-off-by: Nabih Estefan Diaz 
---
 hw/net/npcm_gmac.c   | 183 +++
 hw/net/trace-events  |   9 ++
 include/hw/net/npcm_gmac.h   |   2 -
 tests/qtest/npcm_gmac-test.c |   2 +-
 4 files changed, 150 insertions(+), 46 deletions(-)

diff --git a/hw/net/npcm_gmac.c b/hw/net/npcm_gmac.c
index 5ce632858d..6f8109e0ee 100644
--- a/hw/net/npcm_gmac.c
+++ b/hw/net/npcm_gmac.c
@@ -32,7 +32,7 @@
 REG32(NPCM_DMA_BUS_MODE, 0x1000)
 REG32(NPCM_DMA_XMT_POLL_DEMAND, 0x1004)
 REG32(NPCM_DMA_RCV_POLL_DEMAND, 0x1008)
-REG32(NPCM_DMA_RCV_BASE_ADDR, 0x100c)
+REG32(NPCM_DMA_RX_BASE_ADDR, 0x100c)
 REG32(NPCM_DMA_TX_BASE_ADDR, 0x1010)
 REG32(NPCM_DMA_STATUS, 0x1014)
 REG32(NPCM_DMA_CONTROL, 0x1018)
@@ -91,7 +91,8 @@ REG32(NPCM_GMAC_PTP_TTSR, 0x71c)
 #define NPCM_DMA_BUS_MODE_SWR   BIT(0)
 
 static const uint32_t npcm_gmac_cold_reset_values[NPCM_GMAC_NR_REGS] = {
-[R_NPCM_GMAC_VERSION] = 0x1037,
+/* Reduce version to 3.2 so that the kernel can enable interrupt. */
+[R_NPCM_GMAC_VERSION] = 0x1032,
 [R_NPCM_GMAC_TIMER_CTRL]  = 0x03e8,
 [R_NPCM_GMAC_MAC0_ADDR_HI]= 0x8000,
 [R_NPCM_GMAC_MAC0_ADDR_LO]= 0x,
@@ -125,12 +126,12 @@ static const uint16_t phy_reg_init[] = {
 [MII_EXTSTAT]   = 0x3000, /* 1000BASTE_T full-duplex capable */
 };
 
-static void npcm_gmac_soft_reset(NPCMGMACState *s)
+static void npcm_gmac_soft_reset(NPCMGMACState *gmac)
 {
-memcpy(s->regs, npcm_gmac_cold_reset_values,
+memcpy(gmac->regs, npcm_gmac_cold_reset_values,
NPCM_GMAC_NR_REGS * sizeof(uint32_t));
 /* Clear reset bits */
-s->regs[R_NPCM_DMA_BUS_MODE] &= ~NPCM_DMA_BUS_MODE_SWR;
+gmac->regs[R_NPCM_DMA_BUS_MODE] &= ~NPCM_DMA_BUS_MODE_SWR;
 }
 
 static void gmac_phy_set_link(NPCMGMACState *s, bool active)
@@ -148,11 +149,53 @@ static bool gmac_can_receive(NetClientState *nc)
 return true;
 }
 
-static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t 
len1)
+/*
+ * Function that updates the GMAC IRQ
+ * It finds the logical OR of the enabled bits for NIS (if enabled)
+ * It finds the logical OR of the enabled bits for AIS (if enabled)
+ */
+static void gmac_update_irq(NPCMGMACState *gmac)
 {
-return 0;
+/*
+ * Check if the normal interrupts summary is enabled
+ * if so, add the bits for the summary that are enabled
+ */
+if (gmac->regs[R_NPCM_DMA_INTR_ENA] & gmac->regs[R_NPCM_DMA_STATUS] &
+(NPCM_DMA_INTR_ENAB_NIE_BITS))
+{
+gmac->regs[R_NPCM_DMA_STATUS] |=  NPCM_DMA_STATUS_NIS;
+}
+/*
+ * Check if the abnormal interrupts summary is enabled
+ * if so, add the bits for the summary that are enabled
+ */
+if (gmac->regs[R_NPCM_DMA_INTR_ENA] & gmac->regs[R_NPCM_DMA_STATUS] &
+(NPCM_DMA_INTR_ENAB_AIE_BITS))
+{
+gmac->regs[R_NPCM_DMA_STATUS] |=  NPCM_DMA_STATUS_AIS;
+}
+
+/* Get the logical OR of both normal and abnormal interrupts */
+int level = !!((gmac->regs[R_NPCM_DMA_STATUS] &
+gmac->regs[R_NPCM_DMA_INTR_ENA] &
+NPCM_DMA_STATUS_NIS) |
+   (gmac->regs[R_NPCM_DMA_STATUS] &
+   gmac->regs[R_NPCM_DMA_INTR_ENA] &
+   NPCM_DMA_STATUS_AIS));
+
+/* Set the IRQ */
+trace_npcm_gmac_update_irq(DEVICE(gmac)->canonical_path,
+   gmac->regs[R_NPCM_DMA_STATUS],
+   gmac->regs[R_NPCM_DMA_INTR_ENA],
+   level);
+qemu_set_irq(gmac->irq, level);
 }
 
+static ssize_t gmac_receive(NetClientState *nc, const uint8_t *buf, size_t len)
+{
+/* Placeholder */
+return 0;
+}
 static void gmac_cleanup(NetClientState *nc)
 {
 /* Nothing to do yet. */
@@ -166,7 +209,7 @@ static void gmac_set_link(NetClientState *nc)
 gmac_phy_set_link(s, !nc->link_down);
 }
 
-static void npcm_gmac_mdio_access(NPCMGMACState *s, uint16_t v)
+static void npcm_gmac_mdio_access(NPCMGMACState *gmac, uint16_t v)
 {
 bool busy = v & NPCM_GMAC_MII_ADDR_BUSY;
 uint8_t is_write;
@@ -183,33 +226,38 @@ static void npcm_gmac_mdio_access(NPCMGMACState *s, 
uint16_t v)
 
 
 if (v & NPCM_GMAC_MII_ADDR_WRITE) {
-data = s->regs[R_NPCM_GMAC_MII_DATA];
+data = gmac->regs[R_NPCM_GMAC_MII_DATA];
 /* Clear reset bit for BMCR register */
 switch (gr) {
 case MII_BMCR:
 data &= ~MII_BMCR_RESET;
-/* Complete auto-negotiation immediately and set as complete */
-if (data & MII_BMCR_AUTOEN) {
+/* Autonegotiation is a W1C bit*/
+if (data & MII_BMCR_ANRESTART) {
 /* Tells autonegotiation to not restart again */
 data &=

[PATCH v2 03/11] hw/misc: Add qtest for NPCM7xx PCI Mailbox

2023-09-20 Thread Nabih Estefan

From: Hao Wu 

This patch adds a qtest for the NPCM7XX PCI Mailbox module.
It sends read and write requests to the module, and verifies that
the module contains the correct data after the requests.

Signed-off-by: Hao Wu 
---
 tests/qtest/meson.build |   1 +
 tests/qtest/npcm7xx_pci_mbox-test.c | 238 
 2 files changed, 239 insertions(+)
 create mode 100644 tests/qtest/npcm7xx_pci_mbox-test.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 1fba07f4ed..849394515d 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -183,6 +183,7 @@ qtests_sparc64 = \
 qtests_npcm7xx = \
   ['npcm7xx_adc-test',
'npcm7xx_gpio-test',
+   'npcm7xx_pci_mbox-test',
'npcm7xx_pwm-test',
'npcm7xx_rng-test',
'npcm7xx_sdhci-test',
diff --git a/tests/qtest/npcm7xx_pci_mbox-test.c 
b/tests/qtest/npcm7xx_pci_mbox-test.c
new file mode 100644
index 00..24eec18e3c
--- /dev/null
+++ b/tests/qtest/npcm7xx_pci_mbox-test.c
@@ -0,0 +1,238 @@
+/*
+ * QTests for Nuvoton NPCM7xx PCI Mailbox Modules.
+ *
+ * Copyright 2021 Google LLC
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitops.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qnum.h"
+#include "libqtest-single.h"
+
+#define PCI_MBOX_BA 0xf0848000
+#define PCI_MBOX_IRQ 8
+
+/* register offset */
+#define PCI_MBOX_STAT   0x00
+#define PCI_MBOX_CTL 0x04
+#define PCI_MBOX_CMD 0x08
+
+#define CODE_OK 0x00
+#define CODE_INVALID_OP 0xa0
+#define CODE_INVALID_SIZE   0xa1
+#define CODE_ERROR  0xff
+
+#define OP_READ 0x01
+#define OP_WRITE0x02
+#define OP_INVALID  0x41
+
+
+static int sock;
+static int fd;
+
+/*
+ * Create a local TCP socket with any port, then save off the port we got.
+ */
+static in_port_t open_socket(void)
+{
+struct sockaddr_in myaddr;
+socklen_t addrlen;
+
+myaddr.sin_family = AF_INET;
+myaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+myaddr.sin_port = 0;
+sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
+g_assert(sock != -1);
+g_assert(bind(sock, (struct sockaddr *) &myaddr, sizeof(myaddr)) != -1);
+addrlen = sizeof(myaddr);
+g_assert(getsockname(sock, (struct sockaddr *) &myaddr, &addrlen) != -1);
+g_assert(listen(sock, 1) != -1);
+return ntohs(myaddr.sin_port);
+}
+
+static void setup_fd(void)
+{
+fd_set readfds;
+
+FD_ZERO(&readfds);
+FD_SET(sock, &readfds);
+g_assert(select(sock + 1, &readfds, NULL, NULL, NULL) == 1);
+
+fd = accept(sock, NULL, 0);
+g_assert(fd >= 0);
+}
+
+static uint8_t read_response(uint8_t *buf, size_t len)
+{
+uint8_t code;
+ssize_t ret = read(fd, &code, 1);
+
+if (ret == -1) {
+return CODE_ERROR;
+}
+if (code != CODE_OK) {
+return code;
+}
+g_test_message("response code: %x", code);
+if (len > 0) {
+ret = read(fd, buf, len);
+if (ret < len) {
+return CODE_ERROR;
+}
+}
+return CODE_OK;
+}
+
+static void receive_data(uint64_t offset, uint8_t *buf, size_t len)
+{
+uint8_t op = OP_READ;
+uint8_t code;
+ssize_t rv;
+
+while (len > 0) {
+uint8_t size;
+
+if (len >= 8) {
+size = 8;
+} else if (len >= 4) {
+size = 4;
+} else if (len >= 2) {
+size = 2;
+} else {
+size = 1;
+}
+
+g_test_message("receiving %u bytes", size);
+/* Write op */
+rv = write(fd, &op, 1);
+g_assert_cmpint(rv, ==, 1);
+/* Write offset */
+rv = write(fd, (uint8_t *)&offset, sizeof(uint64_t));
+g_assert_cmpint(rv, ==, sizeof(uint64_t));
+/* Write size */
+g_assert_cmpint(write(fd, &size, 1), ==, 1);
+
+/* Read data and Expect response */
+code = read_response(buf, size);
+g_assert_cmphex(code, ==, CODE_OK);
+
+buf += size;
+offset += size;
+len -= size;
+}
+}
+
+static void send_data(uint64_t offset, const uint8_t *buf, size_t len)
+{
+uint8_t op = OP_WRITE;
+uint8_t code;
+ssize_t rv;
+
+while (len > 0) {
+uint8_t size;
+
+if (len >= 8) {
+size = 8;
+} else if (len >= 4) {
+size = 4;
+} else if (len >= 2) {
+size = 2;
+} else {
+size = 1;
+}
+
+g_test_message("sending %u bytes", size);
+/* Write op */
+rv = write(fd, &op, 1);
+

[PATCH v2 02/11] hw/arm: Add PCI mailbox module to Nuvoton SoC

2023-09-20 Thread Nabih Estefan

From: Hao Wu 

This patch wires the PCI mailbox module to the Nuvoton SoC.

Google-Rebase-Count: 5
Google-Bug-Id: 262938292
Signed-off-by: Hao Wu 
Change-Id: Ifd858a7ed760557faa15a7a1cef66b2056f06e2e
---
 docs/system/arm/nuvoton.rst | 2 ++
 hw/arm/npcm7xx.c| 3 ++-
 include/hw/arm/npcm7xx.h| 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst
index 0424cae4b0..e611099545 100644
--- a/docs/system/arm/nuvoton.rst
+++ b/docs/system/arm/nuvoton.rst
@@ -50,6 +50,8 @@ Supported devices
  * Ethernet controller (EMC)
  * Tachometer
  * Peripheral SPI controller (PSPI)
+ * BIOS POST code FIFO
+ * PCI Mailbox
 
 Missing devices
 ---
diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index c69e936669..c9e87162cb 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -86,7 +86,6 @@ enum NPCM7xxInterrupt {
 NPCM7XX_UART1_IRQ,
 NPCM7XX_UART2_IRQ,
 NPCM7XX_UART3_IRQ,
-NPCM7XX_PECI_IRQ= 6,
 NPCM7XX_PCI_MBOX_IRQ= 8,
 NPCM7XX_KCS_HIB_IRQ = 9,
 NPCM7XX_GMAC1_IRQ   = 14,
@@ -463,6 +462,8 @@ static void npcm7xx_init(Object *obj)
 object_initialize_child(obj, "pspi[*]", &s->pspi[i], TYPE_NPCM_PSPI);
 }
 
+object_initialize_child(obj, "pci-mbox", &s->pci_mbox,
+TYPE_NPCM7XX_PCI_MBOX);
 object_initialize_child(obj, "mmc", &s->mmc, TYPE_NPCM7XX_SDHCI);
 }
 
diff --git a/include/hw/arm/npcm7xx.h b/include/hw/arm/npcm7xx.h
index 273090ac60..cec3792a2e 100644
--- a/include/hw/arm/npcm7xx.h
+++ b/include/hw/arm/npcm7xx.h
@@ -105,6 +105,7 @@ struct NPCM7xxState {
 OHCISysBusState ohci;
 NPCM7xxFIUState fiu[2];
 NPCM7xxEMCState emc[2];
+NPCM7xxPCIMBoxState pci_mbox;
 NPCM7xxSDHCIState   mmc;
 NPCMPSPIState   pspi[2];
 };
-- 
2.42.0.459.ge4e396fd5e-goog

[PATCH v2 00/11] Implementation of NPI Mailbox and GMAC Networking Module

2023-09-20 Thread Nabih Estefan

From: Nabih Estefan Diaz 

[Changes since v1]
Fixed some errors in formatting.
Fixed a merge error that I didn't see in v1.
Removed Nuvoton 8xx references since that is a separate patch set.

[Original Cover]
Creates the NPI Mailbox Module with data verification for read and write
(internal and external), wiring to the Nuvoton SoC, and QTests.

Also creates the GMAC Networking Module. Implements read and write
functionality with corresponding descriptors and registers. Also includes
QTests for the different functionalities.

Hao Wu (5):
  hw/misc: Add Nuvoton's PCI Mailbox Module
  hw/arm: Add PCI mailbox module to Nuvoton SoC
  hw/misc: Add qtest for NPCM7xx PCI Mailbox
  hw/net: Add NPCMXXX GMAC device
  hw/arm: Add GMAC devices to NPCM7XX SoC

Nabih Estefan Diaz (6):
  tests/qtest: Creating qtest for GMAC Module
  include/hw/net: Implemented Classes and Masks for GMAC Descriptors
  hw/net: General GMAC Implementation
  hw/net: GMAC Rx Implementation
  hw/net: GMAC Tx Implementation
  tests/qtest: Adding PCS Module test to GMAC Qtest

 docs/system/arm/nuvoton.rst |   2 +
 hw/arm/npcm7xx.c|  53 +-
 hw/misc/meson.build |   1 +
 hw/misc/npcm7xx_pci_mbox.c  | 324 ++
 hw/misc/trace-events|   5 +
 hw/net/meson.build  |   2 +-
 hw/net/npcm_gmac.c  | 939 
 hw/net/trace-events |  20 +
 include/hw/arm/npcm7xx.h|   4 +
 include/hw/misc/npcm7xx_pci_mbox.h  |  81 +++
 include/hw/net/npcm_gmac.h  | 340 ++
 tests/qtest/meson.build |  12 +-
 tests/qtest/npcm7xx_pci_mbox-test.c | 238 +++
 tests/qtest/npcm_gmac-test.c| 341 ++
 14 files changed, 2353 insertions(+), 9 deletions(-)
 create mode 100644 hw/misc/npcm7xx_pci_mbox.c
 create mode 100644 hw/net/npcm_gmac.c
 create mode 100644 include/hw/misc/npcm7xx_pci_mbox.h
 create mode 100644 include/hw/net/npcm_gmac.h
 create mode 100644 tests/qtest/npcm7xx_pci_mbox-test.c
 create mode 100644 tests/qtest/npcm_gmac-test.c

-- 
2.42.0.459.ge4e396fd5e-goog

Re: [PATCH v1 06/22] vfio/common: Add a vfio device iterator

2023-09-20 Thread Alex Williamson

On Wed, 30 Aug 2023 18:37:38 +0800
Zhenzhong Duan  wrote:

> With a vfio device iterator added, we can make some migration and reset
> related functions group agnostic.
> E.g.:
> vfio_mig_active
> vfio_migratable_device_num
> vfio_devices_all_dirty_tracking
> vfio_devices_all_device_dirty_tracking
> vfio_devices_all_running_and_mig_active
> vfio_devices_dma_logging_stop
> vfio_devices_dma_logging_start
> vfio_devices_query_dirty_bitmap
> vfio_reset_handler
> 
> Otherwise we would need to add container-specific callback variants of
> the above functions just because they iterate devices by group.
> 
> Move the reset handler registration/unregistration to a place that is
> not group specific, i.e. when the first vfio address space is created
> instead of the first group.
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>  hw/vfio/common.c | 224 ++-
>  1 file changed, 122 insertions(+), 102 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 949ad6714a..51c6e7598e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -84,6 +84,26 @@ static int vfio_ram_block_discard_disable(VFIOContainer 
> *container, bool state)
>  }
>  }
>  
> +static VFIODevice *vfio_container_dev_iter_next(VFIOContainer *container,
> +VFIODevice *curr)
> +{
> +VFIOGroup *group;
> +
> +if (!curr) {
> +group = QLIST_FIRST(&container->group_list);
> +} else {
> +if (curr->next.le_next) {
> +return curr->next.le_next;
> +}


VFIODevice *device = QLIST_NEXT(curr, next);

if (device) {
return device;
}

> +group = curr->group->container_next.le_next;


group = QLIST_NEXT(curr->group, container_next);

> +}
> +
> +if (!group) {
> +return NULL;
> +}
> +return QLIST_FIRST(&group->device_list);
> +}
> +
>  /*
>   * Device state interfaces
>   */
> @@ -112,17 +132,22 @@ static int vfio_get_dirty_bitmap(VFIOContainer 
> *container, uint64_t iova,
>  
>  bool vfio_mig_active(void)
>  {
> -VFIOGroup *group;
> +VFIOAddressSpace *space;
> +VFIOContainer *container;
>  VFIODevice *vbasedev;
>  
> -if (QLIST_EMPTY(&vfio_group_list)) {
> +if (QLIST_EMPTY(&vfio_address_spaces)) {
>  return false;
>  }
>  
> -QLIST_FOREACH(group, &vfio_group_list, next) {
> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
> -if (vbasedev->migration_blocker) {
> -return false;
> +QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +QLIST_FOREACH(container, &space->containers, next) {
> +vbasedev = NULL;
> +while ((vbasedev = vfio_container_dev_iter_next(container,
> +vbasedev))) {
> +if (vbasedev->migration_blocker) {
> +return false;
> +}

It appears easy to avoid assigning vbasedev in the loop condition and to
narrow the scope of vbasedev:

VFIODevice *vbasedev = vfio_container_dev_iter_next(container, NULL);

while (vbasedev) {
if (vbasedev->migration_blocker) {
return false;
}

vbasedev = vfio_container_dev_iter_next(container, vbasedev);
}
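
For readers outside the thread, here is a minimal self-contained sketch
of the flattened "next device" iterator pattern under discussion, using
plain singly linked lists. The struct and function names are
hypothetical stand-ins for VFIOContainer, VFIOGroup and VFIODevice, not
the QEMU types. One deliberate difference from the quoted patch, stated
as an assumption: this sketch skips groups that hold no devices instead
of ending the walk at the first empty group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the container/group/device hierarchy. */
struct group;

struct device {
    struct device *next;    /* next device in the same group */
    struct group *group;    /* owning group */
    int id;
};

struct group {
    struct group *next;     /* next group in the same container */
    struct device *devices; /* first device of this group, or NULL */
};

struct container {
    struct group *groups;   /* first group, or NULL */
};

/* Return the device after curr, or the first device when curr is NULL. */
static struct device *dev_iter_next(struct container *c, struct device *curr)
{
    struct group *g;

    assert(c != NULL);

    if (curr) {
        if (curr->next) {
            return curr->next;
        }
        g = curr->group->next;
    } else {
        g = c->groups;
    }

    /* Assumption of this sketch: skip groups without devices. */
    while (g && !g->devices) {
        g = g->next;
    }
    return g ? g->devices : NULL;
}

/* Caller-side loop in the shape suggested in the review above. */
static int count_devices(struct container *c)
{
    int n = 0;
    struct device *d = dev_iter_next(c, NULL);

    while (d) {
        n++;
        d = dev_iter_next(c, d);
    }
    return n;
}
```

The caller never sees groups, which is the property that would let the
migration and reset helpers become group agnostic.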

>  }
>  }
>  }
> @@ -133,14 +158,19 @@ static Error *multiple_devices_migration_blocker;
>  
>  static unsigned int vfio_migratable_device_num(void)
>  {
> -VFIOGroup *group;
> +VFIOAddressSpace *space;
> +VFIOContainer *container;
>  VFIODevice *vbasedev;
>  unsigned int device_num = 0;
>  
> -QLIST_FOREACH(group, &vfio_group_list, next) {
> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
> -if (vbasedev->migration) {
> -device_num++;
> +QLIST_FOREACH(space, &vfio_address_spaces, list) {
> +QLIST_FOREACH(container, &space->containers, next) {
> +vbasedev = NULL;
> +while ((vbasedev = vfio_container_dev_iter_next(container,
> +vbasedev))) {
> +if (vbasedev->migration) {
> +device_num++;
> +}

Same as above.

>  }
>  }
>  }
> @@ -207,8 +237,7 @@ static void vfio_set_migration_error(int err)
>  
>  static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
>  {
> -VFIOGroup *group;
> -VFIODevice *vbasedev;
> +VFIODevice *vbasedev = NULL;
>  MigrationState *ms = migrate_get_current();
>  
>  if (ms->state != MIGRATION_STATUS_ACTIVE &&
> @@ -216,19 +245,17 @@ static bool 
> vfio_devices_all_dirty_tracking(VFIOContainer *container)
>  return false;
>  }
>  
> -QLIST_FOREACH(group, &container->group_list, container_next) {
> -QLIST_FOREACH(vbasedev, &group->device_list, next) {
> -VFIOMigration *migration = vbasedev->migration;
> +while ((vbasedev = vfio_container_dev_iter_next(container, vbasedev))) {
> +VFIOMigration *migration = vbasedev->migration;

Similar, and all the

Re: [PATCH v1 05/22] vfio/common: Extract out vfio_kvm_device_[add/del]_fd

2023-09-20 Thread Alex Williamson

On Wed, 30 Aug 2023 18:37:37 +0800
Zhenzhong Duan  wrote:

> ...which will be used by both legacy and iommufd backend.

+1 to Eric's comments regarding complete sentences in the commit log
and suggested description.

> 
> Signed-off-by: Yi Liu 
> Signed-off-by: Zhenzhong Duan 
> ---
>  hw/vfio/common.c  | 44 +++
>  include/hw/vfio/vfio-common.h |  3 +++
>  2 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 67150e4575..949ad6714a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1759,17 +1759,17 @@ void vfio_reset_handler(void *opaque)
>  }
>  }
>  
> -static void vfio_kvm_device_add_group(VFIOGroup *group)
> +int vfio_kvm_device_add_fd(int fd)

Returning int vs void looks gratuitous; nothing uses the return value
in this series.

>  {
>  #ifdef CONFIG_KVM
>  struct kvm_device_attr attr = {
> -.group = KVM_DEV_VFIO_GROUP,
> -.attr = KVM_DEV_VFIO_GROUP_ADD,
> -.addr = (uint64_t)(unsigned long)&group->fd,
> +.group = KVM_DEV_VFIO_FILE,
> +.attr = KVM_DEV_VFIO_FILE_ADD,
> +.addr = (uint64_t)(unsigned long)&fd,
>  };
>  
>  if (!kvm_enabled()) {
> -return;
> +return 0;
>  }
>  
>  if (vfio_kvm_device_fd < 0) {
> @@ -1779,37 +1779,51 @@ static void vfio_kvm_device_add_group(VFIOGroup 
> *group)
>  
>  if (kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd)) {
>  error_report("Failed to create KVM VFIO device: %m");
> -return;
> +return -ENODEV;
>  }
>  
>  vfio_kvm_device_fd = cd.fd;
>  }
>  
>  if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> -error_report("Failed to add group %d to KVM VFIO device: %m",
> - group->groupid);
> +error_report("Failed to add fd %d to KVM VFIO device: %m",
> + fd);

It's not nearly as useful to report an fd# in the error log vs the
group#.  Thanks,

Alex

> +return -errno;
>  }
>  #endif
> +return 0;
>  }
>  
> -static void vfio_kvm_device_del_group(VFIOGroup *group)
> +static void vfio_kvm_device_add_group(VFIOGroup *group)
> +{
> +vfio_kvm_device_add_fd(group->fd);
> +}
> +
> +int vfio_kvm_device_del_fd(int fd)
>  {
>  #ifdef CONFIG_KVM
>  struct kvm_device_attr attr = {
> -.group = KVM_DEV_VFIO_GROUP,
> -.attr = KVM_DEV_VFIO_GROUP_DEL,
> -.addr = (uint64_t)(unsigned long)&group->fd,
> +.group = KVM_DEV_VFIO_FILE,
> +.attr = KVM_DEV_VFIO_FILE_DEL,
> +.addr = (uint64_t)(unsigned long)&fd,
>  };
>  
>  if (vfio_kvm_device_fd < 0) {
> -return;
> +return -EINVAL;
>  }
>  
>  if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
> -error_report("Failed to remove group %d from KVM VFIO device: %m",
> - group->groupid);
> +error_report("Failed to remove fd %d from KVM VFIO device: %m",
> + fd);
> +return -EBADF;
>  }
>  #endif
> +return 0;
> +}
> +
> +static void vfio_kvm_device_del_group(VFIOGroup *group)
> +{
> +vfio_kvm_device_del_fd(group->fd);
>  }
>  
>  static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 5e376c436e..598c3ce079 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -220,6 +220,9 @@ struct vfio_device_info *vfio_get_device_info(int fd);
>  int vfio_get_device(VFIOGroup *group, const char *name,
>  VFIODevice *vbasedev, Error **errp);
>  
> +int vfio_kvm_device_add_fd(int fd);
> +int vfio_kvm_device_del_fd(int fd);
> +
>  extern const MemoryRegionOps vfio_region_ops;
>  typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>  extern VFIOGroupList vfio_group_list;

[PATCH 3/8] target/riscv/kvm/kvm-cpu.c: add missing property getters()

2023-09-20 Thread Daniel Henrique Barboza

We got along without property getters in the KVM driver because we never
needed them. But the incoming query-cpu-model-expansion API will use
property getters and setters to retrieve the CPU characteristics.

Add the missing getters for the KVM driver for both MISA and
multi-letter extension properties. We're also adding a special getter,
which always returns false, for multi-letter properties that KVM doesn't
implement.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/kvm/kvm-cpu.c | 40 +++---
 1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index c6615cb807..b4c231f231 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -140,6 +140,19 @@ static KVMCPUConfig kvm_misa_ext_cfgs[] = {
 KVM_MISA_CFG(RVM, KVM_RISCV_ISA_EXT_M),
 };
 
+static void kvm_cpu_get_misa_ext_cfg(Object *obj, Visitor *v,
+ const char *name,
+ void *opaque, Error **errp)
+{
+KVMCPUConfig *misa_ext_cfg = opaque;
+target_ulong misa_bit = misa_ext_cfg->offset;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
+bool value = env->misa_ext_mask & misa_bit;
+
+visit_type_bool(v, name, &value, errp);
+}
+
 static void kvm_cpu_set_misa_ext_cfg(Object *obj, Visitor *v,
  const char *name,
  void *opaque, Error **errp)
@@ -244,6 +257,17 @@ static uint32_t kvm_cpu_cfg_get(RISCVCPU *cpu,
 return *ext_enabled;
 }
 
+static void kvm_cpu_get_multi_ext_cfg(Object *obj, Visitor *v,
+  const char *name,
+  void *opaque, Error **errp)
+{
+KVMCPUConfig *multi_ext_cfg = opaque;
+RISCVCPU *cpu = RISCV_CPU(obj);
+bool value = kvm_cpu_cfg_get(cpu, multi_ext_cfg);
+
+visit_type_bool(v, name, &value, errp);
+}
+
 static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
   const char *name,
   void *opaque, Error **errp)
@@ -346,6 +370,15 @@ static void kvm_riscv_update_cpu_cfg_isa_ext(RISCVCPU 
*cpu, CPUState *cs)
 }
 }
 
+static void cpu_get_cfg_unavailable(Object *obj, Visitor *v,
+const char *name,
+void *opaque, Error **errp)
+{
+bool value = false;
+
+visit_type_bool(v, name, &value, errp);
+}
+
 static void cpu_set_cfg_unavailable(Object *obj, Visitor *v,
 const char *name,
 void *opaque, Error **errp)
@@ -376,7 +409,8 @@ static void riscv_cpu_add_kvm_unavail_prop(Object *obj, 
const char *prop_name)
  * to enable any of them.
  */
 object_property_add(obj, prop_name, "bool",
-NULL, cpu_set_cfg_unavailable,
+cpu_get_cfg_unavailable,
+cpu_set_cfg_unavailable,
 NULL, (void *)prop_name);
 }
 
@@ -406,7 +440,7 @@ static void kvm_riscv_add_cpu_user_properties(Object 
*cpu_obj)
 misa_cfg->description = riscv_get_misa_ext_description(bit);
 
 object_property_add(cpu_obj, misa_cfg->name, "bool",
-NULL,
+kvm_cpu_get_misa_ext_cfg,
 kvm_cpu_set_misa_ext_cfg,
 NULL, misa_cfg);
 object_property_set_description(cpu_obj, misa_cfg->name,
@@ -422,7 +456,7 @@ static void kvm_riscv_add_cpu_user_properties(Object 
*cpu_obj)
 KVMCPUConfig *multi_cfg = &kvm_multi_ext_cfgs[i];
 
 object_property_add(cpu_obj, multi_cfg->name, "bool",
-NULL,
+kvm_cpu_get_multi_ext_cfg,
 kvm_cpu_set_multi_ext_cfg,
 NULL, multi_cfg);
 }
-- 
2.41.0

[PATCH 1/8] target/riscv: add riscv_cpu_get_name()

2023-09-20 Thread Daniel Henrique Barboza

We'll introduce generic errors that will output a CPU type name via its
RISCVCPU pointer. Create a helper for that.

Use the helper in tcg_cpu_realizefn() instead of hardcoding the 'host'
CPU name.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 11 +++
 target/riscv/cpu.h |  1 +
 target/riscv/tcg/tcg-cpu.c |  4 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index eeeb08a35a..521bb88538 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -643,6 +643,17 @@ static ObjectClass *riscv_cpu_class_by_name(const char 
*cpu_model)
 return oc;
 }
 
+char *riscv_cpu_get_name(RISCVCPU *cpu)
+{
+RISCVCPUClass *rcc = RISCV_CPU_GET_CLASS(cpu);
+const char *typename = object_class_get_name(OBJECT_CLASS(rcc));
+
+g_assert(g_str_has_suffix(typename, RISCV_CPU_TYPE_SUFFIX));
+
+return g_strndup(typename,
+ strlen(typename) - strlen(RISCV_CPU_TYPE_SUFFIX));
+}
+
 static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 219fe2e9b5..3f11e69223 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -730,6 +730,7 @@ typedef struct isa_ext_data {
 int ext_enable_offset;
 } RISCVIsaExtData;
 extern const RISCVIsaExtData isa_edata_arr[];
+char *riscv_cpu_get_name(RISCVCPU *cpu);
 
 void riscv_add_satp_mode_properties(Object *obj);
 
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 8c052d6fcd..f31aa9bcc4 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -563,7 +563,9 @@ static bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
 Error *local_err = NULL;
 
 if (object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
-error_setg(errp, "'host' CPU is not compatible with TCG acceleration");
+g_autofree char *name = riscv_cpu_get_name(cpu);
+error_setg(errp, "'%s' CPU is not compatible with TCG acceleration",
+   name);
 return false;
 }
 
-- 
2.41.0
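
As a hedged illustration of what the helper in this patch does, here is
a plain C sketch that strips the CPU type suffix from a QOM type name
without GLib. The suffix value "-riscv-cpu" is an assumption inferred
from the type name "veyron-v1-riscv-cpu" seen elsewhere in this series,
and cpu_name_from_typename() is a hypothetical name, not a QEMU function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed suffix; the patch uses RISCV_CPU_TYPE_SUFFIX. */
#define CPU_TYPE_SUFFIX "-riscv-cpu"

/*
 * Return a newly allocated copy of type_name with the suffix removed,
 * or NULL if allocation fails. The caller frees the result.
 */
static char *cpu_name_from_typename(const char *type_name)
{
    size_t len = strlen(type_name);
    size_t slen = strlen(CPU_TYPE_SUFFIX);
    char *name;

    /* Mirrors the g_assert() in the patch: the suffix must be present. */
    assert(len >= slen);
    assert(strcmp(type_name + len - slen, CPU_TYPE_SUFFIX) == 0);

    name = malloc(len - slen + 1);
    if (!name) {
        return NULL;
    }
    memcpy(name, type_name, len - slen);
    name[len - slen] = '\0';
    return name;
}
```

The patch itself returns a g_strndup() result and pairs it with
g_autofree at the call site, which plays the role of the explicit
free() a caller of this sketch would need.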

[PATCH 5/8] target/riscv/tcg: add tcg_cpu_finalize_features()

2023-09-20 Thread Daniel Henrique Barboza

The query-cpu-model-expansion API is capable of passing extra properties
to a given CPU model and telling callers whether this custom
configuration is valid.

The RISC-V version of the API is not quite there yet. The reason is the
realize() flow in the TCG driver, where most of the validation is done
in tcg_cpu_realizefn(). riscv_cpu_finalize_features() is then used to
validate satp_mode for both TCG and KVM CPUs.

Our ARM friends use a concept of 'finalize_features()', a step done at
the end of realize() where the CPU features are validated. We have a
riscv_cpu_finalize_features() helper that, at this moment, is only
validating satp_mode.

Re-use this existing helper to do all the CPU extension validation we
require at the end of realize(). Make it public to allow APIs to
use it. At this moment only the TCG driver requires a realize() time
validation, thus, to avoid adding accelerator specific helpers in the
API, riscv_cpu_finalize_features() uses
riscv_tcg_cpu_finalize_features() if we are running TCG. The API will
then use riscv_cpu_finalize_features() regardless of the current
accelerator.
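
The control flow described above can be sketched in isolation as
follows. This is a simplified model, not the QEMU code: errors are plain
static strings instead of Error objects, struct cpu_cfg and both
function names are hypothetical, and the satp check is reduced to a
single flag. Only the EPMP message is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced CPU configuration. */
struct cpu_cfg {
    bool pmp;
    bool epmp;
    bool satp_valid;
};

/* TCG-only validation: returns NULL on success or an error message. */
static const char *tcg_finalize(const struct cpu_cfg *cfg)
{
    if (cfg->epmp && !cfg->pmp) {
        return "Invalid configuration: EPMP requires PMP support";
    }
    return NULL;
}

/*
 * Common entry point: run the accelerator-specific step only for TCG,
 * then the checks shared by every accelerator. Stop at the first error.
 */
static const char *finalize_features(const struct cpu_cfg *cfg,
                                     bool tcg_enabled)
{
    const char *err;

    assert(cfg != NULL);

    if (tcg_enabled) {
        err = tcg_finalize(cfg);
        if (err) {
            return err;
        }
    }

    if (!cfg->satp_valid) {
        return "invalid satp mode"; /* placeholder message */
    }
    return NULL;
}
```

A caller such as a QMP handler would then only need the common entry
point and would not have to know which accelerator is active.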

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 18 +--
 target/riscv/cpu.h |  1 +
 target/riscv/tcg/tcg-cpu.c | 61 +-
 target/riscv/tcg/tcg-cpu.h |  1 +
 4 files changed, 51 insertions(+), 30 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 521bb88538..272baaf6c7 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -34,6 +34,7 @@
 #include "sysemu/kvm.h"
 #include "sysemu/tcg.h"
 #include "kvm/kvm_riscv.h"
+#include "tcg/tcg-cpu.h"
 #include "tcg/tcg.h"
 
 /* RISC-V CPU definitions */
@@ -996,11 +997,24 @@ static void riscv_cpu_satp_mode_finalize(RISCVCPU *cpu, 
Error **errp)
 }
 #endif
 
-static void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
+void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
 {
-#ifndef CONFIG_USER_ONLY
 Error *local_err = NULL;
 
+/*
+ * KVM accel does not have a specialized finalize()
+ * callback because its extensions are validated
+ * in the get()/set() callbacks of each property.
+ */
+if (tcg_enabled()) {
+riscv_tcg_cpu_finalize_features(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
+#ifndef CONFIG_USER_ONLY
 riscv_cpu_satp_mode_finalize(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3f11e69223..1bfa3da55b 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -732,6 +732,7 @@ typedef struct isa_ext_data {
 extern const RISCVIsaExtData isa_edata_arr[];
 char *riscv_cpu_get_name(RISCVCPU *cpu);
 
+void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
 void riscv_add_satp_mode_properties(Object *obj);
 
 /* CSR function table */
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index a90ee63b06..52cd87db0c 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -549,6 +549,39 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
 riscv_cpu_disable_priv_spec_isa_exts(cpu);
 }
 
+void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp)
+{
+CPURISCVState *env = &cpu->env;
+Error *local_err = NULL;
+
+riscv_cpu_validate_priv_spec(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
+riscv_cpu_validate_misa_priv(env, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
+if (cpu->cfg.epmp && !cpu->cfg.pmp) {
+/*
+ * Enhanced PMP should only be available
+ * on harts with PMP support
+ */
+error_setg(errp, "Invalid configuration: EPMP requires PMP support");
+return;
+}
+
+riscv_cpu_validate_set_extensions(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
 static bool riscv_cpu_is_generic(Object *cpu_obj)
 {
 return object_dynamic_cast(cpu_obj, TYPE_RISCV_DYNAMIC_CPU) != NULL;
@@ -564,7 +597,6 @@ static bool riscv_cpu_is_generic(Object *cpu_obj)
 static bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
 {
 RISCVCPU *cpu = RISCV_CPU(cs);
-CPURISCVState *env = &cpu->env;
 Error *local_err = NULL;
 
 if (object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
@@ -580,33 +612,6 @@ static bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
 return false;
 }
 
-riscv_cpu_validate_priv_spec(cpu, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return false;
-}
-
-riscv_cpu_validate_misa_priv(env, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return false;
-}
-
-if (cpu->cfg.epmp && !cpu->cfg.pmp) {
-

[PATCH 2/8] target/riscv/tcg-cpu.c: add extension properties for all cpus

2023-09-20 Thread Daniel Henrique Barboza

At this moment we do not expose extension properties for vendor CPUs
because that would allow users to change them via command line. But that
comes at a cost: if we were to add an API that shows all CPU properties,
e.g. qmp-query-cpu-model-expansion, we won't be able to show the extension
state of vendor CPUs.

We have the required machinery to create extension properties for vendor
CPUs while not allowing users to enable extensions. Disabling existing
extensions is allowed since it can be useful for debugging.

Change the set() callback cpu_set_multi_ext_cfg() to allow enabling
extensions only for generic CPUs. In cpu_add_multi_ext_prop() let's not
set the default values for the properties if we're not dealing with
generic CPUs, otherwise the values set in cpu_init() of vendor CPUs will
be overwritten. And finally, in tcg_cpu_instance_init(), add cpu user
properties for all CPUs.

For the veyron-v1 CPU, we're now able to disable existing extensions
like smstateen:

$ ./build/qemu-system-riscv64 --nographic -M virt \
-cpu veyron-v1,smstateen=false

But setting extensions that the CPU didn't set during cpu_init(), like
V, is not allowed:

$ ./build/qemu-system-riscv64 --nographic -M virt \
-cpu veyron-v1,v=true
qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.v=true:
  'veyron-v1' CPU does not allow enabling extensions
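
The rule applied by the set() callbacks can be modelled in a few lines.
This is a sketch under stated assumptions, not the QEMU implementation:
struct cpu_model, enum set_result and cpu_set_ext() are hypothetical
names, and extensions are reduced to bits in one mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum set_result { SET_OK, SET_REJECTED };

/* Hypothetical reduced CPU model: generic flag plus an extension mask. */
struct cpu_model {
    bool generic;       /* true for generic CPUs, false for vendor CPUs */
    uint32_t ext_mask;  /* one bit per enabled extension */
};

/*
 * Apply a user request to set one extension bit:
 *  - no change requested: accept without doing anything;
 *  - enabling on a vendor CPU: reject;
 *  - otherwise: update the mask.
 */
static enum set_result cpu_set_ext(struct cpu_model *cpu, uint32_t ext_bit,
                                   bool value)
{
    bool prev;

    assert(ext_bit != 0);
    prev = (cpu->ext_mask & ext_bit) != 0;

    if (value == prev) {
        return SET_OK;
    }
    if (value && !cpu->generic) {
        return SET_REJECTED;
    }

    if (value) {
        cpu->ext_mask |= ext_bit;
    } else {
        cpu->ext_mask &= ~ext_bit;
    }
    return SET_OK;
}
```

Checking for "no change" first matters: it lets a vendor CPU accept a
command line that merely restates an extension it already enables.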

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/tcg/tcg-cpu.c | 64 +-
 1 file changed, 50 insertions(+), 14 deletions(-)

diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index f31aa9bcc4..a90ee63b06 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -549,6 +549,11 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
 riscv_cpu_disable_priv_spec_isa_exts(cpu);
 }
 
+static bool riscv_cpu_is_generic(Object *cpu_obj)
+{
+return object_dynamic_cast(cpu_obj, TYPE_RISCV_DYNAMIC_CPU) != NULL;
+}
+
 /*
  * We'll get here via the following path:
  *
@@ -632,13 +637,27 @@ static void cpu_set_misa_ext_cfg(Object *obj, Visitor *v, 
const char *name,
 target_ulong misa_bit = misa_ext_cfg->misa_bit;
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = &cpu->env;
-bool value;
+bool generic_cpu = riscv_cpu_is_generic(obj);
+bool prev_val, value;
 
 if (!visit_type_bool(v, name, &value, errp)) {
 return;
 }
 
+prev_val = env->misa_ext & misa_bit;
+
+if (value == prev_val) {
+return;
+}
+
 if (value) {
+if (!generic_cpu) {
+g_autofree char *cpuname = riscv_cpu_get_name(cpu);
+error_setg(errp, "'%s' CPU does not allow enabling extensions",
+   cpuname);
+return;
+}
+
 env->misa_ext |= misa_bit;
 env->misa_ext_mask |= misa_bit;
 } else {
@@ -688,6 +707,7 @@ static const RISCVCPUMisaExtConfig misa_ext_cfgs[] = {
  */
 static void riscv_cpu_add_misa_properties(Object *cpu_obj)
 {
+bool use_def_vals = riscv_cpu_is_generic(cpu_obj);
 int i;
 
 for (i = 0; i < ARRAY_SIZE(misa_ext_cfgs); i++) {
@@ -706,7 +726,9 @@ static void riscv_cpu_add_misa_properties(Object *cpu_obj)
 cpu_set_misa_ext_cfg,
 NULL, (void *)misa_cfg);
 object_property_set_description(cpu_obj, name, desc);
-object_property_set_bool(cpu_obj, name, misa_cfg->enabled, NULL);
+if (use_def_vals) {
+object_property_set_bool(cpu_obj, name, misa_cfg->enabled, NULL);
+}
 }
 }
 
@@ -714,17 +736,32 @@ static void cpu_set_multi_ext_cfg(Object *obj, Visitor 
*v, const char *name,
   void *opaque, Error **errp)
 {
 const RISCVCPUMultiExtConfig *multi_ext_cfg = opaque;
-bool value;
+RISCVCPU *cpu = RISCV_CPU(obj);
+bool generic_cpu = riscv_cpu_is_generic(obj);
+bool prev_val, value;
 
 if (!visit_type_bool(v, name, &value, errp)) {
 return;
 }
 
-isa_ext_update_enabled(RISCV_CPU(obj), multi_ext_cfg->offset, value);
-
 g_hash_table_insert(multi_ext_user_opts,
 GUINT_TO_POINTER(multi_ext_cfg->offset),
 (gpointer)value);
+
+prev_val = isa_ext_is_enabled(cpu, multi_ext_cfg->offset);
+
+if (value == prev_val) {
+return;
+}
+
+if (value && !generic_cpu) {
+g_autofree char *cpuname = riscv_cpu_get_name(cpu);
+error_setg(errp, "'%s' CPU does not allow enabling extensions",
+   cpuname);
+return;
+}
+
+isa_ext_update_enabled(cpu, multi_ext_cfg->offset, value);
 }
 
 static void cpu_get_multi_ext_cfg(Object *obj, Visitor *v, const char *name,
@@ -739,11 +776,17 @@ static void cpu_get_multi_ext_cfg(Object *obj, Visitor 
*v, const char *name,
 static void cpu_add_multi_ext_prop(Object *cpu_obj,
const RISCVCPUMultiExtConfig *multi_cfg)
 {
+bool generic_cpu

[PATCH 7/8] target/riscv: add riscv_cpu_accelerator_compatible()

2023-09-20 Thread Daniel Henrique Barboza

Add an API to check if a given CPU is compatible with the current
accelerator.

This will allow query-cpu-model-expansion to work properly in conditions
where QEMU supports both accelerators (TCG and KVM), QEMU is then
launched using TCG, and the API requests information about a KVM only
CPU (e.g. 'host' CPU).

KVM doesn't have such restrictions and, at least in theory, all CPU
models should work with KVM. We will revisit this API in case we decide
to restrict the number of KVM CPUs we support.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 9 +
 target/riscv/cpu.h | 1 +
 target/riscv/tcg/tcg-cpu.c | 7 ++-
 target/riscv/tcg/tcg-cpu.h | 1 +
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 272baaf6c7..8bdf6dbd5d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1061,6 +1061,15 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 mcc->parent_realize(dev, errp);
 }
 
+bool riscv_cpu_accelerator_compatible(RISCVCPU *cpu)
+{
+if (tcg_enabled()) {
+return riscv_cpu_tcg_compatible(cpu);
+}
+
+return true;
+}
+
 #ifndef CONFIG_USER_ONLY
 static void cpu_riscv_get_satp(Object *obj, Visitor *v, const char *name,
void *opaque, Error **errp)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 1bfa3da55b..00b0507b17 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -734,6 +734,7 @@ char *riscv_cpu_get_name(RISCVCPU *cpu);
 
 void riscv_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
 void riscv_add_satp_mode_properties(Object *obj);
+bool riscv_cpu_accelerator_compatible(RISCVCPU *cpu);
 
 /* CSR function table */
 extern riscv_csr_operations csr_ops[CSR_TABLE_SIZE];
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 52cd87db0c..071a744a43 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -582,6 +582,11 @@ void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error 
**errp)
 }
 }
 
+bool riscv_cpu_tcg_compatible(RISCVCPU *cpu)
+{
+return object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST) == NULL;
+}
+
 static bool riscv_cpu_is_generic(Object *cpu_obj)
 {
 return object_dynamic_cast(cpu_obj, TYPE_RISCV_DYNAMIC_CPU) != NULL;
@@ -599,7 +604,7 @@ static bool tcg_cpu_realizefn(CPUState *cs, Error **errp)
 RISCVCPU *cpu = RISCV_CPU(cs);
 Error *local_err = NULL;
 
-if (object_dynamic_cast(OBJECT(cpu), TYPE_RISCV_CPU_HOST)) {
+if (!riscv_cpu_tcg_compatible(cpu)) {
 g_autofree char *name = riscv_cpu_get_name(cpu);
 error_setg(errp, "'%s' CPU is not compatible with TCG acceleration",
name);
diff --git a/target/riscv/tcg/tcg-cpu.h b/target/riscv/tcg/tcg-cpu.h
index aa00fbc253..f7b32417f8 100644
--- a/target/riscv/tcg/tcg-cpu.h
+++ b/target/riscv/tcg/tcg-cpu.h
@@ -24,5 +24,6 @@
 
 void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp);
 void riscv_tcg_cpu_finalize_features(RISCVCPU *cpu, Error **errp);
+bool riscv_cpu_tcg_compatible(RISCVCPU *cpu);
 
 #endif
-- 
2.41.0

[PATCH 8/8] target/riscv/riscv-qmp-cmds.c: check CPU accel in query-cpu-model-expansion

2023-09-20 Thread Daniel Henrique Barboza

Use the recently added API to filter out CPUs that are unavailable for a
given accelerator. At the moment this applies to a QEMU built with both
KVM and TCG support when querying a binary running with TCG:

qemu-system-riscv64 -S -M virt,accel=tcg -display none
-qmp tcp:localhost:1234,server,wait=off

./qemu/scripts/qmp/qmp-shell localhost:1234

(QEMU) query-cpu-model-expansion type=full model={"name":"host"}
{"error": {"class": "GenericError", "desc": "'host' CPU not available with tcg"}}
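
As a rough sketch of what the qmp-shell session above amounts to on the wire, the following builds the QMP command and pulls the error description out of a reply. The helper names build_expansion_command() and expansion_error() are made up for illustration and are not part of QEMU or qmp-shell; the reply text is copied from the example above.

```python
import json


def build_expansion_command(model_name, props=None):
    """Build the QMP message for query-cpu-model-expansion (type=full)."""
    model = {"name": model_name}
    if props:
        model["props"] = props
    return {
        "execute": "query-cpu-model-expansion",
        "arguments": {"type": "full", "model": model},
    }


def expansion_error(reply_text):
    """Return the error description from a QMP reply, or None on success."""
    reply = json.loads(reply_text)
    error = reply.get("error")
    return error["desc"] if error else None


command = build_expansion_command("host")
# Reply copied from the TCG example above.
reply_text = ('{"error": {"class": "GenericError", '
              '"desc": "\'host\' CPU not available with tcg"}}')

print(json.dumps(command))
print(expansion_error(reply_text))
```

A client would treat a non-None description as "this CPU model is not usable with the running accelerator" and fall back to another model.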

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/riscv-qmp-cmds.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/target/riscv/riscv-qmp-cmds.c b/target/riscv/riscv-qmp-cmds.c
index 5b2d186c83..2f2dbae7c8 100644
--- a/target/riscv/riscv-qmp-cmds.c
+++ b/target/riscv/riscv-qmp-cmds.c
@@ -31,6 +31,8 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/visitor.h"
 #include "qom/qom-qobject.h"
+#include "sysemu/kvm.h"
+#include "sysemu/tcg.h"
 #include "cpu-qom.h"
 #include "cpu.h"
 
@@ -63,6 +65,17 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error 
**errp)
 return cpu_list;
 }
 
+static void riscv_check_if_cpu_available(RISCVCPU *cpu, Error **errp)
+{
+if (!riscv_cpu_accelerator_compatible(cpu)) {
+g_autofree char *name = riscv_cpu_get_name(cpu);
+const char *accel = kvm_enabled() ? "kvm" : "tcg";
+
+error_setg(errp, "'%s' CPU not available with %s", name, accel);
+return;
+}
+}
+
 static void riscv_obj_add_qdict_prop(Object *obj, QDict *qdict_out,
  const char *name)
 {
@@ -161,6 +174,13 @@ CpuModelExpansionInfo 
*qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 
 obj = object_new(object_class_get_name(oc));
 
+riscv_check_if_cpu_available(RISCV_CPU(obj), &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+object_unref(obj);
+return NULL;
+}
+
 if (qdict_in) {
 riscv_cpuobj_validate_qdict_in(obj, model->props, qdict_in,
&local_err);
-- 
2.41.0

[PATCH 0/8] riscv: query-cpu-model-expansion API

2023-09-20 Thread Daniel Henrique Barboza

Based-on: 20230920112020.651006-1-dbarb...@ventanamicro.com
("[PATCH v3 00/19] riscv: split TCG/KVM accelerators from cpu.c")

Hi,

The parent series is in a more stable state so I decided to go ahead
and post this work.

This series implements query-cpu-model-expansion for RISC-V. The
implementation was based on the ARM version of the same API in
target/arm/arm-qmp-cmds.c.

A couple of changes were made in the first 3 patches. The most impactful
is in patch 2, where we're now exposing extension properties for vendor
CPUs. This was done to allow the API to retrieve the extension state for
those CPUs, which were otherwise hidden inside cpu->cfg. The result is
that users will have a little more power because we're now allowing
vendor CPU extensions to be disabled. Enabling extensions for those CPUs
is still forbidden. Patch 2 commit msg gives more details on what is now
possible to do.

The first 3 patches can be pushed/merged separately from the API since
they can stand on their own.

A small tweak in the extension validation in the TCG driver was also
needed. We're now centralizing all extension validation in
finalize_features(), and exporting finalize_features() to be usable by
the new API. This will allow us to validate model properties and report
back if the desired model is valid or not.

This series can be tested directly using this branch:

https://gitlab.com/danielhb/qemu/-/tree/qmp-cpu-expansion_v1


Here's a usage example. Launch QEMU with "-S" to be able to issue QMP
query commands before the machine starts:

$ ./build/qemu-system-riscv64 -S -M virt -display none \
    -qmp tcp:localhost:1234,server,wait=off

Then use QMP to access the API:

$ ./scripts/qmp/qmp-shell localhost:1234
Welcome to the QMP low-level shell!
Connected to QEMU 8.1.50

(QEMU)  query-cpu-model-expansion type=full model={"name":"rv64"}
{"return": {"model": {"name": "rv64", "props": {"zicond": false, "x-zvfh": 
false, "mmu": true, "x-zvfbfwma": false, "x-zvfbfmin": false, "xtheadbs": 
false, "xtheadbb": false, "xtheadba": false, "xtheadmemidx": false, 
"smstateen": false, "zfinx": false, "Zve64f": false, "Zve32f": false, 
"x-zvfhmin": false, "xventanacondops": false, "xtheadcondmov": false, "svpbmt": 
false, "zbs": true, "zbc": true, "zbb": true, "zba": true, "zicboz": true, 
"xtheadmac": false, "Zfh": false, "Zfa": true, "zbkx": false, "zbkc": false, 
"zbkb": false, "Zve64d": false, "x-zfbfmin": false, "zk": false, "x-epmp": 
false, "xtheadmempair": false, "zkt": false, "zks": false, "zkr": false, "zkn": 
false, "Zfhmin": false, "zksh": false, "zknh": false, "zkne": false, "zknd": 
false, "zhinx": false, "Zicsr": true, "sscofpmf": false, "Zihintntl": true, 
"sstc": true, "xtheadcmo": false, "x-zvbb": false, "zksed": false, "x-zvkned": 
false, "xtheadsync": false, "x-zvkg": false, "zhinxmin": false, "svadu": true, 
"xtheadfmv": false, "x-zvksed": false, "svnapot": false, "pmp": true, 
"x-zvknhb": false, "x-zvknha": false, "xtheadfmemidx": false, "x-zvksh": false, 
"zdinx": false, "zicbom": true, "Zihintpause": true, "svinval": false, "zcf": 
false, "zce": false, "zcd": false, "zcb": false, "zca": false, "x-ssaia": 
false, "x-smaia": false, "zmmul": false, "x-zvbc": false, "Zifencei": true, 
"zcmt": false, "zcmp": false, "Zawrs": true
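
Since the reply is a flat dict of property names to values, a consumer can filter it directly. The sketch below is only an illustration: enabled_props() is a hypothetical helper, and the reply is a small hand-picked excerpt of the rv64 output above rather than the full property list.

```python
def enabled_props(reply):
    """Return the sorted names of properties reported as true."""
    props = reply["return"]["model"]["props"]
    return sorted(name for name, value in props.items() if value is True)


# Excerpt of the rv64 reply shown above; the real reply has many more keys.
reply = {"return": {"model": {"name": "rv64", "props": {
    "zicond": False, "mmu": True, "zbs": True, "zba": True,
    "pmp": True, "svnapot": False, "Zawrs": True}}}}

print(enabled_props(reply))
```

Note that property names are case sensitive ("Zawrs" vs. "zba"), so plain string sorting puts the capitalized names first.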



Daniel Henrique Barboza (8):
  target/riscv: add riscv_cpu_get_name()
  target/riscv/tcg-cpu.c: add extension properties for all cpus
  target/riscv/kvm/kvm-cpu.c: add missing property getters()
  qapi,risc-v: add query-cpu-model-expansion
  target/riscv/tcg: add tcg_cpu_finalize_features()
  target/riscv: handle custom props in qmp_query_cpu_model_expansion
  target/riscv: add riscv_cpu_accelerator_compatible()
  target/riscv/riscv-qmp-cmds.c: check CPU accel in
query-cpu-model-expansion

 qapi/machine-target.json  |   6 +-
 target/riscv/cpu.c|  38 +++-
 target/riscv/cpu.h|   3 +
 target/riscv/kvm/kvm-cpu.c|  40 -
 target/riscv/riscv-qmp-cmds.c | 160 ++
 target/riscv/tcg/tcg-cpu.c| 122 ++
 target/riscv/tcg/tcg-cpu.h|   2 +
 7 files changed, 327 insertions(+), 44 deletions(-)

-- 
2.41.0

[PATCH 6/8] target/riscv: handle custom props in qmp_query_cpu_model_expansion

2023-09-20 Thread Daniel Henrique Barboza

Callers can add 'props' when querying for a cpu model expansion to see
whether a given CPU model supports a certain configuration, and what the
resulting CPU object looks like.

If we have 'props' to handle, gather them in a QDict and use the new
riscv_cpuobj_validate_qdict_in() helper to validate them. This helper
sets the custom properties on the CPU object and validates it using
riscv_cpu_finalize_features(). Any validation error is reported to the
user; otherwise a CPU object with 'props' applied is returned.

Here's an example with the veyron-v1 vendor CPU. Disabling vendor CPU
extensions is allowed, assuming the final config is valid. Disabling
'smstateen' is a valid expansion:

(QEMU) query-cpu-model-expansion type=full 
model={"name":"veyron-v1","props":{"smstateen":false}}
{"return": {"model": {"name": "veyron-v1", "props": {"zicond": false, ..., 
"smstateen": false, ...}

But enabling extensions isn't allowed for vendor CPUs. E.g. enabling 'V'
for the veyron-v1 CPU isn't allowed:

(QEMU) query-cpu-model-expansion type=full 
model={"name":"veyron-v1","props":{"v":true}}
{"error": {"class": "GenericError", "desc": "'veyron-v1' CPU does not allow 
enabling extensions"}}
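
The rule illustrated by the two queries above can be modeled in a few lines. This is a toy model, not the QEMU implementation: check_vendor_props() is a hypothetical name, the default values for veyron-v1 are assumed for the sake of the example, and treating a request to "enable" an already-enabled extension as harmless is also an assumption. The real code additionally runs riscv_cpu_finalize_features() on the result.

```python
def check_vendor_props(defaults, requested):
    """Toy model: a vendor CPU may turn extensions off but never on."""
    for name, value in requested.items():
        if value and not defaults.get(name, False):
            raise ValueError("CPU does not allow enabling extensions")
    merged = dict(defaults)
    merged.update(requested)
    return merged


# Assumed defaults for illustration only.
veyron_defaults = {"smstateen": True, "v": False}

print(check_vendor_props(veyron_defaults, {"smstateen": False}))
try:
    check_vendor_props(veyron_defaults, {"v": True})
except ValueError as exc:
    print("rejected:", exc)
```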

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/riscv-qmp-cmds.c | 65 +++
 1 file changed, 65 insertions(+)

diff --git a/target/riscv/riscv-qmp-cmds.c b/target/riscv/riscv-qmp-cmds.c
index 2170562e3a..5b2d186c83 100644
--- a/target/riscv/riscv-qmp-cmds.c
+++ b/target/riscv/riscv-qmp-cmds.c
@@ -27,6 +27,9 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-machine-target.h"
 #include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi/qobject-input-visitor.h"
+#include "qapi/visitor.h"
 #include "qom/qom-qobject.h"
 #include "cpu-qom.h"
 #include "cpu.h"
@@ -83,14 +86,58 @@ static void riscv_obj_add_multiext_props(Object *obj, QDict 
*qdict_out,
 }
 }
 
+static void riscv_cpuobj_validate_qdict_in(Object *obj, QObject *props,
+   const QDict *qdict_in,
+   Error **errp)
+{
+const QDictEntry *qe;
+Visitor *visitor;
+Error *local_err = NULL;
+
+visitor = qobject_input_visitor_new(props);
+if (!visit_start_struct(visitor, NULL, NULL, 0, &local_err)) {
+goto err;
+}
+
+for (qe = qdict_first(qdict_in); qe; qe = qdict_next(qdict_in, qe)) {
+object_property_find_err(obj, qe->key, &local_err);
+if (local_err) {
+goto err;
+}
+
+object_property_set(obj, qe->key, visitor, &local_err);
+if (local_err) {
+goto err;
+}
+}
+
+visit_check_struct(visitor, &local_err);
+if (local_err) {
+goto err;
+}
+
+riscv_cpu_finalize_features(RISCV_CPU(obj), &local_err);
+if (local_err) {
+goto err;
+}
+
+visit_end_struct(visitor, NULL);
+
+err:
+error_propagate(errp, local_err);
+visit_free(visitor);
+}
+
 CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType 
type,
  CpuModelInfo *model,
  Error **errp)
 {
 CpuModelExpansionInfo *expansion_info;
+const QDict *qdict_in = NULL;
 QDict *qdict_out;
 ObjectClass *oc;
 Object *obj;
+Error *local_err = NULL;
 
 if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
 error_setg(errp, "The requested expansion type is not supported");
@@ -104,8 +151,26 @@ CpuModelExpansionInfo 
*qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 return NULL;
 }
 
+if (model->props) {
+qdict_in = qobject_to(QDict, model->props);
+if (!qdict_in) {
+error_setg(errp, QERR_INVALID_PARAMETER_TYPE, "props", "dict");
+return NULL;
+}
+}
+
 obj = object_new(object_class_get_name(oc));
 
+if (qdict_in) {
+riscv_cpuobj_validate_qdict_in(obj, model->props, qdict_in,
+   &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+object_unref(obj);
+return NULL;
+}
+}
+
 expansion_info = g_new0(CpuModelExpansionInfo, 1);
 expansion_info->model = g_malloc0(sizeof(*expansion_info->model));
 expansion_info->model->name = g_strdup(model->name);
-- 
2.41.0

[PATCH 4/8] qapi,risc-v: add query-cpu-model-expansion

2023-09-20 Thread Daniel Henrique Barboza

This API is used to inspect the characteristics of a given CPU model. It
also allows users to validate a CPU model with a certain configuration,
e.g. if "-cpu X,a=true,b=false" is a valid setup for a given QEMU
binary. We'll start implementing the first part. The second requires
more changes in RISC-V CPU boot flow.

The implementation is inspired by the existing ARM
query-cpu-model-expansion impl in target/arm/arm-qmp-cmds.c. We'll
create a RISCVCPU object with the required model, fetch its existing
properties, add a couple of relevant boolean options (pmp and mmu) and
display it to users.

Here's a usage example:

./build/qemu-system-riscv64 -S -M virt -display none \
  -qmp  tcp:localhost:1234,server,wait=off

./scripts/qmp/qmp-shell localhost:1234
Welcome to the QMP low-level shell!
Connected to QEMU 8.1.50

(QEMU)  query-cpu-model-expansion type=full model={"name":"rv64"}
{"return": {"model": {"name": "rv64", "props": {"zicond": false, "x-zvfh": 
false, "mmu": true, "x-zvfbfwma": false, "x-zvfbfmin": false, "xtheadbs": 
false, "xtheadbb": false, "xtheadba": false, "xtheadmemidx": false, 
"smstateen": false, "zfinx": false, "Zve64f": false, "Zve32f": false, 
"x-zvfhmin": false, "xventanacondops": false, "xtheadcondmov": false, "svpbmt": 
false, "zbs": true, "zbc": true, "zbb": true, "zba": true, "zicboz": true, 
"xtheadmac": false, "Zfh": false, "Zfa": true, "zbkx": false, "zbkc": false, 
"zbkb": false, "Zve64d": false, "x-zfbfmin": false, "zk": false, "x-epmp": 
false, "xtheadmempair": false, "zkt": false, "zks": false, "zkr": false, "zkn": 
false, "Zfhmin": false, "zksh": false, "zknh": false, "zkne": false, "zknd": 
false, "zhinx": false, "Zicsr": true, "sscofpmf": false, "Zihintntl": true, 
"sstc": true, "xtheadcmo": false, "x-zvbb": false, "zksed": false, "x-zvkned": 
false, "xtheadsync": false, "x-zvkg": false, "zhinxmin": false, "svadu": true, 
"xtheadfmv": false, "x-zvksed": false, "svnapot": false, "pmp": true, 
"x-zvknhb": false, "x-zvknha": false, "xtheadfmemidx": false, "x-zvksh": false, 
"zdinx": false, "zicbom": true, "Zihintpause": true, "svinval": false, "zcf": 
false, "zce": false, "zcd": false, "zcb": false, "zca": false, "x-ssaia": 
false, "x-smaia": false, "zmmul": false, "x-zvbc": false, "Zifencei": true, 
"zcmt": false, "zcmp": false, "Zawrs": true

Signed-off-by: Daniel Henrique Barboza 
---
 qapi/machine-target.json  |  6 ++-
 target/riscv/riscv-qmp-cmds.c | 75 +++
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index f0a6b72414..e5630e73aa 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -228,7 +228,8 @@
   'data': { 'model': 'CpuModelInfo' },
   'if': { 'any': [ 'TARGET_S390X',
'TARGET_I386',
-   'TARGET_ARM' ] } }
+   'TARGET_ARM',
+   'TARGET_RISCV' ] } }
 
 ##
 # @query-cpu-model-expansion:
@@ -273,7 +274,8 @@
   'returns': 'CpuModelExpansionInfo',
   'if': { 'any': [ 'TARGET_S390X',
'TARGET_I386',
-   'TARGET_ARM' ] } }
+   'TARGET_ARM',
+   'TARGET_RISCV' ] } }
 
 ##
 # @CpuDefinitionInfo:
diff --git a/target/riscv/riscv-qmp-cmds.c b/target/riscv/riscv-qmp-cmds.c
index 5ecff1afb3..2170562e3a 100644
--- a/target/riscv/riscv-qmp-cmds.c
+++ b/target/riscv/riscv-qmp-cmds.c
@@ -24,8 +24,12 @@
 
 #include "qemu/osdep.h"
 
+#include "qapi/error.h"
 #include "qapi/qapi-commands-machine-target.h"
+#include "qapi/qmp/qdict.h"
+#include "qom/qom-qobject.h"
 #include "cpu-qom.h"
+#include "cpu.h"
 
 static void riscv_cpu_add_definition(gpointer data, gpointer user_data)
 {
@@ -55,3 +59,74 @@ CpuDefinitionInfoList *qmp_query_cpu_definitions(Error 
**errp)
 
 return cpu_list;
 }
+
+static void riscv_obj_add_qdict_prop(Object *obj, QDict *qdict_out,
+ const char *name)
+{
+ObjectProperty *prop = object_property_find(obj, name);
+
+if (prop) {
+QObject *value;
+
+assert(prop->get);
+value = object_property_get_qobject(obj, name, &error_abort);
+
+qdict_put_obj(qdict_out, name, value);
+}
+}
+
+static void riscv_obj_add_multiext_props(Object *obj, QDict *qdict_out,
+ const RISCVCPUMultiExtConfig *arr)
+{
+for (int i = 0; arr[i].name != NULL; i++) {
+riscv_obj_add_qdict_prop(obj, qdict_out, arr[i].name);
+}
+}
+
+CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType 
type,
+ CpuModelInfo *model,
+ Error **errp)
+{
+CpuModelExpansionInfo *expansion_info;
+QDict *qdict_out;
+ObjectClass *oc;
+Object *obj;
+
+if (type != CPU_MODEL_EXPANSION_TYPE_FULL) {
+error_setg(errp, "The requested expansion type is not supported");
+return NULL;
+

Re: [Stable-8.1.1 11/34] softmmu: Assert data in bounds in iotlb_to_section

2023-09-20 Thread Alex Bennée



Michael Tokarev  writes:

> 09.09.2023 13:27, Michael Tokarev wrote:
>> From: Richard Henderson 
>> Acked-by: Alex Bennée 
>> Suggested-by: Alex Bennée 
>> Signed-off-by: Richard Henderson 
>> (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc)
>> Signed-off-by: Michael Tokarev 
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index 3df73542e1..7597dc1c39 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>>   int asidx = cpu_asidx_from_attrs(cpu, attrs);
>>   CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
>>   AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch);
>> -MemoryRegionSection *sections = d->map.sections;
>> +int section_index = index & ~TARGET_PAGE_MASK;
>> +MemoryRegionSection *ret;
>> +
>> +assert(section_index < d->map.sections_nb);
>
> This assert now triggers on staging-8.1
>
> https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz
> https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz

The asserts are basically there to detect when we attempt to do a MR
lookup and it is not fully committed. If they are firing, it's because
things have gone wrong (which we know because we still have patches in
flight for the full fix).

The main benefit of the assert is we fail early rather than later on in
various cryptic ways (evidenced by the fact we raised 3 different bugs
which exhibited slightly different symptoms that were all fundamentally
caused by getting bogus memory region data).
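
To make the "fail early" point concrete, here is a small model of the lookup with the added checks. It is a sketch only: it assumes 4 KiB target pages (TARGET_PAGE_BITS = 12), uses a plain list in place of d->map.sections, and section_lookup() is a made-up name.

```python
TARGET_PAGE_BITS = 12                       # assumption: 4 KiB target pages
TARGET_PAGE_MASK = -1 << TARGET_PAGE_BITS   # low 12 bits clear, rest set


def section_lookup(sections, index):
    """Mimic the checked lookup: the low bits of index select the section."""
    section_index = index & ~TARGET_PAGE_MASK
    # Fail here, at the lookup, instead of handing back bogus data.
    assert section_index < len(sections), "section index out of bounds"
    section = sections[section_index]
    assert section is not None, "section has no memory region"
    return section


sections = ["unassigned", "notdirty", "ram"]
print(section_lookup(sections, 0x5002))     # low bits 0x002 pick entry 2

try:
    section_lookup(sections, 0x5007)        # stale index past the table
except AssertionError as exc:
    print("caught early:", exc)
```

Without the bounds check, the stale index would silently read whatever lies past the table, which matches the "different symptoms in different places" described above.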

>
>> +ret = d->map.sections + section_index;
>> +assert(ret->mr);
>> +assert(ret->mr->ops);
>>   -return &sections[index & ~TARGET_PAGE_MASK];
>> +return ret;
>>   }
>> static void io_mem_init(void)
>
> In this upload I removed softmmu-Use-async_run_on_cpu-in-tcg_commit.patch 
> (0d58c660689f6da1),
> and the test run uses tcg and -smp 4, which is the configuration which 
> 0d58c6606
> was supposed to fix.
>
> qemu-system-x86_64 -no-user-config -nodefaults -name 
> autopkgtest-cryptsetup-cryptroot-sysvinit \
>  -machine type=q35,graphics=off -cpu qemu64,-svm,-vmx -smp cpus=4 -m size=2G \
>  -vga none -display none -object rng
>
> I wonder if I should keep 0d58c6606 for 8.1.1 (the deadline is
> tomorrow)..

Unfortunately 0d58c is not the full fix, it papered over one crack but
revealed others. It might be leading to a false sense of security. So I
would argue:

  - keep the assert - better to fail early than to fail later in a hard
to understand way
  - toss a coin for the 0d58c66 fix; if we include it we may end up
reverting later once we have the "complete" fix, but at least it's
slightly better for x86 while definitely breaking MIPS

>
> Thanks,
>
> /mjt


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[PULL v3 0/9] testing updates (back to green!)

2023-09-20 Thread Alex Bennée

The following changes since commit 4907644841e3200aea6475c0f72d3d987e9f3d93:

  Merge tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu into 
staging (2023-09-19 13:22:19 -0400)

are available in the Git repository at:

  https://gitlab.com/stsquad/qemu.git tags/pull-testing-200923-1

for you to fetch changes up to f959c3d87ccfa585b105de6964a6261e368cc1da:

  tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884 (2023-09-20 
15:06:33 +0100)


testing updates:

  - update most Debian to bookworm
  - fix some typos
  - update loongarch toolchain
  - fix microbit test
  - handle GitLab/Cirrus timeout discrepancy
  - improve avocado console handling
  - disable mips avocado images pending bugfix


Alex Bennée (2):
  tests: update most Debian images to Bookworm
  gitlab: fix typo/spelling in comments

Daniel P. Berrangé (4):
  microbit: add missing qtest_quit() call
  qtest: kill orphaned qtest QEMU processes on FreeBSD
  gitlab: make Cirrus CI timeout explicit
  gitlab: make Cirrus CI jobs gating

Nicholas Piggin (1):
  tests/avocado: Fix console data loss

Philippe Mathieu-Daudé (1):
  tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884

Richard Henderson (1):
  tests/docker: Update docker-loongarch-cross toolchain

 tests/qtest/libqtest.c|  7 +++
 tests/qtest/microbit-test.c   |  2 ++
 .gitlab-ci.d/base.yml |  2 +-
 .gitlab-ci.d/cirrus.yml   |  4 +++-
 .gitlab-ci.d/cirrus/build.yml |  2 ++
 python/qemu/machine/machine.py| 19 +++
 tests/avocado/avocado_qemu/__init__.py|  2 +-
 tests/avocado/boot_linux_console.py   |  7 +++
 tests/avocado/machine_mips_malta.py   |  6 ++
 tests/avocado/replay_kernel.py|  7 +++
 tests/avocado/tuxrun_baselines.py |  4 
 tests/docker/dockerfiles/debian-amd64-cross.docker|  9 +++--
 tests/docker/dockerfiles/debian-amd64.docker  |  9 +++--
 tests/docker/dockerfiles/debian-arm64-cross.docker|  9 +++--
 tests/docker/dockerfiles/debian-armhf-cross.docker|  9 +++--
 .../docker/dockerfiles/debian-loongarch-cross.docker  |  2 +-
 tests/docker/dockerfiles/debian-ppc64el-cross.docker  |  9 +++--
 tests/docker/dockerfiles/debian-s390x-cross.docker|  9 +++--
 tests/lcitool/libvirt-ci  |  2 +-
 tests/lcitool/refresh | 17 +
 20 files changed, 88 insertions(+), 49 deletions(-)

-- 
2.39.2

Re: stable-8.1.1: which bug do we keep?

2023-09-20 Thread Michael Tokarev


20.09.2023 12:17, Daniel P. Berrangé wrote:

On Wed, Sep 20, 2023 at 07:46:36AM +0300, Michael Tokarev wrote:

Hi!

I'm in somewhat doubt what to do with 8.1.1 release.

There are 2 compelling issues, fixing one discovers the other.

https://gitlab.com/qemu-project/qemu/-/issues/1864
"x86 VM with TCG and SMP fails to start on 8.1.0"
is fixed by 0d58c660689f "softmmu: Use async_run_on_cpu in tcg_commit"

But this brings up

https://gitlab.com/qemu-project/qemu/-/issues/1866
"mips/mip64 virtio broken on master (and 8.1.0 with tcg fix)"
(which is actually more than mips, as I've shown down the line,
https://gitlab.com/qemu-project/qemu/-/issues/1866#note_1558221926 )

...


In the cover letter for the 2nd proposed series Richard says

[quote]
I've done a tiny bit of performance testing between the two
solutions and it seems to be a wash.  So now it's simply a
matter of cleanliness.
[/quote]

Since the 2nd series is shown to still be broken in some cases
and 1st is thought to solve them all, IMHO it feels like we
should just press ahead with applying the 1st series to
git master, and then stable.

If we still want a cleaner solution, it can be reverted/replaced
later once someone figures out an option that addresses all the
problems. We shouldn't leave such a big regression in TCG unfixed
for so long while we figure out a cleaner option.


Daniel, you have a very good point here.

I just collected the first version of Richard's fixes (with Phil's
changes and tags), added them to qemu debian package and pushed that
one out, - debian has much wider CI than qemu has, hopefully it will
clear things out.

Also I pushed them to staging-8.1 branch for qemu ci run.  This obviously
should not go to current stable-8.1 since these fixes aren't in master.

The only thing I regret is that this simple thing didn't occur to me
much earlier (and actually didn't occur to me at all).

Let's see..


To me, it *feels* like 0d58c660689f should be there.
Note: the scheduled deadline for staging-8.1.1 passed yesterday.
But this stuff seems to be important enough to delay 8.1.1 further.


On the one hand breaking x86 is a big deal because it is a mainstream
architecture, on the other hand people have real x86 hardware, so
using TCG emulation for x86 is less compelling. I agree we need to
fully address this in 8.1.1.


As it turns out, quite a lot of various CI stuff uses qemu in tcg
mode behind the scenes.


I guess the other unmentioned option is to revert whatever TCG changes
went into 8.1 that caused the regression in the first place. I've no
idea if that is at all practical though.


This does not seem to be practical.  I did find the commit which broke
(some) things, but it isn't easy to revert it now.  IIRC anyway.

Thank you for the excellent hint!

/mjt

Re: [Stable-8.1.1 11/34] softmmu: Assert data in bounds in iotlb_to_section

2023-09-20 Thread Alex Bennée



Michael Tokarev  writes:

> 18.09.2023 12:19, Michael Tokarev wrote:
>> 09.09.2023 13:27, Michael Tokarev wrote:
>>> From: Richard Henderson 
>>>
>>> Acked-by: Alex Bennée 
>>> Suggested-by: Alex Bennée 
>>> Signed-off-by: Richard Henderson 
>>> (cherry picked from commit 86e4f93d827d3c1efd00cd8a906e38a2c0f2b5bc)
>>> Signed-off-by: Michael Tokarev 
>>>
>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>> index 3df73542e1..7597dc1c39 100644
>>> --- a/softmmu/physmem.c
>>> +++ b/softmmu/physmem.c
>>> @@ -2413,9 +2413,15 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
>>>   int asidx = cpu_asidx_from_attrs(cpu, attrs);
>>>   CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
>>>   AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch);
>>> -    MemoryRegionSection *sections = d->map.sections;
>>> +    int section_index = index & ~TARGET_PAGE_MASK;
>>> +    MemoryRegionSection *ret;
>>> +
>>> +    assert(section_index < d->map.sections_nb);
>> This assert now triggers on staging-8.1
>> https://ci.debian.net/data/autopkgtest/testing/amd64/d/dropbear/37993610/log.gz
>> https://ci.debian.net/data/autopkgtest/testing/amd64/c/cryptsetup/37993606/log.gz
>> 
>>> +    ret = d->map.sections + section_index;
>>> +    assert(ret->mr);
>>> +    assert(ret->mr->ops);
>>> -    return &sections[index & ~TARGET_PAGE_MASK];
>>> +    return ret;
>>>   }
>>>   static void io_mem_init(void)
>> In this upload I removed
>> softmmu-Use-async_run_on_cpu-in-tcg_commit.patch (0d58c660689f6da1),
>> and the test run uses tcg and -smp 4, which is the configuration which 
>> 0d58c6606
>> was supposed to fix.
>
> So, should this change not be in 8.1.1 too (together with 0d58c6606),
> or is it just the "messenger"?

Sorry my previous reply was eaten by my MUA.

The main purpose of the asserts is to catch corruption to the Memory
Regions early so we don't see weird failures later on (c.f. the 3
separate bugs for crashes in slightly different places).

IOW if we are crashing on the asserts in this patch but it's booting
without it, we are just getting lucky.

>
> Or both should go?
>
> Today is the deadline day for 8.1.1.
>
> Thanks!
>
> /mjt


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

Re: [PATCH v4 00/14] simpletrace: refactor and general improvements

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:15AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> I wanted to use simpletrace.py for an internal project, so I tried to update
> and polish the code. Some of the commits resolve specific issues, while some
> are more subjective.

Hi Mads,
Apologies for the very late review. I'm happy with this series except
for the zip file and unused Formatter2 class. Please drop them and
resend.

Thanks,
Stefan

> 
> I've tried to divide it into commits so we can discuss the
> individual changes, and I'm ready to pull things out, if it isn't needed.
> 
> v4:
>  * Added missing Analyzer2 to __all__
>  * Rebased with master
> v3:
>  * Added __all__ with public interface
>  * Added comment about magic numbers and structs from Stefan Hajnoczi
>  * Reintroduced old interface for process, run and Analyzer
>  * Added comment about Python 3.6 in ref. to getfullargspec
>  * process now accepts events as file-like objects
>  * Updated context-manager code for Analyzer
>  * Moved logic of event processing to Analyzer class
>  * Moved logic of process into _process function
>  * Added new Analyzer2 class with kwarg event-processing
>  * Reverted changes to process-call in scripts/analyse-locks-simpletrace.py
> v2:
>  * Added myself as maintainer of simpletrace.py
>  * Improve docstring on `process`
>  * Changed call to `process` in scripts/analyse-locks-simpletrace.py to 
> reflect new argument types
>  * Replaced `iteritems()` with `items()` in 
> scripts/analyse-locks-simpletrace.py to support Python 3
> 
> Mads Ynddal (14):
>   simpletrace: add __all__ to define public interface
>   simpletrace: annotate magic constants from QEMU code
>   simpletrace: improve parsing of sys.argv; fix files never closed.
>   simpletrace: changed naming of edict and idtoname to improve
> readability
>   simpletrace: update code for Python 3.11
>   simpletrace: improved error handling on struct unpack
>   simpletrace: define exception and add handling
>   simpletrace: made Analyzer into context-manager
>   simpletrace: refactor to separate responsibilities
>   simpletrace: move logic of process into internal function
>   simpletrace: move event processing to Analyzer class
>   simpletrace: added simplified Analyzer2 class
>   MAINTAINERS: add maintainer of simpletrace.py
>   scripts/analyse-locks-simpletrace.py: changed iteritems() to items()
> 
>  MAINTAINERS  |   6 +
>  scripts/analyse-locks-simpletrace.py |   2 +-
>  scripts/simpletrace-benchmark.zip| Bin 0 -> 4809 bytes
>  scripts/simpletrace.py   | 362 ++-
>  4 files changed, 247 insertions(+), 123 deletions(-)
>  create mode 100644 scripts/simpletrace-benchmark.zip
> 
> -- 
> 2.38.1
> 



Re: [PULL 00/22] implement discard operation for Parallels images

2023-09-20 Thread Denis V. Lunev


On 9/20/23 19:55, Stefan Hajnoczi wrote:

On Wed, 20 Sept 2023 at 05:22, Denis V. Lunev  wrote:

The following changes since commit 4907644841e3200aea6475c0f72d3d987e9f3d93:

   Merge tag 'mem-2023-09-19' of https://github.com/davidhildenbrand/qemu into 
staging (2023-09-19 13:22:19 -0400)

are available in the Git repository at:

   https://src.openvz.org/scm/~den/qemu.git tags/pull-parallels-2023-09-20

Hi Denis,
Please take a look at the following CI failure. I have dropped this
series for now.

clang -m64 -mcx16 -Ilibblock.fa.p -I. -I.. -Iqapi -Itrace -Iui
-Iui/shader -Iblock -I/usr/include/p11-kit-1 -I/usr/include/uuid
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
-I/usr/include/sysprof-4 -flto -fcolor-diagnostics -Wall -Winvalid-pch
-Werror -std=gnu11 -O2 -g -fstack-protector-strong
-fsanitize=safe-stack -Wundef -Wwrite-strings -Wmissing-prototypes
-Wstrict-prototypes -Wredundant-decls -Wold-style-definition
-Wtype-limits -Wformat-security -Wformat-y2k -Winit-self
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels
-Wexpansion-to-defined -Wmissing-format-attribute
-Wno-initializer-overrides -Wno-missing-include-dirs
-Wno-shift-negative-value -Wno-string-plus-int
-Wno-typedef-redefinition -Wno-tautological-type-limit-compare
-Wno-psabi -Wno-gnu-variable-sized-type-not-at-end -Wthread-safety
-isystem /builds/qemu-project/qemu/linux-headers -isystem
linux-headers -iquote . -iquote /builds/qemu-project/qemu -iquote
/builds/qemu-project/qemu/include -iquote
/builds/qemu-project/qemu/host/include/x86_64 -iquote
/builds/qemu-project/qemu/host/include/generic -iquote
/builds/qemu-project/qemu/tcg/i386 -pthread -D_GNU_SOURCE
-D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv
-fsanitize=cfi-icall -fsanitize-cfi-icall-generalize-pointers
-fno-sanitize-trap=cfi-icall -fPIE -D_FILE_OFFSET_BITS=64
-D__USE_FILE_OFFSET64 -D__USE_LARGEFILE64 -DUSE_POSIX_ACLS=1 -MD -MQ
libblock.fa.p/block_parallels.c.o -MF
libblock.fa.p/block_parallels.c.o.d -o
libblock.fa.p/block_parallels.c.o -c ../block/parallels.c
../block/parallels.c:210:21: error: calling function
'bdrv_co_getlength' requires holding mutex 'graph_lock'
[-Werror,-Wthread-safety-analysis]
payload_bytes = bdrv_co_getlength(bs->file->bs);
^
../block/parallels.c:572:15: error: calling function
'bdrv_co_pdiscard' requires holding mutex 'graph_lock'
[-Werror,-Wthread-safety-analysis]
ret = bdrv_co_pdiscard(bs->file, host_off, s->cluster_size);
^
2 errors generated.

https://gitlab.com/qemu-project/qemu/-/jobs/5131277794

Stefan




It seems that the GCC and Clang environments are different
nowadays. I had a suspicion of that but did not have
proof. Will try to understand.

Den

Re: [PATCH v4 14/14] scripts/analyse-locks-simpletrace.py: changed iteritems() to items()

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:29AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Python 3 removed `dict.iteritems()` in favor of `dict.items()`. This
> means the script currently doesn't work on Python 3.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/analyse-locks-simpletrace.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 
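
For reference, the behaviour the commit message describes is easy to check on any Python 3 interpreter. The dict and its keys below are invented for illustration and do not come from analyse-locks-simpletrace.py.

```python
# Hypothetical data standing in for the script's per-lock statistics.
lock_counts = {"mutex_a": 3, "mutex_b": 1}

# Python 3 dicts have no iteritems(); calling it raises AttributeError.
print(hasattr(lock_counts, "iteritems"))

# items() exists on both Python 2 and 3, so it is the portable spelling.
for name, count in sorted(lock_counts.items()):
    print(name, count)
```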



Re: [PATCH v4 13/14] MAINTAINERS: add maintainer of simpletrace.py

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:28AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> In my work to refactor simpletrace.py, I noticed that there's no
> maintainer of it, and has the status of "odd fixes". I'm using it from
> time to time, so I'd like to maintain the script.
> 
> I've added myself as reviewer under "Tracing" to be informed of changes
> that might affect simpletrace.py.

I treat simpletrace.py as part of the tracing subsystem but it is
missing from the MAINTAINERS file.

You're welcome to be the maintainer. I haven't touched it in years.

When you have reviewed future simpletrace.py patches you can either post
your Reviewed-by and I'll include them in my tracing pull requests, or
you could send pull requests to the qemu.git maintainer yourself
(requires publishing a GPG key and signing pull request tags).

Please let me know which option you prefer.

> Signed-off-by: Mads Ynddal 
> ---
>  MAINTAINERS | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6111b6b4d9..2ffb608dec 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3162,6 +3162,7 @@ F: stubs/
>  
>  Tracing
>  M: Stefan Hajnoczi 
> +R: Mads Ynddal 
>  S: Maintained
>  F: trace/
>  F: trace-events
> @@ -3174,6 +3175,11 @@ F: docs/tools/qemu-trace-stap.rst
>  F: docs/devel/tracing.rst
>  T: git https://github.com/stefanha/qemu.git tracing
>  
> +Simpletrace
> +M: Mads Ynddal 
> +S: Maintained
> +F: scripts/simpletrace.py
> +
>  TPM
>  M: Stefan Berger 
>  S: Maintained
> -- 
> 2.38.1
> 


Re: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Alex Williamson

On Wed, 20 Sep 2023 15:12:59 -0300
Jason Gunthorpe  wrote:

> On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote:
> > On Wed, 20 Sep 2023 03:42:20 +
> > "Duan, Zhenzhong"  wrote:
> >   
> > > >-Original Message-
> > > >From: Cédric Le Goater 
> > > >Sent: Wednesday, September 20, 2023 1:08 AM
> > > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > > >
> > > >On 8/30/23 12:37, Zhenzhong Duan wrote:
> > > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > > >> iommufd support, enabled by default.
> > > >
> > > >Why would someone want to disable support at compile time ? It might
> > > 
> > > For those users who only want to support legacy container feature?
> > > Let me know if you still prefer to drop this patch, I'm fine with that.
> > >   
> > > >have been useful for dev but now QEMU should self-adjust at runtime
> > > >depending only on the host capabilities AFAIUI. Am I missing something ? 
> > > >   
> > > 
> > > IOMMUFD doesn't support all features of legacy container, so QEMU
> > > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > > We need to specify it explicitly to use IOMMUFD as below:
> > > 
> > > -object iommufd,id=iommufd0
> > > -device vfio-pci,host=:02:00.0,iommufd=iommufd0  
> > 
> > There's an important point here that maybe we've let slip for too long.
> > Laine had asked in an internal forum whether the switch to IOMMUFD was
> > visible to the guest.  I replied that it wasn't, but this note about
> > IOMMUFD vs container features jogged my memory that I think we still
> > lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
> > seemed like there was something else too, but I don't recall without
> > some research.  
> 
> I think p2p is the only guest visible one.
> 
> I still expect to solve it :\
> 
> > Ideally we'd have feature parity and libvirt could simply use the
> > native IOMMUFD interface whenever both the kernel and QEMU support it.
> > 
> > Without that parity, when does libvirt decide to use IOMMUFD?
> > 
> > How would libvirt know if some future IOMMUFD does have parity?  
> 
> At this point I think it is reasonable that iommufd is explicitly
> opted into.
> 
> The next step would be automatic for single PCI device VMs (p2p is not
> relevant)

And when a second PCI device is hot-plugged into the VM and it behaves
differently from a VM with multiple statically attached devices?  Seems
like it's an opt-in until full p2p support, then an opt-out for
potential bugs.  Thanks,

Alex

> The final step would be automatic if kernel supports P2P. I expect
> libvirt will be able to detect this from an open'd /dev/iommu.
> 
> Jason
>

Re: [PATCH v4 12/14] simpletrace: added simplified Analyzer2 class

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:27AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> By moving the dynamic argument construction to keyword-arguments,
> we can remove all of the specialized handling, and streamline it.
> If a tracing method wants to access these, they can define the
> kwargs, or ignore them by placing `**kwargs` at the end of the
> function's arguments list.
> 
> Added deprecation warning to Analyzer class to make users aware
> of the Analyzer2 class. No removal date is planned.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace-benchmark.zip | Bin 0 -> 4809 bytes
>  scripts/simpletrace.py|  78 +-
>  2 files changed, 76 insertions(+), 2 deletions(-)
>  create mode 100644 scripts/simpletrace-benchmark.zip
> 
> diff --git a/scripts/simpletrace-benchmark.zip b/scripts/simpletrace-benchmark.zip
> new file mode 100644
> index ..1d696a8d5c49d4725c938af8bf25a59090192986
> GIT binary patch
> literal 4809
> [base85-encoded binary payload, garbled in the archive, omitted]
>
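
The keyword-argument approach described in the commit message can be
sketched roughly as follows. This is not the Analyzer2 class from the patch:
the class name, the `dispatch` helper and the metadata names (`event`,
`timestamp_ns`, `pid`) are assumptions chosen for illustration.

```python
class KwargsAnalyzer:
    """Hypothetical analyzer showing keyword-argument dispatch."""

    def __init__(self):
        self.seen = []

    def catchall(self, *rec_args, **kwargs):
        # Fallback for events without a dedicated method.
        self.seen.append(("catchall", kwargs["event"]))

    def runstate_set(self, new_state, *, timestamp_ns, **kwargs):
        # A handler that wants a piece of metadata names it explicitly.
        self.seen.append(("runstate_set", new_state, timestamp_ns))

    def vm_stop(self, reason, **kwargs):
        # A handler that does not care swallows the metadata with **kwargs.
        self.seen.append(("vm_stop", reason))

    def dispatch(self, event_name, rec_args, timestamp_ns, pid):
        # Every handler is called the same way, so no per-signature
        # special casing is needed in the caller.
        handler = getattr(self, event_name, self.catchall)
        handler(*rec_args, event=event_name,
                timestamp_ns=timestamp_ns, pid=pid)
```

The point of the design is that the caller no longer needs to inspect each
method's signature to decide which positional extras to pass.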

Re: [PATCH v4 12/14] simpletrace: added simplified Analyzer2 class

2023-09-20 Thread Stefan Hajnoczi

On Wed, Sep 06, 2023 at 09:57:32AM +0100, Daniel P. Berrangé wrote:
> On Wed, Sep 06, 2023 at 10:32:14AM +0200, Mads Ynddal wrote:
> > 
> > > AFAIK, we don't consider simpletrace.py python code to be a
> > > supported public API, just a command line tool.
> > > 
> > > IOW, we can change the python code at will, as long as the
> > > command line doesn't alter its behaviour. Thus I don't see
> > > a need to add new classes, just change the existing ones.
> > > 
> > > 
> > > With regards,
> > > Daniel
> > > -- 
> > > > |: https://berrange.com  -o-  https://www.flickr.com/photos/dberrange :|
> > > > |: https://libvirt.org  -o-  https://fstop138.berrange.com :|
> > > > |: https://entangle-photo.org  -o-  https://www.instagram.com/dberrange :|
> > > 
> > 
> > This was based on feedback from Stefan in v2. I don't have strong opinions
> > about the one or the other. But now that the work has already been done,
> > I'd think the easiest would be to follow-through on the two versions.
> > 
> > https://lore.kernel.org/qemu-devel/20230504180326.GB581920@fedora/
> 
> I don't really agree with that feedback. We never document simpletrace.py
> as being a public facing Python API, only its usage as a CLI tool. It is
> also never installed into any Python library path, nor packaged as a PyPI
> module AFAIK.
> 
> If someone is nonetheless importing simpletrace.py from their app
> then they should copy the file wholesale into that code and take the
> responsibility for their usage.
> 
> I don't think QEMU should take on the burden of maintaining API
> compatibility for something we have never presented as a public API.

I wrote about the simpletrace API here in 2011:
http://blog.vmsplice.net/2011/03/how-to-write-trace-analysis-scripts-for.html

It was intended as an API and I don't think we should break people's
scripts unless there is a strong reason.

I don't know how many users there are, but it feels wrong to break
existing scripts without a strong reason to do so.

Stefan

> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
> 



Re: [PATCH RFC] vfio/common: Add an option to relax vIOMMU migration blocker

2023-09-20 Thread Joao Martins

On 08/09/2023 13:05, Joao Martins wrote:
> Add an option 'x-migration-iommu-pt' to VFIO that allows it to relax
> whether the vIOMMU usage blocks the migration. The current behaviour
> is kept and we block migration in the following conditions:
> 
> * By default if the guest does try to use vIOMMU migration is blocked
> when migration is attempted, just like having the migration blocker in
> the first place [Current behaviour]
> 
> * Migration starts with no vIOMMU mappings, but guest kexec's itself
> with IOMMU on ('iommu=on intel_iommu=on') and ends up using the vIOMMU.
> Here we cancel the migration with an error message [Added behaviour]
> 
> This is meant for older VMs (5.10), where we can relax the usage
> because the IOMMU is passed for the sole purpose of interrupt
> remapping and the guest is too old to check for DMA translation
> services while probing its IOMMU devices[0]. The option is useful for
> managed VMs where you *steer* some of the guest behaviour and you know
> you won't use it for more than interrupt remapping.
> 
> [0] 
> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/
> 
> Default is 'disabled' for this option given the second bullet point
> above depends on guest behaviour (thus non-deterministic). But let the
> user enable it if they can tolerate migration failures.
> 
> Signed-off-by: Joao Martins 
> ---
> Followup from discussion here:
> https://lore.kernel.org/qemu-devel/d5d30f58-31f0-1103-6956-377de34a7...@redhat.com/
> 
> This is a smaller (and simpler) take than [0], but is likely the only
> option when thinking of old guests, or managed guests that only want to use
> vIOMMU for interrupt remapping. The work in [0] has stronger 'migration
> will work' guarantees (except, of course, for the usual non-convergence
> or network failures that are agnostic to vIOMMU), and is a bit better at
> limiting what the guest can do. But it also depends on slightly recent
> guests. I think both are useful.
> 
> About the patch itself:
> 
> * cancelling migration was done via vfio_migration_set_error() but
> I can always use migrate_cancel() if migration is active, or add
> a migration blocker when it's not active.
> 
Are folks against or in favor of the idea presented here, before I go and make
this small improvement?

It is the only way I can think of for old guests using vIOMMU (for the intremap
case). At the same time, it still blocks/interrupts migration with vIOMMU,
except that migration is only really blocked when the guest actually tries to
set up a mapping. Hence why I was thinking of enabling it by default, but
optionally on (as is) is great too. The naming could probably be better, but I
couldn't figure out a better name.

> ---
>  include/hw/vfio/vfio-common.h |  2 ++
>  hw/vfio/common.c  | 66 +++
>  hw/vfio/migration.c   |  7 +++-
>  hw/vfio/pci.c |  2 ++
>  4 files changed, 76 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index e9b895459534..95ef386af45f 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -140,6 +140,7 @@ typedef struct VFIODevice {
>  bool no_mmap;
>  bool ram_block_discard_allowed;
>  OnOffAuto enable_migration;
> +bool iommu_passthrough;
>  VFIODeviceOps *ops;
>  unsigned int num_irqs;
>  unsigned int num_regions;
> @@ -227,6 +228,7 @@ extern VFIOGroupList vfio_group_list;
>  bool vfio_mig_active(void);
>  int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp);
>  void vfio_unblock_multiple_devices_migration(void);
> +bool vfio_devices_all_iommu_passthrough(void);
>  bool vfio_viommu_preset(VFIODevice *vbasedev);
>  int64_t vfio_mig_bytes_transferred(void);
>  void vfio_reset_bytes_transferred(void);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 134649226d43..4adf9fec08f1 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -433,6 +433,22 @@ void vfio_unblock_multiple_devices_migration(void)
>  multiple_devices_migration_blocker = NULL;
>  }
>  
> +bool vfio_devices_all_iommu_passthrough(void)
> +{
> +VFIODevice *vbasedev;
> +VFIOGroup *group;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if (!vbasedev->iommu_passthrough) {
> +return false;
> +}
> +}
> +}
> +
> +return true;
> +}
> +
>  bool vfio_viommu_preset(VFIODevice *vbasedev)
>  {
>  return vbasedev->group->container->space->as != &address_space_memory;
> @@ -1194,6 +1210,18 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  goto fail;
>  }
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +
> +/*
> + * Any attempt to make vIOMMU mappings will fail the live migration
> + */
> +if (vfio_devices_all_iommu_passthrough())

Re: [PATCH v4 11/14] simpletrace: move event processing to Analyzer class

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:26AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Moved event processing to the Analyzer class to separate specific analyzer
> logic (like caching and function signatures) from the _process function.
> This allows for new types of Analyzer-based subclasses without changing
> the core code.
> 
> Note, that the fn_cache is important for performance in cases where the
> analyzer is branching away from the catch-all a lot. The cache has no
> measurable performance penalty.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 60 +-
>  1 file changed, 36 insertions(+), 24 deletions(-)

Reviewed-by: Stefan Hajnoczi 
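
To illustrate why a function cache helps when many events have dedicated
handlers, here is a minimal sketch. It is not the code from the patch: the
class, the `on_` method prefix and the `lookups` counter are hypothetical and
exist only to make the effect visible.

```python
class CachingAnalyzer:
    """Hypothetical analyzer that resolves each event handler only once."""

    def __init__(self):
        self._fn_cache = {}
        self.lookups = 0      # counts slow-path resolutions
        self.calls = []

    def catchall(self, event_name, rec_args):
        self.calls.append(("catchall", event_name))

    def on_vm_stop(self, event_name, rec_args):
        self.calls.append(("vm_stop", rec_args))

    def _resolve(self, event_name):
        # Slow path: attribute lookup by constructed name.
        self.lookups += 1
        return getattr(self, "on_" + event_name, self.catchall)

    def process_event(self, event_name, rec_args):
        try:
            fn = self._fn_cache[event_name]
        except KeyError:
            fn = self._fn_cache[event_name] = self._resolve(event_name)
        fn(event_name, rec_args)
```

After the first occurrence of an event name, every later record with that
name costs a single dictionary lookup before the handler runs.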



Re: [PATCH v4 10/14] simpletrace: move logic of process into internal function

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:25AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> To avoid duplicate code depending on input types and to better handle
> open/close of log with a context-manager, we move the logic of process into
> _process.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v4 09/14] simpletrace: refactor to separate responsibilities

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:24AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Moved event_mapping and event_id_to_name down one level in the function
> call-stack to keep variable instantiation and usage closer (`process`
> and `run` have no use of the variables; `read_trace_records` does).
> 
> Instead of passing event_mapping and event_id_to_name to the bottom of
> the call-stack, we move their use to `read_trace_records`. This
> separates responsibility and ownership of the information.
> 
> `read_record` now just reads the arguments from the file-object by
> knowing the total number of bytes. Parsing it to specific arguments is
> moved up to `read_trace_records`.
> 
> Special handling of dropped events removed, as they can be handled
> by the general code.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 115 +++--
>  1 file changed, 53 insertions(+), 62 deletions(-)

Reviewed-by: Stefan Hajnoczi 
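
The split of responsibilities described above might look roughly like the
sketch below. The record header layout (`=QQII`) and the simplification that
every argument is a 64-bit integer are assumptions made for the example; the
real script also has to deal with string arguments.

```python
import io
import struct

# Assumed layout: event id, timestamp (ns), total record length, pid.
REC_HEADER_FMT = "=QQII"
REC_HEADER_LEN = struct.calcsize(REC_HEADER_FMT)


def read_record(fobj):
    """Read one raw record; knows sizes only, not event semantics."""
    header = fobj.read(REC_HEADER_LEN)
    if len(header) < REC_HEADER_LEN:
        return None  # end of file
    event_id, timestamp_ns, rec_len, pid = struct.unpack(REC_HEADER_FMT, header)
    payload = fobj.read(rec_len - REC_HEADER_LEN)
    return event_id, timestamp_ns, pid, payload


def read_trace_records(fobj, event_id_to_name):
    """Own the id-to-name mapping and turn raw payloads into arguments."""
    while True:
        record = read_record(fobj)
        if record is None:
            break
        event_id, timestamp_ns, pid, payload = record
        # Simplification: treat the payload as a sequence of u64 values.
        args = struct.unpack("=%dQ" % (len(payload) // 8), payload)
        yield event_id_to_name[event_id], timestamp_ns, pid, args


# Usage with an in-memory trace containing a single record.
_payload = struct.pack("=2Q", 7, 9)
_header = struct.pack(REC_HEADER_FMT, 1, 1000, REC_HEADER_LEN + len(_payload), 42)
records = list(read_trace_records(io.BytesIO(_header + _payload), {1: "demo_event"}))
```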



Re: [PATCH v4 08/14] simpletrace: made Analyzer into context-manager

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:23AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Instead of explicitly calling `begin` and `end`, we can change the class
> to use the context-manager paradigm. This is mostly a styling choice,
> used in modern Python code. But it also allows for more advanced analyzers
> to handle exceptions gracefully in the `__exit__` method (not
> demonstrated here).
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 31 ---
>  1 file changed, 20 insertions(+), 11 deletions(-)

Reviewed-by: Stefan Hajnoczi 
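
For readers who have not used the paradigm, a minimal sketch of an analyzer
written as a context manager follows. The class and its attributes are
hypothetical and only show where the former `begin`/`end` work would go.

```python
class CountingAnalyzer:
    """Hypothetical analyzer using __enter__/__exit__ instead of begin/end."""

    def __init__(self):
        self.count = 0
        self.finished = False
        self.failed = False

    def __enter__(self):
        # Setup that previously lived in begin() would go here.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Teardown that previously lived in end(); it runs even on errors.
        self.finished = True
        self.failed = exc_type is not None
        return False  # do not swallow the exception

    def catchall(self, record):
        self.count += 1


with CountingAnalyzer() as analyzer:
    for record in ("a", "b", "c"):
        analyzer.catchall(record)
```

Because `__exit__` receives the exception details, an analyzer can flush
partial results or log a message before the error propagates.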



Re: [PATCH v2 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-20 Thread Eric Blake

On Wed, Sep 20, 2023 at 08:31:49PM +0200, Markus Armbruster wrote:
...
> The only reliable way to prevent unintended variable name capture is
> -Wshadow.
> 
> One blocker for enabling it is shadowing hiding in function-like
> macros like
> 
>  qdict_put(dict, "name", qobject_ref(...))
> 
> qdict_put() wraps its last argument in QOBJECT(), and the last
> argument here contains another QOBJECT().
> 
> Use dark preprocessor sorcery to make the macros that give us this
> problem use different variable names on every call.
> 
> Signed-off-by: Markus Armbruster 
> Reviewed-by: Eric Blake 

It's changed (for the better) since v1, so I'm re-reviewing.

> ---
>  include/qapi/qmp/qobject.h | 11 +--
>  include/qemu/atomic.h  | 17 -
>  include/qemu/compiler.h|  3 +++
>  include/qemu/osdep.h   | 31 +++
>  4 files changed, 47 insertions(+), 15 deletions(-)
> 
> diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
> index 9003b71fd3..d36cc97805 100644
> --- a/include/qapi/qmp/qobject.h
> +++ b/include/qapi/qmp/qobject.h
> @@ -45,10 +45,17 @@ struct QObject {
>  struct QObjectBase_ base;
>  };
>  
> -#define QOBJECT(obj) ({ \
> +/*
> + * Preprocessory sorcery ahead: use a different identifier for the

s/Preprocessory/Preprocessor/ (multiple times in the patch)

> + * local variable in each expansion, so we can nest macro calls
> + * without shadowing variables.
> + */
> +#define QOBJECT_INTERNAL(obj, _obj) ({  \
>  typeof(obj) _obj = (obj);   \
> -_obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
> +_obj\
> +? container_of(&(_obj)->base, QObject, base) : NULL;\

As pointed out before, you can write &_obj->base instead of
&(_obj)->base, now that we know _obj is a single identifier rather
than an arbitrary expression.  Not strictly necessary since the extra
() doesn't change semantics...

>  })
> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), MAKE_IDENTFIER(_obj))
>  
>  /* Required for qobject_to() */
>  #define QTYPE_CAST_TO_QNull QTYPE_QNULL
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index d95612f7a0..d4cbd01909 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -157,13 +157,20 @@
>  smp_read_barrier_depends();
>  #endif
>  
> -#define qatomic_rcu_read(ptr)  \
> -({ \
> +/*
> + * Preprocessory sorcery ahead: use a different identifier for the
> + * local variable in each expansion, so we can nest macro calls
> + * without shadowing variables.
> + */
> +#define qatomic_rcu_read_internal(ptr, _val)\
> +({  \
>  qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
> -typeof_strip_qual(*ptr) _val;  \
> -qatomic_rcu_read__nocheck(ptr, &_val); \
> -_val;  \
> +typeof_strip_qual(*ptr) _val;   \
> +qatomic_rcu_read__nocheck(ptr, &_val);  \

...but it looks odd for the patch to not be consistent on that front.

> +_val;   \
>  })
> +#define qatomic_rcu_read(ptr) \
> +qatomic_rcu_read_internal((ptr), MAKE_IDENTFIER(_val))
>  
>  #define qatomic_rcu_set(ptr, i) do {   \
>  qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
> diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
> index a309f90c76..03236d830c 100644
> --- a/include/qemu/compiler.h
> +++ b/include/qemu/compiler.h
> @@ -37,6 +37,9 @@
>  #define tostring(s) #s
>  #endif
>  
> +/* Expands into an identifier stemN, where N is another number each time */
> +#define MAKE_IDENTFIER(stem) glue(stem, __COUNTER__)

I like how this turned out.

With the spelling fix, and optionally with the redundant () dropped,
you can keep my R-b.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH v4 07/14] simpletrace: define exception and add handling

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:22AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Define `SimpleException` to differentiate our exceptions from generic
> exceptions (IOError, etc.). Adapted simpletrace to support this and
> output to stderr.
> 
> Signed-off-by: Mads Ynddal 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  scripts/simpletrace.py | 22 ++
>  1 file changed, 14 insertions(+), 8 deletions(-)

Reviewed-by: Stefan Hajnoczi 
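
The pattern of a tool-specific exception handled at the top level can be
sketched as below. Only the name `SimpleException` comes from the commit
message; the `check_magic` helper and the magic value are made up for the
example.

```python
import sys


class SimpleException(Exception):
    """Errors raised by the tool itself, as opposed to IOError and friends."""


def check_magic(header_bytes):
    # Hypothetical validation step; the magic value is not the real one.
    if header_bytes[:4] != b"DEMO":
        raise SimpleException("Not a valid trace file")


def main(header_bytes):
    try:
        check_magic(header_bytes)
    except SimpleException as e:
        # Known failure modes get a short message on stderr, no traceback.
        sys.stderr.write("%s\n" % e)
        return 1
    return 0
```

Anything that is not a `SimpleException` still propagates with a full
traceback, which keeps genuine bugs visible.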



Re: [PULL v2 0/8] Hppa btlb patches

2023-09-20 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.



Re: [PULL 00/57] loongarch-to-apply queue

2023-09-20 Thread Stefan Hajnoczi

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.



Re: [PATCH v2 12/12] vfio: Remove 64-bit IOVA address space assumption

2023-09-20 Thread Alex Williamson

On Wed, 13 Sep 2023 10:01:47 +0200
Eric Auger  wrote:

> Now that we retrieve the usable IOVA ranges from the host,
> we know the physical IOMMU aperture and we can remove
> the assumption of 64b IOVA space when calling
> vfio_host_win_add().
> 
> This works fine in general but in case of an IOMMU memory
> region this becomes more tricky. For instance the virtio-iommu
> MR has a 64b aperture by default. If the physical IOMMU has a
> smaller aperture (typically the case for VTD), this means we
> would need to resize the IOMMU MR when the latter is linked
> to a container. However this happens on vfio_listener_region_add()
> when calling the IOMMU MR set_iova_ranges() callback and this
> would mean we would have a recursive call to
> vfio_listener_region_add(). This looks like a wrong usage of
> the memory API causing duplicate IOMMU MR notifier registration
> for instance.
> 
> Until we find a better solution, make sure vfio_find_hostwin()
> is no longer called for IOMMU regions.

Thanks for your encouragement to double check this, it does seem like
there are some gaps in the host window support.  First I guess I don't
understand why the last chunk here assumes a contiguous range.
Shouldn't we call vfio_host_win_add() for each IOVA range?

But then we have a problem that we don't necessarily get positive
feedback from memory_region_iommu_set_iova_ranges().  Did the vIOMMU
accept the ranges or not?  Only one vIOMMU implements the callback.
Should we only call memory_region_iommu_set_iova_ranges() if the range
doesn't align to a host window and should the wrapper return -ENOTSUP
if there is no vIOMMU support to poke holes in the range?  Thanks,

Alex

 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> I have not found any working solution to the IOMMU MR resizing.
> So I can remove this patch or remove the check for IOMMU MR. Maybe
> this is an issue which can be handled separately?
> ---
>  hw/vfio/common.c | 25 -
>  1 file changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 26da38de05..40cac1ca91 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1112,13 +1112,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  #endif
>  }
>  
> -hostwin = vfio_find_hostwin(container, iova, end);
> -if (!hostwin) {
> -error_setg(&err, "Container %p can't map guest IOVA region"
> -   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> -goto fail;
> -}
> -
>  memory_region_ref(section->mr);
>  
>  if (memory_region_is_iommu(section->mr)) {
> @@ -1177,6 +1170,14 @@ static void vfio_listener_region_add(MemoryListener *listener,
>  return;
>  }
>  
> +hostwin = vfio_find_hostwin(container, iova, end);
> +if (!hostwin) {
> +error_setg(&err, "Container %p can't map guest IOVA region"
> +   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> +goto fail;
> +}
> +
> +
>  /* Here we assume that memory_region_is_ram(section->mr)==true */
>  
>  /*
> @@ -2594,12 +2595,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>  vfio_get_iommu_info_migration(container, info);
>  g_free(info);
>  
> -/*
> - * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> - * information to get the actual window extent rather than assume
> - * a 64-bit IOVA address space.
> - */
> -vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> +g_assert(container->nr_iovas);
> +vfio_host_win_add(container, 0,
> +  container->iova_ranges[container->nr_iovas - 1].end,
> +  container->pgsizes);
>  
>  break;
>  }

Re: [PATCH v4 06/14] simpletrace: improved error handling on struct unpack

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:21AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> A failed call to `read_header` wouldn't be handled the same for the two
> different code paths (one path would try to use `None` as a list).
> Changed to raise exception to be handled centrally. This also allows for
> easier unpacking, as errors have been filtered out.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 41 -
>  1 file changed, 16 insertions(+), 25 deletions(-)

Reviewed-by: Stefan Hajnoczi 
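
A sketch of the "raise instead of returning None" approach is shown below.
The function name `read_header` and the exception name follow the commit
messages in this series; the body and the error text are assumptions, not the
code from the patch.

```python
import io
import struct


class SimpleException(Exception):
    pass


def read_header(fobj, hfmt):
    """Read and unpack a fixed-size header, raising on a short read."""
    hlen = struct.calcsize(hfmt)
    hdr = fobj.read(hlen)
    if len(hdr) != hlen:
        # Raising here means no caller ever sees None in place of a tuple.
        raise SimpleException("Error reading header. Wrong filetype provided?")
    return struct.unpack(hfmt, hdr)


# Callers can unpack directly, because failures never reach this line.
header_event_id, header_magic, log_version = read_header(
    io.BytesIO(struct.pack("=QQQ", 1, 2, 3)), "=QQQ")
```

With errors filtered out at the source, both code paths mentioned in the
commit message can share one `except SimpleException` handler.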



Re: [PATCH v4 04/14] simpletrace: changed naming of edict and idtoname to improve readability

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:19AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> Readability is subjective, but I've expanded the naming of the variables
> and arguments, to help with understanding for new eyes on the code.
> 
> Signed-off-by: Mads Ynddal 
> Reviewed-by: Philippe Mathieu-Daudé 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  scripts/simpletrace.py | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v2 28/28] bsd-user: Implement pdfork(2) system call.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:40 PM Karim Taha wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Acked-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 32 
>  bsd-user/freebsd/os-syscall.c |  4 
>  2 files changed, 36 insertions(+)
>

Reviewed-by: Warner Losh 

I chased down pdfork recently for other reasons, and I'm pretty sure this
is good.

Warner

Re: [PATCH v2 27/28] bsd-user: Implement rfork(2) system call.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 39 +++
>  bsd-user/freebsd/os-syscall.c |  4 
>  2 files changed, 43 insertions(+)
>

Reviewed-by: Warner Losh 


> diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
> index 14478d4bb5..a406ef7db8 100644
> --- a/bsd-user/freebsd/os-proc.h
> +++ b/bsd-user/freebsd/os-proc.h
> @@ -212,4 +212,43 @@ static inline abi_long do_freebsd_vfork(void *cpu_env)
>  return do_freebsd_fork(cpu_env);
>  }
>
> +/* rfork(2) */
> +static inline abi_long do_freebsd_rfork(void *cpu_env, abi_long flags)
> +{
> +abi_long ret;
> +abi_ulong child_flag;
> +
> +/*
> + * XXX We need to handle RFMEM here, as well.  Neither are safe to execute
> + * as-is on x86 hosts because they'll split memory but not the stack,
> + * wreaking havoc on host architectures that use the stack to store the
> + * return address as both threads try to pop it off.  Rejecting RFSPAWN
> + * entirely for now is ok, the only consumer at the moment is posix_spawn
> + * and it will fall back to classic vfork(2) if we return EINVAL.
> + */
> +if ((flags & TARGET_RFSPAWN) != 0) {
> +return -TARGET_EINVAL;
> +}
> +fork_start();
> +ret = rfork(flags);
> +if (ret == 0) {
> +/* child */
> +child_flag = 1;
> +target_cpu_clone_regs(cpu_env, 0);
> +} else {
> +/* parent */
> +child_flag = 0;
> +}
> +
> +/*
> + * The fork system call sets a child flag in the second return
> + * value: 0 for parent process, 1 for child process.
> + */
> +set_second_rval(cpu_env, child_flag);
> +fork_end(child_flag);
> +
> +return ret;
> +
> +}
> +
>  #endif /* BSD_USER_FREEBSD_OS_PROC_H */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index cb9425c9ba..4c4e773d1d 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -234,6 +234,10 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  ret = do_freebsd_vfork(cpu_env);
>  break;
>
> +case TARGET_FREEBSD_NR_rfork: /* rfork(2) */
> +ret = do_freebsd_rfork(cpu_env, arg1);
> +break;
> +
>  case TARGET_FREEBSD_NR_execve: /* execve(2) */
>  ret = do_freebsd_execve(arg1, arg2, arg3);
>  break;
> --
> 2.42.0
>
>

Re: [PATCH v2 26/28] bsd-user: Implement fork(2) and vfork(2) system calls.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 34 ++
>  bsd-user/freebsd/os-syscall.c |  8 
>  2 files changed, 42 insertions(+)
>

Reviewed-by: Warner Losh 

Though I have minor qualms about vfork == fork, for bsd-user it's fine,
since I don't think the performance difference will be that large for the
typical case where vfork + exec exists for older (now kinda really old)
programs that used to use this.

Warner

Re: [PATCH v2 25/28] bsd-user: Implement pdgetpid(2) and the undocumented setugid.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 23 +++
>  bsd-user/freebsd/os-syscall.c |  8 
>  2 files changed, 31 insertions(+)
>
> diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
> index 1866f0b2d6..47bcdcf8a3 100644
> --- a/bsd-user/freebsd/os-proc.h
> +++ b/bsd-user/freebsd/os-proc.h
> @@ -34,6 +34,8 @@ pid_t safe_wait4(pid_t wpid, int *status, int options, struct rusage *rusage);
>  pid_t safe_wait6(idtype_t idtype, id_t id, int *status, int options,
>  struct __wrusage *wrusage, siginfo_t *infop);
>
> +extern int __setugid(int flag);
> +
>  /* execve(2) */
>  static inline abi_long do_freebsd_execve(abi_ulong path_or_fd, abi_ulong argp,
>  abi_ulong envp)
> @@ -155,4 +157,25 @@ static inline abi_long do_freebsd_getloginclass(abi_ulong arg1, abi_ulong arg2)
>  return ret;
>  }
>
> +/* pdgetpid(2) */
> +static inline abi_long do_freebsd_pdgetpid(abi_long fd, abi_ulong target_pidp)
> +{
> +abi_long ret;
> +pid_t pid;
> +
> +ret = get_errno(pdgetpid(fd, &pid));
> +if (!is_error(ret)) {
> +if (put_user_u32(pid, target_pidp)) {
> +return -TARGET_EFAULT;
> +}
> +}
> +return ret;
> +}
> +
> +/* undocumented __setugid */
> +static inline abi_long do_freebsd___setugid(abi_long arg1)
> +{
> +return get_errno(__setugid(arg1));
>

This should be return -TARGET_ENOSYS since the kernel doesn't implement
it for anything except a regression test.  And what it does is quite
dangerous, so we don't want someone to think it's a good idea to implement
it in the future.

Warner


> +}
> +
>  #endif /* BSD_USER_FREEBSD_OS_PROC_H */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index d614409e69..99af0f6b15 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -383,6 +383,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  ret = do_freebsd_getloginclass(arg1, arg2);
>  break;
>
> +case TARGET_FREEBSD_NR_pdgetpid: /* pdgetpid(2) */
> +ret = do_freebsd_pdgetpid(arg1, arg2);
> +break;
> +
> +case TARGET_FREEBSD_NR___setugid: /* undocumented */
> +ret = do_freebsd___setugid(arg1);
> +break;
> +
>  case TARGET_FREEBSD_NR_utrace: /* utrace(2) */
>  ret = do_bsd_utrace(arg1, arg2);
>  break;
> --
> 2.42.0
>
>

Re: [PATCH v2 24/28] bsd-user: Implement setloginclass(2) and getloginclass(2) system calls.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 32 
>  bsd-user/freebsd/os-syscall.c |  8 
>  2 files changed, 40 insertions(+)
>
> diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
> index 8a0b6e25bb..1866f0b2d6 100644
> --- a/bsd-user/freebsd/os-proc.h
> +++ b/bsd-user/freebsd/os-proc.h
> @@ -123,4 +123,36 @@ static inline abi_long do_freebsd_wait6(void *cpu_env, abi_long idtype,
>  return ret;
>  }
>
> +/* setloginclass(2) */
> +static inline abi_long do_freebsd_setloginclass(abi_ulong arg1)
> +{
> +abi_long ret;
> +void *p;
> +
> +p = lock_user_string(arg1);
> +if (p == NULL) {
> +return -TARGET_EFAULT;
> +}
> +ret = get_errno(setloginclass(p));
> +unlock_user(p, arg1, 0);
> +
> +return ret;
> +}
> +
> +/* getloginclass(2) */
> +static inline abi_long do_freebsd_getloginclass(abi_ulong arg1, abi_ulong arg2)
> +{
> +abi_long ret;
> +void *p;
> +
> +p = lock_user_string(arg1);
>

This has the same problem that I highlighted in _getlogin(). The kernel
returns a string, so we have to lock the buffer for it, not the string.

Warner


> +if (p == NULL) {
> +return -TARGET_EFAULT;
> +}
> +ret = get_errno(getloginclass(p, arg2));
> +unlock_user(p, arg1, 0);
> +
> +return ret;
> +}
> +
>  #endif /* BSD_USER_FREEBSD_OS_PROC_H */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index 55e68e4815..d614409e69 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -375,6 +375,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  ret = do_bsd_ktrace(arg1, arg2, arg3, arg4);
>  break;
>
> +case TARGET_FREEBSD_NR_setloginclass: /* setloginclass(2) */
> +ret = do_freebsd_setloginclass(arg1);
> +break;
> +
> +case TARGET_FREEBSD_NR_getloginclass: /* getloginclass(2) */
> +ret = do_freebsd_getloginclass(arg1, arg2);
> +break;
> +
>  case TARGET_FREEBSD_NR_utrace: /* utrace(2) */
>  ret = do_bsd_utrace(arg1, arg2);
>  break;
> --
> 2.42.0
>
>

Re: [PATCH v2 23/28] bsd-user: Implement wait4(2) and wait6(2) system calls.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 77 +++
>  bsd-user/freebsd/os-syscall.c | 15 +++
>  2 files changed, 92 insertions(+)
>
> diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
> index 75ed39f8dd..8a0b6e25bb 100644
> --- a/bsd-user/freebsd/os-proc.h
> +++ b/bsd-user/freebsd/os-proc.h
> @@ -30,6 +30,10 @@
>
>  #include "target_arch_cpu.h"
>
> +pid_t safe_wait4(pid_t wpid, int *status, int options, struct rusage *rusage);
> +pid_t safe_wait6(idtype_t idtype, id_t id, int *status, int options,
> +struct __wrusage *wrusage, siginfo_t *infop);
> +
>  /* execve(2) */
>  static inline abi_long do_freebsd_execve(abi_ulong path_or_fd, abi_ulong argp,
>  abi_ulong envp)
> @@ -46,4 +50,77 @@ static inline abi_long do_freebsd_fexecve(abi_ulong path_or_fd, abi_ulong argp,
>  return freebsd_exec_common(path_or_fd, argp, envp, 1);
>  }
>
> +/* wait4(2) */
> +static inline abi_long do_freebsd_wait4(abi_long arg1, abi_ulong target_status,
> +abi_long arg3, abi_ulong target_rusage)
> +{
> +abi_long ret;
> +int status;
> +struct rusage rusage, *rusage_ptr = NULL;
> +
> +if (target_rusage) {
> +rusage_ptr = &rusage;
> +}
> +ret = get_errno(safe_wait4(arg1, &status, arg3, rusage_ptr));
> +if (target_status != 0) {
> +status = host_to_target_waitstatus(status);
> +if (put_user_s32(status, target_status) != 0) {
> +return -TARGET_EFAULT;
> +}
> +}
> +if (target_rusage != 0) {
> +host_to_target_rusage(target_rusage, &rusage);
> +}
> +return ret;
>

I think that both of these 'if' statements should only be done if ret == 0.
Otherwise it's an error return which doesn't usually write any arguments
(unless the error is because of a fault on trying to write a return value).

Warner
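
To make the point above concrete, here is a minimal host-side sketch of
"copy out only when the call did not fail". The helper name
wait4_copyout_on_success() is hypothetical and is not part of the patch, and
it assumes that "success" means a non-negative return (wait4 returns the
child's PID, or 0 with WNOHANG). It uses plain host pointers instead of the
bsd-user put_user helpers.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: wrap the host wait4() and
 * touch the caller's output arguments only when the call did not fail.
 * Returns the value of wait4() on success, or -errno on failure.
 */
long wait4_copyout_on_success(pid_t pid, int *out_status, int options,
                              struct rusage *out_rusage)
{
    int status = 0;
    struct rusage rusage;
    pid_t ret = wait4(pid, &status, options, out_rusage ? &rusage : NULL);

    if (ret < 0) {
        /* Error return: leave *out_status and *out_rusage untouched. */
        return -errno;
    }
    if (out_status != NULL) {
        *out_status = status;
    }
    if (out_rusage != NULL) {
        *out_rusage = rusage;
    }
    return ret;
}
```

In the shim itself the same shape would presumably be an early return when
the result is an error, before the status and rusage write-backs.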


> +}
> +
> +/* wait6(2) */
> +static inline abi_long do_freebsd_wait6(void *cpu_env, abi_long idtype,
> +abi_long id1, abi_long id2,
> +abi_ulong target_status, abi_long options, abi_ulong target_wrusage,
> +abi_ulong target_infop, abi_ulong pad1)
> +{
> +abi_long ret;
> +int status;
> +struct __wrusage wrusage, *wrusage_ptr = NULL;
> +siginfo_t info;
> +void *p;
> +
> +if (regpairs_aligned(cpu_env) != 0) {
> +/* printf("shifting args\n"); */
> +/* 64-bit id is aligned, so shift all the arguments over by one */
> +id1 = id2;
> +id2 = target_status;
> +target_status = options;
> +options = target_wrusage;
> +target_wrusage = target_infop;
> +target_infop = pad1;
> +}
> +
> +if (target_wrusage) {
> +wrusage_ptr = &wrusage;
> +}
> +ret = safe_wait6(idtype, target_arg64(id1, id2),
> + &status, options, wrusage_ptr, &info);
> +ret = get_errno(ret);
> +if (target_status != 0) {
> +status = host_to_target_waitstatus(status);
> +if (put_user_s32(status, target_status) != 0) {
> +return -TARGET_EFAULT;
> +}
> +}
> +if (target_wrusage != 0) {
> +host_to_target_wrusage(target_wrusage, &wrusage);
> +}
> +if (target_infop != 0) {
> +p = lock_user(VERIFY_WRITE, target_infop, sizeof(target_siginfo_t), 0);
> +if (p == NULL) {
> +return -TARGET_EFAULT;
> +}
> +host_to_target_siginfo(p, &info);
> +unlock_user(p, target_infop, sizeof(target_siginfo_t));
> +}
> +return ret;
> +}
> +
>  #endif /* BSD_USER_FREEBSD_OS_PROC_H */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index 515eaaf31f..55e68e4815 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -40,6 +40,12 @@
>  #include "os-stat.h"
>  #include "os-proc.h"
>
> +/* used in os-proc */
> +safe_syscall4(pid_t, wait4, pid_t, wpid, int *, status, int, options,
> +struct rusage *, rusage);
> +safe_syscall6(pid_t, wait6, idtype_t, idtype, id_t, id, int *, status, int,
> +options, struct __wrusage *, wrusage, siginfo_t *, infop);
> +
>  /* I/O */
>  safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
>  safe_syscall4(int, openat, int, fd, const char *, path, int, flags, mode_t,
> @@ -228,6 +234,15 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  ret = do_freebsd_fexecve(arg1, arg2, arg3);
>  break;
>
> +case TARGET_FREEBSD_NR_wait4: /* wait4(2) */
> +ret = do_freebsd_wait4(arg1, arg2, arg3, arg4);
> +break;
> +
> +case TARGET_FREEBSD_NR_wait6: /* wait6(2) */
> +ret = do_freebsd_wait6(cpu_env, arg1, arg2, arg3,
> +   arg4, arg5, arg6, arg7, arg8);
> +break;
> +
>  case TARGET_FREEBSD_NR_exit: /* exit(2) */
>  ret = do_bsd_exit(cpu_env, arg1);
>

Re: [PATCH v2 22/28] bsd-user: Implement execve(2) and fexecve(2) system calls.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/os-proc.h| 49 +++
>  bsd-user/freebsd/os-syscall.c | 11 +++-
>  2 files changed, 59 insertions(+), 1 deletion(-)
>  create mode 100644 bsd-user/freebsd/os-proc.h
>

Reviewed-by: Warner Losh 

But make sure that the guard variable name is correct, I think with
scripts/clean-header-guards.pl


> diff --git a/bsd-user/freebsd/os-proc.h b/bsd-user/freebsd/os-proc.h
> new file mode 100644
> index 00..75ed39f8dd
> --- /dev/null
> +++ b/bsd-user/freebsd/os-proc.h
> @@ -0,0 +1,49 @@
> +/*
> + *  process related system call shims and definitions
> + *
> + *  Copyright (c) 2013-14 Stacey D. Son
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, see .
> + */
> +
> +#ifndef BSD_USER_FREEBSD_OS_PROC_H
> +#define BSD_USER_FREEBSD_OS_PROC_H
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "target_arch_cpu.h"
> +
> +/* execve(2) */
> +static inline abi_long do_freebsd_execve(abi_ulong path_or_fd, abi_ulong argp,
> +abi_ulong envp)
> +{
> +
> +return freebsd_exec_common(path_or_fd, argp, envp, 0);
> +}
> +
> +/* fexecve(2) */
> +static inline abi_long do_freebsd_fexecve(abi_ulong path_or_fd, abi_ulong argp,
> +abi_ulong envp)
> +{
> +
> +return freebsd_exec_common(path_or_fd, argp, envp, 1);
> +}
> +
> +#endif /* BSD_USER_FREEBSD_OS_PROC_H */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index b7bd0b92a6..515eaaf31f 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -36,8 +36,9 @@
>  #include "bsd-file.h"
>  #include "bsd-proc.h"
>
> -/* *BSD dependent syscall shims */
> +/* BSD dependent syscall shims */
>  #include "os-stat.h"
> +#include "os-proc.h"
>
>  /* I/O */
>  safe_syscall3(int, open, const char *, path, int, flags, mode_t, mode);
> @@ -219,6 +220,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  /*
>   * process system calls
>   */
> +case TARGET_FREEBSD_NR_execve: /* execve(2) */
> +ret = do_freebsd_execve(arg1, arg2, arg3);
> +break;
> +
> +case TARGET_FREEBSD_NR_fexecve: /* fexecve(2) */
> +ret = do_freebsd_fexecve(arg1, arg2, arg3);
> +break;
> +
>  case TARGET_FREEBSD_NR_exit: /* exit(2) */
>  ret = do_bsd_exit(cpu_env, arg1);
>  break;
> --
> 2.42.0
>
>

Re: [PATCH v2 21/28] bsd-user: Implement procctl(2) along with necessary conversion functions.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Implement t2h_procctl_cmd, h2t_reaper_status, h2t_reaper_pidinfo and
> h2t/t2h reaper_kill conversion functions.
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> ---
>  bsd-user/freebsd/os-proc.c| 223 ++
>  bsd-user/freebsd/os-syscall.c |   3 +
>  2 files changed, 226 insertions(+)
>
> diff --git a/bsd-user/freebsd/os-proc.c b/bsd-user/freebsd/os-proc.c
> index 12d78b7fc9..6b8753f8e5 100644
> --- a/bsd-user/freebsd/os-proc.c
> +++ b/bsd-user/freebsd/os-proc.c
> @@ -255,3 +255,226 @@ execve_end:
>  return ret;
>  }
>
> +#include 
> +
> +static abi_long
> +t2h_procctl_cmd(int target_cmd, int *host_cmd)
> +{
> +switch (target_cmd) {
> +case TARGET_PROC_SPROTECT:
> +*host_cmd = PROC_SPROTECT;
> +break;
> +
> +case TARGET_PROC_REAP_ACQUIRE:
> +*host_cmd = PROC_REAP_ACQUIRE;
> +break;
> +
> +case TARGET_PROC_REAP_RELEASE:
> +*host_cmd = PROC_REAP_RELEASE;
> +break;
> +
> +case TARGET_PROC_REAP_STATUS:
> +*host_cmd = PROC_REAP_STATUS;
> +break;
> +
> +case TARGET_PROC_REAP_KILL:
> +*host_cmd = PROC_REAP_KILL;
> +break;
> +
> +default:
> +return -TARGET_EINVAL;
> +}
> +
> +return 0;
> +}
> +
> +static abi_long
> +h2t_reaper_status(struct procctl_reaper_status *host_rs,
> +abi_ulong target_rs_addr)
> +{
> +struct target_procctl_reaper_status *target_rs;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_rs, target_rs_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +__put_user(host_rs->rs_flags, &target_rs->rs_flags);
> +__put_user(host_rs->rs_children, &target_rs->rs_children);
> +__put_user(host_rs->rs_descendants, &target_rs->rs_descendants);
> +__put_user(host_rs->rs_reaper, &target_rs->rs_reaper);
> +__put_user(host_rs->rs_pid, &target_rs->rs_pid);
> +unlock_user_struct(target_rs, target_rs_addr, 1);
> +
> +return 0;
> +}
> +
> +static abi_long
> +t2h_reaper_kill(abi_ulong target_rk_addr, struct procctl_reaper_kill *host_rk)
> +{
> +struct target_procctl_reaper_kill *target_rk;
> +
> +if (!lock_user_struct(VERIFY_READ, target_rk, target_rk_addr, 1)) {
> +return -TARGET_EFAULT;
> +}
> +__get_user(host_rk->rk_sig, &target_rk->rk_sig);
> +__get_user(host_rk->rk_flags, &target_rk->rk_flags);
> +__get_user(host_rk->rk_subtree, &target_rk->rk_subtree);
> +__get_user(host_rk->rk_killed, &target_rk->rk_killed);
> +__get_user(host_rk->rk_fpid, &target_rk->rk_fpid);
> +unlock_user_struct(target_rk, target_rk_addr, 0);
> +
> +return 0;
> +}
> +
> +static abi_long
> +h2t_reaper_kill(struct procctl_reaper_kill *host_rk, abi_ulong target_rk_addr)
> +{
> +struct target_procctl_reaper_kill *target_rk;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_rk, target_rk_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +__put_user(host_rk->rk_sig, &target_rk->rk_sig);
> +__put_user(host_rk->rk_flags, &target_rk->rk_flags);
> +__put_user(host_rk->rk_subtree, &target_rk->rk_subtree);
> +__put_user(host_rk->rk_killed, &target_rk->rk_killed);
> +__put_user(host_rk->rk_fpid, &target_rk->rk_fpid);
> +unlock_user_struct(target_rk, target_rk_addr, 1);
> +
> +return 0;
> +}
> +
> +static abi_long
> +h2t_procctl_reaper_pidinfo(struct procctl_reaper_pidinfo *host_pi,
> +abi_ulong target_pi_addr)
> +{
> +struct target_procctl_reaper_pidinfo *target_pi;
> +
> +if (!lock_user_struct(VERIFY_WRITE, target_pi, target_pi_addr, 0)) {
> +return -TARGET_EFAULT;
> +}
> +__put_user(host_pi->pi_pid, &target_pi->pi_pid);
> +__put_user(host_pi->pi_subtree, &target_pi->pi_subtree);
> +__put_user(host_pi->pi_flags, &target_pi->pi_flags);
> +unlock_user_struct(target_pi, target_pi_addr, 1);
> +
> +return 0;
> +}
> +
> +abi_long
> +do_freebsd_procctl(void *cpu_env, int idtype, abi_ulong arg2, abi_ulong arg3,
> +   abi_ulong arg4, abi_ulong arg5, abi_ulong arg6)
> +{
> +abi_long error = 0, target_rp_pids;
> +void *data;
> +int host_cmd, flags;
> +uint32_t u, target_rp_count;
> +union {
> +struct procctl_reaper_status rs;
> +struct procctl_reaper_pids rp;
> +struct procctl_reaper_kill rk;
> +} host;
> +struct target_procctl_reaper_pids *target_rp;
> +id_t id; /* 64-bit */
> +int target_cmd;
> +abi_ulong target_arg;
> +
> +#if TARGET_ABI_BITS == 32
> +/* See if we need to align the register pairs. */
> +if (regpairs_aligned(cpu_env)) {
> +id = (id_t)target_arg64(arg3, arg4);
> +target_cmd = (int)arg5;
> +target_arg = arg6;
> +} else {
> +id = (id_t)target_arg64(arg2, arg3);
> +target_cmd = (int)arg4;
> +target_arg = arg5;
> +}
> +#else
> +id = (id_t)arg2;
> +target_cmd = (int)arg3;
> +target_arg = arg4;
> +#endif
> +
> +error = t2h_procctl_cmd(target_cmd, &host_cmd);
> +if (error) {
> +return

Re: [PATCH v2 20/28] bsd-user: Implement freebsd_exec_common, used in implementing execve/fexecve.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> ---
>  bsd-user/freebsd/os-proc.c | 177 +
>  bsd-user/main.c|   2 +-
>  bsd-user/qemu.h|   1 +
>  3 files changed, 179 insertions(+), 1 deletion(-)
>

Reviewed-by: Warner Losh 

But see comment below.


> diff --git a/bsd-user/freebsd/os-proc.c b/bsd-user/freebsd/os-proc.c
> index cb35f29f10..12d78b7fc9 100644
> --- a/bsd-user/freebsd/os-proc.c
> +++ b/bsd-user/freebsd/os-proc.c
> @@ -78,3 +78,180 @@ out:
>  return ret;
>  }
>
> +/*
> + * execve/fexecve
> + */
> +abi_long freebsd_exec_common(abi_ulong path_or_fd, abi_ulong guest_argp,
> +abi_ulong guest_envp, int do_fexec)
> +{
> +char **argp, **envp, **qargp, **qarg1, **qarg0, **qargend;
> +int argc, envc;
> +abi_ulong gp;
> +abi_ulong addr;
> +char **q;
> +int total_size = 0;
> +void *p;
> +abi_long ret;
> +
> +argc = 0;
> +for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
> +if (get_user_ual(addr, gp)) {
> +return -TARGET_EFAULT;
> +}
> +if (!addr) {
> +break;
> +}
> +argc++;
> +}
> +envc = 0;
> +for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
> +if (get_user_ual(addr, gp)) {
> +return -TARGET_EFAULT;
> +}
> +if (!addr) {
> +break;
> +}
> +envc++;
> +}
> +
> +qarg0 = argp = g_new0(char *, argc + 9);
> +/* save the first argument for the emulator */
> +*argp++ = (char *)getprogname();
> +qargp = argp;
> +*argp++ = (char *)getprogname();
> +qarg1 = argp;
> +envp = g_new0(char *, envc + 1);
> +for (gp = guest_argp, q = argp; gp; gp += sizeof(abi_ulong), q++) {
> +if (get_user_ual(addr, gp)) {
> +ret = -TARGET_EFAULT;
> +goto execve_end;
> +}
> +if (!addr) {
> +break;
> +}
> +*q = lock_user_string(addr);
> +if (*q == NULL) {
> +ret = -TARGET_EFAULT;
> +goto execve_end;
> +}
> +total_size += strlen(*q) + 1;
> +}
> +*q++ = NULL;
> +qargend = q;
> +
> +for (gp = guest_envp, q = envp; gp; gp += sizeof(abi_ulong), q++) {
> +if (get_user_ual(addr, gp)) {
> +ret = -TARGET_EFAULT;
> +goto execve_end;
> +}
> +if (!addr) {
> +break;
> +}
> +*q = lock_user_string(addr);
> +if (*q == NULL) {
> +ret = -TARGET_EFAULT;
> +goto execve_end;
> +}
> +total_size += strlen(*q) + 1;
> +}
> +*q = NULL;
> +
> +/*
> + * This case will not be caught by the host's execve() if its
> + * page size is bigger than the target's.
> + */
> +if (total_size > MAX_ARG_PAGES * TARGET_PAGE_SIZE) {
> +ret = -TARGET_E2BIG;
> +goto execve_end;
> +}
> +
> +if (do_fexec) {
> +if (((int)path_or_fd > 0 &&
> +is_target_elf_binary((int)path_or_fd)) == 1) {
> +char execpath[PATH_MAX];
> +
> +/*
> + * The executable is an elf binary for the target
> + * arch.  execve() it using the emulator if we can
> + * determine the filename path from the fd.
> + */
>

So we do this fd dance so we can make things like 'qemu-arm-static
/armv7/bin/sh' work.
Doug Rabson has some changes that mean we can ditch this, I think, since
the kernel will just track it and it will default to 'whatever is running
the current process' rather than the system default for the same binfmt
entry.


> +if (get_filename_from_fd(getpid(), (int)path_or_fd, execpath,
> +sizeof(execpath)) != NULL) {
> +memmove(qarg1 + 2, qarg1, (qargend - qarg1) * sizeof(*qarg1));
> +qarg1[1] = qarg1[0];
> +qarg1[0] = (char *)"-0";
> +qarg1 += 2;
> +qargend += 2;
> +*qarg1 = execpath;
> +#ifndef DONT_INHERIT_INTERP_PREFIX
> +memmove(qarg1 + 2, qarg1, (qargend - qarg1) * sizeof(*qarg1));
> +*qarg1++ = (char *)"-L";
> +*qarg1++ = (char *)interp_prefix;
> +#endif
>

And we do this inheritance so we can pass in a non-standard library path,
maybe for testing, and have the above example also work.

Warner
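
As a rough illustration of the argument vector the quoted memmove() shuffling
ends up with, here is a sketch that builds the same layout directly. The
function name build_emulator_argv() is hypothetical; the reading that "-0"
carries the guest's argv[0] and "-L" carries the interpreter prefix is an
assumption taken from the quoted code, not from the emulator's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.  Builds:
 *   emulator -0 <guest argv[0]> -L <interp_prefix> <execpath> <guest argv[1..]>
 * The strings are borrowed, not copied; the caller frees only the array.
 */
char **build_emulator_argv(const char *emulator, const char *interp_prefix,
                           const char *execpath, char *const guest_argv[])
{
    size_t guest_argc = 0;
    size_t n = 0;
    char **argv;

    while (guest_argv[guest_argc] != NULL) {
        guest_argc++;
    }

    /* At most 6 fixed slots, the guest arguments, and the NULL terminator. */
    argv = calloc(6 + guest_argc + 1, sizeof(*argv));
    if (argv == NULL) {
        return NULL;
    }

    argv[n++] = (char *)emulator;
    argv[n++] = (char *)"-0";
    argv[n++] = guest_argc > 0 ? guest_argv[0] : (char *)execpath;
    argv[n++] = (char *)"-L";
    argv[n++] = (char *)interp_prefix;
    argv[n++] = (char *)execpath;
    for (size_t i = 1; i < guest_argc; i++) {
        argv[n++] = guest_argv[i];
    }
    argv[n] = NULL;

    return argv;
}
```

Building the vector in one pass avoids the in-place shifting, at the cost of
a second allocation; the patch instead reserves spare slots up front
(argc + 9) and moves the guest arguments within that array.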


> +ret = get_errno(execve(qemu_proc_pathname, qargp, envp));
> +} else {
> +/* Getting the filename path failed. */
> +ret = -TARGET_EBADF;
> +goto execve_end;
> +}
> +} else {
> +ret = get_errno(fexecve((int)path_or_fd, argp, envp));
> +}
> +} else {
> +int fd;
> +
> +p = lock_user_string(path_or_fd);
> +if (p == NULL) {
>

Re: [PATCH v4 03/14] simpletrace: improve parsing of sys.argv; fix files never closed.

2023-09-20 Thread Stefan Hajnoczi

On Wed, Aug 23, 2023 at 10:54:18AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> The arguments extracted from `sys.argv` are named and unpacked to make it
> clear what the arguments are and what they're used for.
> 
> The two input files were opened, but never explicitly closed. File usage
> changed to use `with` statement to take care of this. At the same time,
> ownership of the file-object is moved up to `run` function. Added option
> to process to support file-like objects.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 50 --
>  1 file changed, 34 insertions(+), 16 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH v2 19/28] bsd-user: Implement get_filename_from_fd.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/freebsd/meson.build |  1 +
>  bsd-user/freebsd/os-proc.c   | 80 
>  2 files changed, 81 insertions(+)
>  create mode 100644 bsd-user/freebsd/os-proc.c
>

Reviewed-by: Warner Losh

Re: [PATCH v2 18/28] bsd-user: Implement getpriority(2) and setpriority(2).

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> ---
>  bsd-user/bsd-proc.h   | 24 
>  bsd-user/freebsd/os-syscall.c |  8 
>  2 files changed, 32 insertions(+)
>

Reviewed-by: Warner Losh 

Looks right to my eye.  Let's see if Richard catches anything.



> diff --git a/bsd-user/bsd-proc.h b/bsd-user/bsd-proc.h
> index fff1d4cded..89792d26c6 100644
> --- a/bsd-user/bsd-proc.h
> +++ b/bsd-user/bsd-proc.h
> @@ -390,4 +390,28 @@ static inline abi_long do_bsd_ptrace(abi_long arg1, abi_long arg2,
>  return -TARGET_ENOSYS;
>  }
>
> +/* getpriority(2) */
> +static inline abi_long do_bsd_getpriority(abi_long which, abi_long who)
> +{
> +abi_long ret;
> +/*
> + * Note that negative values are valid for getpriority, so we must
> + * differentiate based on errno settings.
> + */
> +errno = 0;
> +ret = getpriority(which, who);
> +if (ret == -1 && errno != 0) {
> +return -host_to_target_errno(errno);
> +}
> +
> +return ret;
> +}
> +
> +/* setpriority(2) */
> +static inline abi_long do_bsd_setpriority(abi_long which, abi_long who,
> +  abi_long prio)
> +{
> +return get_errno(setpriority(which, who, prio));
> +}
> +
>  #endif /* !BSD_PROC_H_ */
> diff --git a/bsd-user/freebsd/os-syscall.c b/bsd-user/freebsd/os-syscall.c
> index 1a760b1380..71a2657dd0 100644
> --- a/bsd-user/freebsd/os-syscall.c
> +++ b/bsd-user/freebsd/os-syscall.c
> @@ -359,6 +359,14 @@ static abi_long freebsd_syscall(void *cpu_env, int num, abi_long arg1,
>  ret = do_bsd_ptrace(arg1, arg2, arg3, arg4);
>  break;
>
> +case TARGET_FREEBSD_NR_getpriority: /* getpriority(2) */
> +ret = do_bsd_getpriority(arg1, arg2);
> +break;
> +
> +case TARGET_FREEBSD_NR_setpriority: /* setpriority(2) */
> +ret = do_bsd_setpriority(arg1, arg2, arg3);
> +break;
> +
>
>  /*
>   * File system calls.
> --
> 2.42.0
>
>
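
For readers following along, the errno handling in the quoted
do_bsd_getpriority() can be tried on the host with a small sketch like the
one below. The wrapper name get_priority_checked() is hypothetical; it
returns the priority through a pointer so that a legitimate priority of -1
cannot be confused with the error return.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, for illustration only.  getpriority() can
 * legitimately return -1, so the only reliable error test is to clear
 * errno first and check it afterwards.
 * Returns 0 and stores the priority in *prio, or returns -errno.
 */
int get_priority_checked(int which, id_t who, int *prio)
{
    int ret;

    errno = 0;
    ret = getpriority(which, who);
    if (ret == -1 && errno != 0) {
        return -errno;
    }
    *prio = ret;
    return 0;
}
```

setpriority() has no such ambiguity, which is why the quoted shim can pass
its result straight through get_errno().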

Re: [PATCH v2 17/28] bsd-user: Add stubs for profil(2), ktrace(2), utrace(2) and ptrace(2).

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 28 
>  bsd-user/freebsd/os-syscall.c | 16 
>  2 files changed, 44 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH v2 16/28] bsd-user: Implement get/set[resuid/resgid/sid] and issetugid.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 76 +++
>  bsd-user/freebsd/os-syscall.c | 28 +
>  2 files changed, 104 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH v2 15/28] bsd-user: Implement several get/set system calls:

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> getpid(2), getppid(2), getpgrp(2)
> setreuid(2), setregid(2)
> getuid(2), geteuid(2), getgid(2), getegid(2), getpgid(2)
> setuid(2), seteuid(2), setgid(2), setegid(2), setpgid(2)
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 90 +++
>  bsd-user/freebsd/os-syscall.c | 60 +++
>  2 files changed, 150 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH v2 14/28] bsd-user: Implement getrlimit(2) and setrlimit(2)

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 59 +++
>  bsd-user/freebsd/os-syscall.c |  8 +
>  2 files changed, 67 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH 10/21] q800: add easc bool machine class property to switch between ASC and EASC

2023-09-20 Thread Markus Armbruster

Mark Cave-Ayland  writes:

> On 11/09/2023 06:15, Markus Armbruster wrote:
>
>> Philippe Mathieu-Daudé  writes:

[...]

>>> I'm not sure when we want a write-only QOM boolean property, so I
>>> genuinely ask, since I agree introspecting QOM object fields from
>>> the monitor is helpful.
>>
>> I suspect write-only properties came out of QOM's generality curse.  Do
>> we have even one?  QOM's design makes this somewhat hard to tell.
>
> Good question. Given that it's towards the beginning of the next dev cycle, 
> perhaps it is worth sending a patch to find out? ;)

Getting rid of unused generality / unnecessary complexity is good.

Re: [PATCH v2 13/28] bsd-user: Implement getrusage(2).

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 13 +
>  bsd-user/freebsd/os-syscall.c |  4 
>  2 files changed, 17 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH v2 12/28] bsd-user: Implement umask(2), setlogin(2) and getlogin(2)

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.h   | 39 +++
>  bsd-user/freebsd/os-syscall.c | 12 +++
>  2 files changed, 51 insertions(+)
>
> diff --git a/bsd-user/bsd-proc.h b/bsd-user/bsd-proc.h
> index 7b25aa1982..fd05422d9a 100644
> --- a/bsd-user/bsd-proc.h
> +++ b/bsd-user/bsd-proc.h
> @@ -26,6 +26,7 @@
>  #include "gdbstub/syscalls.h"
>  #include "qemu/plugin.h"
>
> +extern int _getlogin(char*, int);
>  int bsd_get_ncpu(void);
>
>  /* exit(2) */
> @@ -85,4 +86,42 @@ static inline abi_long do_bsd_setgroups(abi_long gidsetsize, abi_long arg2)
>  return get_errno(setgroups(gidsetsize, grouplist));
>  }
>
> +/* umask(2) */
> +static inline abi_long do_bsd_umask(abi_long arg1)
> +{
> +return get_errno(umask(arg1));
> +}
> +
> +/* setlogin(2) */
> +static inline abi_long do_bsd_setlogin(abi_long arg1)
> +{
> +abi_long ret;
> +void *p;
> +
> +p = lock_user_string(arg1);
> +if (p == NULL) {
> +return -TARGET_EFAULT;
> +}
> +ret = get_errno(setlogin(p));
> +unlock_user(p, arg1, 0);
> +
> +return ret;
> +}
> +
> +/* getlogin(2) */
> +static inline abi_long do_bsd_getlogin(abi_long arg1, abi_long arg2)
> +{
> +abi_long ret;
> +void *p;
> +
> +p = lock_user_string(arg1);
> +if (p == NULL) {
> +return -TARGET_EFAULT;
> +}
>

This looks backwards. We're calling the kernel to get this string, so the
target_strlen() that lock_user_string() does is on the receiving buffer, not
the length of the string that we'd like to write.

I think we want
p = lock_user(VERIFY_READ, arg1, arg2, 0);

for this. sys_getlogin in sys/kern/kern_prot.c does a copyout. This is
clearly broken in the 'blitz' branch.

Warner
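
To illustrate why the length argument, and not the current contents of the
buffer, has to size the locked region, here is a small user-space model of a
"kernel fills the caller's buffer" call. model_getlogin() is hypothetical and
only mimics the general shape of such a copyout; it is not the kernel's code
and does not use the bsd-user lock_user API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, for illustration only.  The destination is an
 * output buffer of capacity namelen; whatever bytes it holds before the
 * call are irrelevant and are never measured with strlen().
 * Returns 0 on success or -ERANGE if the name does not fit.
 */
int model_getlogin(const char *login, char *namebuf, size_t namelen)
{
    size_t len = strlen(login) + 1;     /* include the trailing NUL */

    if (len > namelen) {
        return -ERANGE;
    }
    memcpy(namebuf, login, len);
    return 0;
}
```

A caller can hand in a buffer with no NUL terminator at all, which is exactly
the case where measuring the buffer as if it were a string would go wrong.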

[PATCH v2 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-20 Thread Markus Armbruster

Variables declared in macros can shadow other variables.  Much of the
time, this is harmless, e.g.:

#define _FDT(exp)  \
do {   \
int ret = (exp);   \
if (ret < 0) { \
error_report("error creating device tree: %s: %s",   \
#exp, fdt_strerror(ret));  \
exit(1);   \
}  \
} while (0)

Harmless shadowing in h_client_architecture_support():

target_ulong ret;

[...]

ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
if (ret == H_SUCCESS) {
_FDT((fdt_pack(spapr->fdt_blob)));
[...]
}

return ret;

However, we can get in trouble when the shadowed variable is used in a
macro argument:

#define QOBJECT(obj) ({ \
typeof(obj) o = (obj);  \
o ? container_of(&(o)->base, QObject, base) : NULL; \
 })

QOBJECT(o) expands into

({
--->typeof(o) o = (o);
o ? container_of(&(o)->base, QObject, base) : NULL;
})

Unintended variable name capture at --->.  We'd be saved by
-Winit-self.  But I could certainly construct more elaborate death
traps that don't trigger it.

To reduce the risk of trapping ourselves, we use variable names in
macros that no sane person would use elsewhere.  Here's our actual
definition of QOBJECT():

#define QOBJECT(obj) ({ \
typeof(obj) _obj = (obj);   \
_obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
})

Works well enough until we nest macro calls.  For instance, with

#define qobject_ref(obj) ({ \
typeof(obj) _obj = (obj);   \
qobject_ref_impl(QOBJECT(_obj));\
_obj;   \
})

the expression qobject_ref(obj) expands into

({
typeof(obj) _obj = (obj);
qobject_ref_impl(
({
--->typeof(_obj) _obj = (_obj);
_obj ? container_of(&(_obj)->base, QObject, base) : NULL;
}));
_obj;
})

Unintended variable name capture at --->.

The only reliable way to prevent unintended variable name capture is
-Wshadow.

One blocker for enabling it is shadowing hiding in function-like
macros like

 qdict_put(dict, "name", qobject_ref(...))

qdict_put() wraps its last argument in QOBJECT(), and the last
argument here contains another QOBJECT().

Use dark preprocessor sorcery to make the macros that give us this
problem use different variable names on every call.
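
A minimal standalone sketch of the technique, assuming GCC or Clang
(statement expressions, __typeof__ and __COUNTER__ are extensions).  The
names GLUE, UNIQUE_ID, DOUBLE and DOUBLE_INTERNAL are made up for this
example and are not the helpers the patch adds:

```c
#include <assert.h>

/* Two-level paste so that __COUNTER__ is expanded before ## is applied. */
#define GLUE_(a, b) a##b
#define GLUE(a, b) GLUE_(a, b)
#define UNIQUE_ID(prefix) GLUE(prefix, __COUNTER__)

/*
 * The local variable's name is a macro parameter.  The argument is
 * expanded once, before substitution, so every use of _x within one
 * expansion sees the same generated identifier.
 */
#define DOUBLE_INTERNAL(x, _x) ({   \
    __typeof__(x) _x = (x);         \
    _x + _x;                        \
})
#define DOUBLE(x) DOUBLE_INTERNAL((x), UNIQUE_ID(_x))

/* Nested use: inner and outer expansions get distinct locals. */
int quadruple(int v)
{
    return DOUBLE(DOUBLE(v));
}
```

With a fixed local name, the nested call would declare the same identifier
in both scopes, which is the shadowing that -Wshadow reports.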

Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
---
 include/qapi/qmp/qobject.h | 11 +--
 include/qemu/atomic.h  | 17 -
 include/qemu/compiler.h|  3 +++
 include/qemu/osdep.h   | 31 +++
 4 files changed, 47 insertions(+), 15 deletions(-)

diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index 9003b71fd3..d36cc97805 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -45,10 +45,17 @@ struct QObject {
 struct QObjectBase_ base;
 };
 
-#define QOBJECT(obj) ({ \
+/*
+ * Preprocessor sorcery ahead: use a different identifier for the
+ * local variable in each expansion, so we can nest macro calls
+ * without shadowing variables.
+ */
+#define QOBJECT_INTERNAL(obj, _obj) ({  \
 typeof(obj) _obj = (obj);   \
-_obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
+_obj\
+? container_of(&(_obj)->base, QObject, base) : NULL;\
 })
+#define QOBJECT(obj) QOBJECT_INTERNAL((obj), MAKE_IDENTFIER(_obj))
 
 /* Required for qobject_to() */
 #define QTYPE_CAST_TO_QNull QTYPE_QNULL
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index d95612f7a0..d4cbd01909 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -157,13 +157,20 @@
 smp_read_barrier_depends();
 #endif
 
-#define qatomic_rcu_read(ptr)  \
-({ \
+/*
+ * Preprocessor sorcery ahead: use a different identifier for the
+ * local variable in each expansion, so we can nest macro calls
+ * without shadowing variables.
+ */
+#define qatomic_rcu_read_internal(ptr, _val)\
+({  \
 qemu_build_assert(sizeof(*ptr) <= ATOMIC_REG_SIZE); \
-

[PATCH v2 1/7] migration/rdma: Fix save_page method to fail on polling error

2023-09-20 Thread Markus Armbruster

qemu_rdma_save_page() reports polling error with error_report(), then
succeeds anyway.  This is because the variable holding the polling
status *shadows* the variable the function returns.  The latter
remains zero.
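
To make the failure mode concrete, here is a minimal sketch of the same pattern outside QEMU.  It is not the rdma code: poll_stub(), save_page_shadowed() and save_page_fixed() are hypothetical names invented for illustration, and the stub simply always fails.

```c
#include <stdio.h>

/* Hypothetical stand-in for a polling call that always reports failure. */
static int poll_stub(void)
{
    return -1;
}

/* Buggy shape: the inner 'ret' shadows the one the function returns. */
static int save_page_shadowed(void)
{
    int ret = 0;

    while (1) {
        int ret = poll_stub();      /* new variable, hides the outer 'ret' */

        if (ret < 0) {
            fprintf(stderr, "polling error! %d\n", ret);
            goto err;
        }
        break;
    }
    return 0;

err:
    return ret;                     /* outer 'ret', still zero */
}

/* Fixed shape: assign to the outer 'ret' instead of declaring a new one. */
static int save_page_fixed(void)
{
    int ret = 0;

    while (1) {
        ret = poll_stub();

        if (ret < 0) {
            fprintf(stderr, "polling error! %d\n", ret);
            goto err;
        }
        break;
    }
    return 0;

err:
    return ret;                     /* now carries the polling error */
}
```

In this sketch the shadowed variant reports the error and then returns 0, while the fixed variant returns the negative polling status.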

Broken since day one, and duplicated more recently.

Fixes: 2da776db4846 (rdma: core logic)
Fixes: b390afd8c50b (migration/rdma: Fix out of order wrid)
Signed-off-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Reviewed-by: Peter Xu 
Reviewed-by: Li Zhijian 
---
 migration/rdma.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a2a3db35b1..3915d1d7c9 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3282,7 +3282,8 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
  */
 while (1) {
 uint64_t wr_id, wr_id_in;
-int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
+ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
+
 if (ret < 0) {
 error_report("rdma migration: polling error! %d", ret);
 goto err;
@@ -3297,7 +3298,8 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
 
 while (1) {
 uint64_t wr_id, wr_id_in;
-int ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
+ret = qemu_rdma_poll(rdma, rdma->send_cq, &wr_id_in, NULL);
+
 if (ret < 0) {
 error_report("rdma migration: polling error! %d", ret);
 goto err;
-- 
2.41.0

[PATCH v2 3/7] ui: Clean up local variable shadowing

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster 
Reviewed-by: Peter Maydell 
---
 ui/gtk.c  | 14 +++---
 ui/spice-display.c|  9 +
 ui/vnc-palette.c  |  2 --
 ui/vnc.c  | 12 ++--
 ui/vnc-enc-zrle.c.inc |  9 -
 5 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index e09f97a86b..3373427c9b 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -930,8 +930,8 @@ static gboolean gd_motion_event(GtkWidget *widget, 
GdkEventMotion *motion,
 GdkMonitor *monitor = gdk_display_get_monitor_at_window(dpy, win);
 GdkRectangle geometry;
 
-int x = (int)motion->x_root;
-int y = (int)motion->y_root;
+int xr = (int)motion->x_root;
+int yr = (int)motion->y_root;
 
 gdk_monitor_get_geometry(monitor, &geometry);
 
@@ -942,13 +942,13 @@ static gboolean gd_motion_event(GtkWidget *widget, 
GdkEventMotion *motion,
  * may still be only half way across the screen. Without
  * this warp, the server pointer would thus appear to hit
  * an invisible wall */
-if (x <= geometry.x || x - geometry.x >= geometry.width - 1 ||
-y <= geometry.y || y - geometry.y >= geometry.height - 1) {
+if (xr <= geometry.x || xr - geometry.x >= geometry.width - 1 ||
+yr <= geometry.y || yr - geometry.y >= geometry.height - 1) {
 GdkDevice *dev = gdk_event_get_device((GdkEvent *)motion);
-x = geometry.x + geometry.width / 2;
-y = geometry.y + geometry.height / 2;
+xr = geometry.x + geometry.width / 2;
+yr = geometry.y + geometry.height / 2;
 
-gdk_device_warp(dev, screen, x, y);
+gdk_device_warp(dev, screen, xr, yr);
 s->last_set = FALSE;
 return FALSE;
 }
diff --git a/ui/spice-display.c b/ui/spice-display.c
index 5cc47bd668..6eb98a5a5c 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -1081,15 +1081,16 @@ static void qemu_spice_gl_update(DisplayChangeListener 
*dcl,
 }
 
 if (render_cursor) {
-int x, y;
+int ptr_x, ptr_y;
+
 qemu_mutex_lock(&ssd->lock);
-x = ssd->ptr_x;
-y = ssd->ptr_y;
+ptr_x = ssd->ptr_x;
+ptr_y = ssd->ptr_y;
 qemu_mutex_unlock(&ssd->lock);
 egl_texture_blit(ssd->gls, &ssd->blit_fb, &ssd->guest_fb,
  !y_0_top);
 egl_texture_blend(ssd->gls, &ssd->blit_fb, &ssd->cursor_fb,
-  !y_0_top, x, y, 1.0, 1.0);
+  !y_0_top, ptr_x, ptr_y, 1.0, 1.0);
 glFlush();
 }
 
diff --git a/ui/vnc-palette.c b/ui/vnc-palette.c
index dc7c0ba997..4e88c412f0 100644
--- a/ui/vnc-palette.c
+++ b/ui/vnc-palette.c
@@ -86,8 +86,6 @@ int palette_put(VncPalette *palette, uint32_t color)
 return 0;
 }
 if (!entry) {
-VncPaletteEntry *entry;
-
 entry = &palette->pool[palette->size];
 entry->color = color;
 entry->idx = idx;
diff --git a/ui/vnc.c b/ui/vnc.c
index 6fd86996a5..ecb75ff8c8 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1584,15 +1584,15 @@ static void vnc_jobs_bh(void *opaque)
  */
 static int vnc_client_read(VncState *vs)
 {
-size_t ret;
+size_t sz;
 
 #ifdef CONFIG_VNC_SASL
 if (vs->sasl.conn && vs->sasl.runSSF)
-ret = vnc_client_read_sasl(vs);
+sz = vnc_client_read_sasl(vs);
 else
 #endif /* CONFIG_VNC_SASL */
-ret = vnc_client_read_plain(vs);
-if (!ret) {
+sz = vnc_client_read_plain(vs);
+if (!sz) {
 if (vs->disconnecting) {
 vnc_disconnect_finish(vs);
 return -1;
@@ -3118,8 +3118,8 @@ static int vnc_refresh_server_surface(VncDisplay *vd)
 cmp_bytes = MIN(VNC_DIRTY_PIXELS_PER_BIT * VNC_SERVER_FB_BYTES,
 server_stride);
 if (vd->guest.format != VNC_SERVER_FB_FORMAT) {
-int width = pixman_image_get_width(vd->server);
-tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, width);
+int w = pixman_image_get_width(vd->server);
+tmpbuf = qemu_pixman_linebuf_create(VNC_SERVER_FB_FORMAT, w);
 } else {
 int guest_bpp =
 PIXMAN_FORMAT_BPP(pixman_image_get_format(vd->guest.fb));
diff --git a/ui/vnc-enc-zrle.c.inc b/ui/vnc-enc-zrle.c.inc
index a8ca37d05e..2ef7501d52 100644
--- a/ui/vnc-enc-zrle.c.inc
+++ b/ui/vnc-enc-zrle.c.inc
@@ -153,11 +153,12 @@ static void ZRLE_ENCODE_TILE(VncState *vs, ZRLE_PIXEL 
*data, int w, int h,
 }
 
 if (use_rle) {
-ZRLE_PIXEL *ptr = data;
-ZRLE_PIXEL *end = ptr + w * h;
 ZRLE_PIXEL *run_start;
 ZRLE_PIXEL pix;
 
+ptr = data;
+end = ptr + w * h;
+
 while (ptr < end) {
 int len;

[PATCH v2 5/7] block/vdi: Clean up local variable shadowing

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
---
 block/vdi.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 6c35309e04..934e1b849b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -634,7 +634,6 @@ vdi_co_pwritev(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 bmap_entry = le32_to_cpu(s->bmap[block_index]);
 if (!VDI_IS_ALLOCATED(bmap_entry)) {
 /* Allocate new block and write to it. */
-uint64_t data_offset;
 qemu_co_rwlock_upgrade(&s->bmap_lock);
 bmap_entry = le32_to_cpu(s->bmap[block_index]);
 if (VDI_IS_ALLOCATED(bmap_entry)) {
@@ -700,7 +699,7 @@ nonallocating_write:
 /* One or more new blocks were allocated. */
 VdiHeader *header;
 uint8_t *base;
-uint64_t offset;
+uint64_t bmap_offset;
 uint32_t n_sectors;
 
 g_free(block);
@@ -723,11 +722,11 @@ nonallocating_write:
 bmap_first /= (SECTOR_SIZE / sizeof(uint32_t));
 bmap_last /= (SECTOR_SIZE / sizeof(uint32_t));
 n_sectors = bmap_last - bmap_first + 1;
-offset = s->bmap_sector + bmap_first;
+bmap_offset = s->bmap_sector + bmap_first;
 base = ((uint8_t *)&s->bmap[0]) + bmap_first * SECTOR_SIZE;
 logout("will write %u block map sectors starting from entry %u\n",
n_sectors, bmap_first);
-ret = bdrv_co_pwrite(bs->file, offset * SECTOR_SIZE,
+ret = bdrv_co_pwrite(bs->file, bmap_offset * SECTOR_SIZE,
  n_sectors * SECTOR_SIZE, base, 0);
 }
 
-- 
2.41.0

[PATCH v2 6/7] block: Clean up local variable shadowing

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Acked-by: Anthony PERARD 
Acked-by: Ilya Dryomov 
---
 block.c  |  9 +
 block/rbd.c  |  2 +-
 block/stream.c   |  1 -
 block/vvfat.c| 35 ++-
 hw/block/xen-block.c |  6 +++---
 5 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 8da89aaa62..bb5dd17e9c 100644
--- a/block.c
+++ b/block.c
@@ -3035,18 +3035,19 @@ static BdrvChild 
*bdrv_attach_child_common(BlockDriverState *child_bs,
   &local_err);
 
 if (ret < 0 && child_class->change_aio_ctx) {
-Transaction *tran = tran_new();
+Transaction *aio_ctx_tran = tran_new();
 GHashTable *visited = g_hash_table_new(NULL, NULL);
 bool ret_child;
 
 g_hash_table_add(visited, new_child);
 ret_child = child_class->change_aio_ctx(new_child, child_ctx,
-visited, tran, NULL);
+visited, aio_ctx_tran,
+NULL);
 if (ret_child == true) {
 error_free(local_err);
 ret = 0;
 }
-tran_finalize(tran, ret_child == true ? 0 : -1);
+tran_finalize(aio_ctx_tran, ret_child == true ? 0 : -1);
 g_hash_table_destroy(visited);
 }
 
@@ -6077,12 +6078,12 @@ void bdrv_iterate_format(void (*it)(void *opaque, const 
char *name),
 QLIST_FOREACH(drv, &bdrv_drivers, list) {
 if (drv->format_name) {
 bool found = false;
-int i = count;
 
 if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, read_only)) {
 continue;
 }
 
+i = count;
 while (formats && i && !found) {
 found = !strcmp(formats[--i], drv->format_name);
 }
diff --git a/block/rbd.c b/block/rbd.c
index 978671411e..472ca05cba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1290,7 +1290,7 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,
  * operations that exceed the current size.
  */
 if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
+r = qemu_rbd_resize(bs, offset + bytes);
 if (r < 0) {
 return r;
 }
diff --git a/block/stream.c b/block/stream.c
index e522bbdec5..007253880b 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -282,7 +282,6 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 /* Make sure that the image is opened in read-write mode */
 bs_read_only = bdrv_is_read_only(bs);
 if (bs_read_only) {
-int ret;
 /* Hold the chain during reopen */
 if (bdrv_freeze_backing_chain(bs, above_base, errp) < 0) {
 return;
diff --git a/block/vvfat.c b/block/vvfat.c
index 0ddc91fc09..856b479c91 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -777,7 +777,6 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 while((entry=readdir(dir))) {
 unsigned int length=strlen(dirname)+2+strlen(entry->d_name);
 char* buffer;
-direntry_t* direntry;
 struct stat st;
 int is_dot=!strcmp(entry->d_name,".");
 int is_dotdot=!strcmp(entry->d_name,"..");
@@ -857,7 +856,7 @@ static int read_directory(BDRVVVFATState* s, int 
mapping_index)
 
 /* fill with zeroes up to the end of the cluster */
 while(s->directory.next%(0x10*s->sectors_per_cluster)) {
-direntry_t* direntry=array_get_next(&(s->directory));
+direntry = array_get_next(&(s->directory));
 memset(direntry,0,sizeof(direntry_t));
 }
 
@@ -1962,24 +1961,24 @@ get_cluster_count_for_direntry(BDRVVVFATState* s, 
direntry_t* direntry, const ch
  * This is horribly inefficient, but that is okay, since
  * it is rarely executed, if at all.
  */
-int64_t offset = cluster2sector(s, cluster_num);
+int64_t offs = cluster2sector(s, cluster_num);
 
 vvfat_close_current_file(s);
 for (i = 0; i < s->sectors_per_cluster; i++) {
 int res;
 
 res = bdrv_is_allocated(s->qcow->bs,
-(offset + i) * BDRV_SECTOR_SIZE,
+(offs + i) * BDRV_SECTOR_SIZE,
 BDRV_SECTOR_SIZE, NULL);
 if (res < 0) {
 return -1;
 }

[PATCH v2 0/7] Steps towards enabling -Wshadow=local

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Bugs love to hide in such code.
Evidence: PATCH 1.

Enabling -Wshadow would prevent bugs like this one.  But we'd have to
clean up all the offenders first.  We got a lot of them.

Enabling -Wshadow=local should be less work for almost as much gain.
I took a stab at it.  There's a small, exciting part, and a large,
boring part.

The exciting part is dark preprocessor sorcery to let us nest macro
calls without shadowing: PATCH 7.
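
For readers who have not seen the trick, here is a rough sketch of the general idea, not the code from PATCH 7.  It assumes GCC or Clang (statement expressions, __typeof__, __COUNTER__).  PASTE(), UNIQUE_ID(), MAX_OF() and max3() are hypothetical names chosen for this illustration; the series itself uses QEMU's existing glue() for the pasting.

```c
/* Two-level paste so that __COUNTER__ is expanded before concatenation. */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)

/* Hypothetical helper: yields a fresh identifier on every expansion. */
#define UNIQUE_ID(prefix) PASTE(prefix, __COUNTER__)

/*
 * The internal macro takes the names of its local variables as
 * parameters, so each expansion can be handed different names.
 */
#define MAX_OF_INTERNAL(a, b, _a, _b) ({    \
    __typeof__(a) _a = (a);                 \
    __typeof__(b) _b = (b);                 \
    _a > _b ? _a : _b;                      \
})
#define MAX_OF(a, b) \
    MAX_OF_INTERNAL((a), (b), UNIQUE_ID(_a), UNIQUE_ID(_b))

static int max3(int x, int y, int z)
{
    /* Nested use: the inner and outer expansions declare distinct locals. */
    return MAX_OF(x, MAX_OF(y, z));
}
```

Passing the identifier in as a macro parameter keeps the body of the internal macro readable, since it still mentions a plain name rather than a pasted one.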

The boring part is cleaning up all the other warnings.  I did some
[PATCH 2-6], but ran out of steam long before finishing the job.  Some
160 unique warnings remain.

To see them, enable -Wshadow=local like so:

diff --git a/meson.build b/meson.build
index 98e68ef0b1..9fc4c7ac9d 100644
--- a/meson.build
+++ b/meson.build
@@ -466,6 +466,9 @@ warn_flags = [
   '-Wno-tautological-type-limit-compare',
   '-Wno-psabi',
   '-Wno-gnu-variable-sized-type-not-at-end',
+  '-Wshadow=local',
+  '-Wno-error=shadow=local',
+  '-Wno-error=shadow=compatible-local',
 ]
 
 if targetos != 'darwin'

You may want to drop the -Wno-error lines.
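
As a reminder of what the option does and does not cover, here is a small sketch.  It assumes GCC's documented behavior: -Wshadow=local warns when a local variable shadows another local variable or a parameter, while plain -Wshadow warns for any shadowing.  All names in it are made up for illustration.

```c
static int counter = 41;            /* file-scope variable */

static int read_global(void)
{
    return counter;
}

static int sum_to(int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        /* Flagged by -Wshadow=local: hides the outer local 'total'. */
        int total = i;
        (void)total;
    }
    return total;                   /* the outer 'total', never updated */
}

static int bump(void)
{
    /*
     * Shadows the file-scope 'counter'.  Plain -Wshadow flags this,
     * -Wshadow=local does not.
     */
    int counter = 1;

    return counter;
}
```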

v2:
* PATCH 3+6: Mollify checkpatch
* PATCH 4: Redo for clearer code, R-bys dropped [Kevin]
* PATCH 5: Rename tweaked [Kevin]
* PATCH 6: Rename local @tran instead of the parameter [Kevin]
* PATCH 7: Drop PASTE(), use glue() instead [Richard]; pass
  identifiers instead of __COUNTER__ for readability [Eric]; add
  comments

Markus Armbruster (7):
  migration/rdma: Fix save_page method to fail on polling error
  migration: Clean up local variable shadowing
  ui: Clean up local variable shadowing
  block/dirty-bitmap: Clean up local variable shadowing
  block/vdi: Clean up local variable shadowing
  block: Clean up local variable shadowing
  qobject atomics osdep: Make a few macros more hygienic

 include/qapi/qmp/qobject.h  | 11 +--
 include/qemu/atomic.h   | 17 +++-
 include/qemu/compiler.h |  3 +++
 include/qemu/osdep.h| 31 +
 block.c |  9 +
 block/monitor/bitmap-qmp-cmds.c | 19 +-
 block/qcow2-bitmap.c|  3 +--
 block/rbd.c |  2 +-
 block/stream.c  |  1 -
 block/vdi.c |  7 +++
 block/vvfat.c   | 35 +
 hw/block/xen-block.c|  6 +++---
 migration/block.c   |  4 ++--
 migration/ram.c |  8 +++-
 migration/rdma.c| 14 -
 migration/vmstate.c |  2 +-
 ui/gtk.c| 14 ++---
 ui/spice-display.c  |  9 +
 ui/vnc-palette.c|  2 --
 ui/vnc.c| 12 +--
 ui/vnc-enc-zrle.c.inc   |  9 -
 21 files changed, 125 insertions(+), 93 deletions(-)

-- 
2.41.0

[PATCH v2 4/7] block/dirty-bitmap: Clean up local variable shadowing

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: rename both the pair of parameters and the pair of local
variables.  While there, move the local variables to function scope.

Suggested-by: Kevin Wolf 
Signed-off-by: Markus Armbruster 
---
 block/monitor/bitmap-qmp-cmds.c | 19 ++-
 block/qcow2-bitmap.c|  3 +--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 55f778f5af..70d01a3776 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -258,37 +258,38 @@ void qmp_block_dirty_bitmap_disable(const char *node, 
const char *name,
 bdrv_disable_dirty_bitmap(bitmap);
 }
 
-BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, const char *target,
+BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *dst_node,
+  const char *dst_bitmap,
   BlockDirtyBitmapOrStrList *bms,
   HBitmap **backup, Error **errp)
 {
 BlockDriverState *bs;
 BdrvDirtyBitmap *dst, *src;
 BlockDirtyBitmapOrStrList *lst;
+const char *src_node, *src_bitmap;
 HBitmap *local_backup = NULL;
 
 GLOBAL_STATE_CODE();
 
-dst = block_dirty_bitmap_lookup(node, target, &bs, errp);
+dst = block_dirty_bitmap_lookup(dst_node, dst_bitmap, &bs, errp);
 if (!dst) {
 return NULL;
 }
 
 for (lst = bms; lst; lst = lst->next) {
 switch (lst->value->type) {
-const char *name, *node;
 case QTYPE_QSTRING:
-name = lst->value->u.local;
-src = bdrv_find_dirty_bitmap(bs, name);
+src_bitmap = lst->value->u.local;
+src = bdrv_find_dirty_bitmap(bs, src_bitmap);
 if (!src) {
-error_setg(errp, "Dirty bitmap '%s' not found", name);
+error_setg(errp, "Dirty bitmap '%s' not found", src_bitmap);
 goto fail;
 }
 break;
 case QTYPE_QDICT:
-node = lst->value->u.external.node;
-name = lst->value->u.external.name;
-src = block_dirty_bitmap_lookup(node, name, NULL, errp);
+src_node = lst->value->u.external.node;
+src_bitmap = lst->value->u.external.name;
+src = block_dirty_bitmap_lookup(src_node, src_bitmap, NULL, errp);
 if (!src) {
 goto fail;
 }
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 037fa2d435..ffd5cd3b23 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -1555,7 +1555,6 @@ bool 
qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
 FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
 const char *name = bdrv_dirty_bitmap_name(bitmap);
 uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
-Qcow2Bitmap *bm;
 
 if (!bdrv_dirty_bitmap_get_persistence(bitmap) ||
 bdrv_dirty_bitmap_inconsistent(bitmap)) {
@@ -1625,7 +1624,7 @@ bool 
qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
 
 /* allocate clusters and store bitmaps */
 QSIMPLEQ_FOREACH(bm, bm_list, entry) {
-BdrvDirtyBitmap *bitmap = bm->dirty_bitmap;
+bitmap = bm->dirty_bitmap;
 
 if (bitmap == NULL || bdrv_dirty_bitmap_readonly(bitmap)) {
 continue;
-- 
2.41.0

[PATCH v2 2/7] migration: Clean up local variable shadowing

2023-09-20 Thread Markus Armbruster

Local variables shadowing other local variables or parameters make the
code needlessly hard to understand.  Tracked down with -Wshadow=local.
Clean up: delete inner declarations when they are actually redundant,
else rename variables.

Signed-off-by: Markus Armbruster 
Reviewed-by: Peter Xu 
---
 migration/block.c   | 4 ++--
 migration/ram.c | 8 +++-
 migration/rdma.c| 8 +---
 migration/vmstate.c | 2 +-
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 86c2256a2b..eb6aafeb9e 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -440,8 +440,8 @@ static int init_blk_migration(QEMUFile *f)
 /* Can only insert new BDSes now because doing so while iterating block
  * devices may end up in a deadlock (iterating the new BDSes, too). */
 for (i = 0; i < num_bs; i++) {
-BlkMigDevState *bmds = bmds_bs[i].bmds;
-BlockDriverState *bs = bmds_bs[i].bs;
+bmds = bmds_bs[i].bmds;
+bs = bmds_bs[i].bs;
 
 if (bmds) {
 ret = blk_insert_bs(bmds->blk, bs, &local_err);
diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..0c202f8109 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3517,8 +3517,6 @@ int colo_init_ram_cache(void)
 * we use the same name 'ram_bitmap' as for migration.
 */
 if (ram_bytes_total()) {
-RAMBlock *block;
-
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
 block->bmap = bitmap_new(pages);
@@ -3998,12 +3996,12 @@ static int ram_load_precopy(QEMUFile *f)
 }
 }
 if (migrate_ignore_shared()) {
-hwaddr addr = qemu_get_be64(f);
+hwaddr addr2 = qemu_get_be64(f);
 if (migrate_ram_is_ignored(block) &&
-block->mr->addr != addr) {
+block->mr->addr != addr2) {
 error_report("Mismatched GPAs for block %s "
  "%" PRId64 "!= %" PRId64,
- id, (uint64_t)addr,
+ id, (uint64_t)addr2,
  (uint64_t)block->mr->addr);
 ret = -EINVAL;
 }
diff --git a/migration/rdma.c b/migration/rdma.c
index 3915d1d7c9..c78ddfcb74 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1902,9 +1902,11 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, 
RDMAControlHeader *head,
  * by waiting for a READY message.
  */
 if (rdma->control_ready_expected) {
-RDMAControlHeader resp;
-ret = qemu_rdma_exchange_get_response(rdma,
-&resp, RDMA_CONTROL_READY, RDMA_WRID_READY);
+RDMAControlHeader resp_ignored;
+
+ret = qemu_rdma_exchange_get_response(rdma, &resp_ignored,
+  RDMA_CONTROL_READY,
+  RDMA_WRID_READY);
 if (ret < 0) {
 return ret;
 }
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 31842c3afb..438ea77cfa 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -97,7 +97,7 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription 
*vmsd,
 return -EINVAL;
 }
 if (vmsd->pre_load) {
-int ret = vmsd->pre_load(opaque);
+ret = vmsd->pre_load(opaque);
 if (ret) {
 return ret;
 }
-- 
2.41.0

Re: [PATCH v2 11/28] bsd-user: Implement getgroups(2) and setgroups(2) system calls.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
> ---
>  bsd-user/bsd-proc.h   | 44 +++
>  bsd-user/freebsd/os-syscall.c |  9 +++
>  2 files changed, 53 insertions(+)
>

Reviewed-by: Warner Losh

Re: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Jason Gunthorpe

On Wed, Sep 20, 2023 at 12:17:24PM -0600, Alex Williamson wrote:

> > The iommufd design requires one open of the /dev/iommu to be shared
> > across all the vfios.
> 
> "requires"?  It's certainly of limited value to have multiple iommufd
> instances rather than create multiple address spaces within a single
> iommufd, but what exactly precludes an iommufd per device if QEMU, or
> any other userspace so desired?  Thanks,

From the kernel side, "requires" is too strong, I suppose.

Not sure about these qemu patches though?

Jason

Re: [PATCH v2 10/28] bsd-user: Get number of cpus.

2023-09-20 Thread Warner Losh

This one is almost right... one tweak is needed I think...

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Kyle Evans 
>
> Signed-off-by: Kyle Evans 
> Signed-off-by: Karim Taha 
> ---
>  bsd-user/bsd-proc.c | 23 +++
>  bsd-user/bsd-proc.h |  2 ++
>  2 files changed, 25 insertions(+)
>
> diff --git a/bsd-user/bsd-proc.c b/bsd-user/bsd-proc.c
> index 19f6efe1f7..78f5b172d7 100644
> --- a/bsd-user/bsd-proc.c
> +++ b/bsd-user/bsd-proc.c
> @@ -119,3 +119,26 @@ int host_to_target_waitstatus(int status)
>  return status;
>  }
>
> +int bsd_get_ncpu(void)
> +{
> +int ncpu = -1;
> +cpuset_t mask;
> +
> +CPU_ZERO(&mask);
> +
> +if (cpuset_getaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1, sizeof(mask),
> +   &mask) == 0) {
> +ncpu = CPU_COUNT(&mask);
> +}
> +#ifdef _SC_NPROCESSORS_ONLN
> +if (ncpu == -1) {
> +ncpu = sysconf(_SC_NPROCESSORS_ONLN);
> +}
> +#endif
>

I think that the #ifdef and #endif lines can be removed.
_SC_NPROCESSORS_ONLN is defined on all versions of FreeBSD, NetBSD and
OpenBSD (and, I think, DragonFly, in the unlikely event that it gets
bsd-user support).
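
Roughly, the fallback would then read like the sketch below.  This is not the patch itself: ncpu_with_fallback() and its affinity_count parameter are hypothetical, standing in for the result of the cpuset_getaffinity() query (with -1 meaning the query failed), so that the snippet builds on any system that provides _SC_NPROCESSORS_ONLN.

```c
#include <unistd.h>

/*
 * affinity_count: CPU count obtained from an affinity query, or -1 if
 * that query failed (hypothetical parameter for this sketch).
 */
static int ncpu_with_fallback(int affinity_count)
{
    int ncpu = affinity_count;

    /* No #ifdef guard: the constant is assumed to be always defined. */
    if (ncpu == -1) {
        ncpu = (int)sysconf(_SC_NPROCESSORS_ONLN);
    }
    return ncpu;
}
```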

With that fixed,

Reviewed-by: Warner Losh

Re: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Daniel P . Berrangé

On Wed, Sep 20, 2023 at 12:01:42PM -0600, Alex Williamson wrote:
> On Wed, 20 Sep 2023 03:42:20 +
> "Duan, Zhenzhong"  wrote:
> 
> > >-Original Message-
> > >From: Cédric Le Goater 
> > >Sent: Wednesday, September 20, 2023 1:08 AM
> > >Subject: Re: [PATCH v1 15/22] Add iommufd configure option
> > >
> > >On 8/30/23 12:37, Zhenzhong Duan wrote:  
> > >> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> > >> iommufd support, enabled by default.  
> > >
> > >Why would someone want to disable support at compile time ? It might  
> > 
> > For those users who only want to support legacy container feature?
> > Let me know if you still prefer to drop this patch, I'm fine with that.
> > 
> > >have been useful for dev but now QEMU should self-adjust at runtime
> > >depending only on the host capabilities AFAIUI. Am I missing something ?  
> > 
> > IOMMUFD doesn't support all features of legacy container, so QEMU
> > doesn't self-adjust at runtime by checking if host supports IOMMUFD.
> > We need to specify it explicitly to use IOMMUFD as below:
> > 
> > -object iommufd,id=iommufd0
> > -device vfio-pci,host=:02:00.0,iommufd=iommufd0
> 
> There's an important point here that maybe we've let slip for too long.
> Laine had asked in an internal forum whether the switch to IOMMUFD was
> visible to the guest.  I replied that it wasn't, but this note about
> IOMMUFD vs container features jogged my memory that I think we still
> lack p2p support with IOMMUFD, ie. IOMMU mapping of device MMIO.  It
> seemed like there was something else too, but I don't recall without
> some research.
> 
> Ideally we'd have feature parity and libvirt could simply use the
> native IOMMUFD interface whenever both the kernel and QEMU support it.
> 
> Without that parity, when does libvirt decide to use IOMMUFD?
> 
> How would libvirt know if some future IOMMUFD does have parity?
> 
> Does the XML direct this through some new interpretation of the driver
> field? ex. "vfio-container" vs "vfio-iommufd" where "vfio" becomes an
> alias or priority preference.  Thanks,

Right now a host device would have


  
   ...
  

where model could also accept 'vfio-ccw' / 'vfio-ap' on s390x IIUC.

If the use of IOMMUFD has guest ABI feature differences, then we
would need to treat this as a new device model in libvirt, ie add
vfio-iommu-pci model.  Does this iommufd work with vfio-ccw / vfio-ap
too?  If so we'd need new models for those too in libvirt.

The downside of this is that it means no application is going to
use iommufd mode without having explicit coding done to make it
aware of the new model in libvirt.

If we /want/ apps to move over to the iommufd approach in a finite
short timeframe then IMHO achieving feature parity is critical,
as feature parity would let libvirt switch over to it automatically
and avoid the pain of updating any apps. This would be my preference,
as exposing the iommufd concept to apps feels wrong - this is an
internal impl detail ideally. Again we must have feature parity
for this to work though.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v1 15/22] Add iommufd configure option

2023-09-20 Thread Alex Williamson

On Wed, 20 Sep 2023 14:49:19 -0300
Jason Gunthorpe  wrote:

> On Wed, Sep 20, 2023 at 07:37:53PM +0200, Eric Auger wrote:
> 
> > >> qemu will typically not be able to
> > >> self-open /dev/iommufd as it is root-only.  
> > >
> > > I don't understand, we open multiple fds to KVM devices. This is the
> > > same.  
> > Actually qemu opens the /dev/iommu in case no fd is passed along with
> > the iommufd object. This is done in
> > [PATCH v1 16/22] backends/iommufd: Introduce the iommufd object, in
> > 
> > iommufd_backend_connect(). I don't understand either.  
> 
> The char dev node is root only so this automatic behavior is fine
> but not useful if qemu is running in a sandbox.
> 
> I'm not sure what "multiple fds to KVM devices" means, I don't know
> anything about kvm devices..

Looking at a local VM, the only kvm related open file is /dev/kvm,
which kvm_init() does directly open.  The other tun/tap/vhost files are
all passed by fd.  We have a bunch of anon_inodes representing eventfds
and vcpu source from /dev/kvm, but the only other direct files are disk
images and the created pid file.

> The iommufd design requires one open of the /dev/iommu to be shared
> across all the vfios.

"requires"?  It's certainly of limited value to have multiple iommufd
instances rather than create multiple address spaces within a single
iommufd, but what exactly precludes an iommufd per device if QEMU, or
any other userspace so desired?  Thanks,

Alex

Re: [PATCH v2 09/28] bsd-user: Implement host_to_target_waitstatus conversion.

2023-09-20 Thread Warner Losh

On Sun, Sep 17, 2023 at 10:39 PM Karim Taha 
wrote:

> From: Stacey Son 
>
> Signed-off-by: Stacey Son 
> Signed-off-by: Karim Taha 
>
> Reviewed-by: Richard Henderson 
> ---
>  bsd-user/bsd-proc.c | 17 +
>  1 file changed, 17 insertions(+)
>

Reviewed-by: Warner Losh
