Re: [PATCH 2/2] vhost: Add Error parameter to vhost_scsi_common_start()

2023-08-16 Thread Li Feng


> On Aug 14, 2023, at 8:11 PM, Raphael Norwitz  wrote:
> 
> Thanks for the cleanup! A few comments.
> 
>> On Aug 4, 2023, at 1:29 AM, Li Feng  wrote:
>> 
>> Add an Error parameter to report the real error, like vhost-user-blk.
>> 
>> Signed-off-by: Li Feng 
>> ---
>> hw/scsi/vhost-scsi-common.c   | 17 ++---
>> hw/scsi/vhost-scsi.c  |  5 +++--
>> hw/scsi/vhost-user-scsi.c | 14 --
>> include/hw/virtio/vhost-scsi-common.h |  2 +-
>> 4 files changed, 22 insertions(+), 16 deletions(-)
>> 
>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>> index a61cd0e907..392587dfb5 100644
>> --- a/hw/scsi/vhost-scsi-common.c
>> +++ b/hw/scsi/vhost-scsi-common.c
>> @@ -16,6 +16,7 @@
>> */
>> 
>> #include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> #include "qemu/error-report.h"
>> #include "qemu/module.h"
>> #include "hw/virtio/vhost.h"
>> @@ -25,7 +26,7 @@
>> #include "hw/virtio/virtio-access.h"
>> #include "hw/fw-path-provider.h"
>> 
>> -int vhost_scsi_common_start(VHostSCSICommon *vsc)
>> +int vhost_scsi_common_start(VHostSCSICommon *vsc, Error **errp)
>> {
>>int ret, i;
>>VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
>> @@ -35,18 +36,19 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>VirtIOSCSICommon *vs = (VirtIOSCSICommon *)vsc;
>> 
>>if (!k->set_guest_notifiers) {
>> -error_report("binding does not support guest notifiers");
>> +error_setg(errp, "binding does not support guest notifiers");
>>return -ENOSYS;
>>}
>> 
>>ret = vhost_dev_enable_notifiers(&vsc->dev, vdev);
>>if (ret < 0) {
>> +error_setg_errno(errp, -ret, "Error enabling host notifiers");
>>return ret;
>>}
>> 
>>ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
>>if (ret < 0) {
>> -error_report("Error binding guest notifier");
>> +error_setg_errno(errp, -ret, "Error binding guest notifier");
>>goto err_host_notifiers;
>>}
>> 
>> @@ -54,7 +56,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>> 
>>ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>if (ret < 0) {
>> -error_report("Error setting inflight format: %d", -ret);
> 
> Curious why you’re adding the error value to the string. Isn’t it redundant 
> since we pass it in as the second param?
> 
>> +error_setg_errno(errp, -ret, "Error setting inflight format: %d", 
>> -ret);

I don't understand. Here I put the error message in errp; what is redundant?
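
For context, error_setg_errno() appends the strerror() text for its errno argument to the formatted message, so formatting -ret into the string as well reports the same code twice. A minimal stand-alone sketch of that behaviour follows; set_error_errno() and the two inflight_error_*() helpers are hypothetical stand-ins written for illustration, not QEMU functions:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for error_setg_errno(): takes the caller's
 * message and appends ": " plus strerror(os_errno).
 */
static void set_error_errno(char *buf, size_t len, int os_errno,
                            const char *msg)
{
    snprintf(buf, len, "%s: %s", msg, strerror(os_errno));
}

/* Builds the message the way the patch does, with the extra "%d". */
static void inflight_error_with_code(char *buf, size_t len, int ret)
{
    char msg[64];

    snprintf(msg, sizeof(msg), "Error setting inflight format: %d", -ret);
    set_error_errno(buf, len, -ret, msg);
}

/* Builds the message without the extra "%d". */
static void inflight_error_plain(char *buf, size_t len, int ret)
{
    set_error_errno(buf, len, -ret, "Error setting inflight format");
}
```

With ret = -EINVAL, the first helper yields the numeric code followed by the strerror() text, while the second carries only the text.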

>>goto err_guest_notifiers;
>>}
>> 
>> @@ -64,21 +66,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>vs->conf.virtqueue_size,
>>vsc->inflight);
>>if (ret < 0) {
>> -error_report("Error getting inflight: %d", -ret);
> 
> Ditto
> 
>> +error_setg_errno(errp, -ret, "Error getting inflight: %d",
>> + -ret);
>>goto err_guest_notifiers;
>>}
>>}
>> 
>>ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>if (ret < 0) {
>> -error_report("Error setting inflight: %d", -ret);
>> +error_setg_errno(errp, -ret, "Error setting inflight: %d", 
>> -ret);
>>goto err_guest_notifiers;
>>}
>>}
>> 
>>ret = vhost_dev_start(&vsc->dev, vdev, true);
>>if (ret < 0) {
>> -error_report("Error start vhost dev");
> 
> “Error starting vhost dev”?
ACK.

> 
>> +error_setg_errno(errp, -ret, "Error start vhost dev");
>>goto err_guest_notifiers;
>>}
>> 
>> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
>> index 443f67daa4..01a3ab4277 100644
>> --- a/hw/scsi/vhost-scsi.c
>> +++ b/hw/scsi/vhost-scsi.c
>> @@ -75,6 +75,7 @@ static int vhost_scsi_start(VHostSCSI *s)
>>int ret, abi_version;
>>VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>>const VhostOps *vhost_ops = vsc->dev.vhost_ops;
>> +Error *local_err = NULL;
>> 
>>ret = vhost_ops->vhost_scsi_get_abi_version(&vsc->dev, &abi_version);
>>if (ret < 0) {
>> @@ -88,14 +89,14 @@ static int vhost_scsi_start(VHostSCSI *s)
>>return -ENOSYS;
>>}
>> 
>> -ret = vhost_scsi_common_start(vsc);
>> +ret = vhost_scsi_common_start(vsc, &local_err);
>>if (ret < 0) {
>>return ret;
>>}
>> 
>>ret = vhost_scsi_set_endpoint(s);
>>if (ret < 0) {
>> -error_report("Error setting vhost-scsi endpoint");
>> +error_reportf_err(local_err, "Error setting vhost-scsi endpoint");
>>vhost_scsi_common_stop(vsc);
>>}
>> 
>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
>> index a7fa8e8df2..d368171e28 100644
>> --- a/hw/scsi/vhost-user-scsi.c
>> +++ b/hw/scsi/vhost-user-scsi.c
>> @@ -43,12 +43,12 @@ enum VhostUserProtocolFeature {
>>VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
>> };
>> 
>> -static int vhost_user_scsi_start(V

Re: [PATCH 1/2] vhost-user: fix lost reconnect

2023-08-16 Thread Li Feng


> On Aug 14, 2023, at 8:11 PM, Raphael Norwitz  wrote:
> 
> Why can’t we rather fix this by adding a “event_cb” param to 
> vhost_user_async_close and then call qemu_chr_fe_set_handlers in 
> vhost_user_async_close_bh()?
> 
> Even if calling vhost_dev_cleanup() twice is safe today I worry future 
> changes may easily stumble over the reconnect case and introduce crashes or 
> double frees.
> 
I don't think adding a new event_cb is good enough. qemu_chr_fe_set_handlers()
is already called in vhost_user_async_close() and will be called again in
event->cb, so why do we need to add a new event_cb?

To avoid calling vhost_dev_cleanup() twice, we could add an 'inited' flag to
struct vhost_dev to mark whether it has been initialised, like this:

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..edc80c0231 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 goto fail_busyloop;
 }

+hdev->inited = true;
 return 0;

 fail_busyloop:
@@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
 int i;

+if (!hdev->inited) {
+return;
+}
+hdev->inited = false;
 trace_vhost_dev_cleanup(hdev);

 for (i = 0; i < hdev->nvqs; ++i) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index ca3131b1af..74b1aec960 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -123,6 +123,7 @@ struct vhost_dev {
 /* @started: is the vhost device started? */
 bool started;
 bool log_enabled;
+bool inited;
 uint64_t log_size;
 Error *migration_blocker;
 const VhostOps *vhost_ops;
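
A stand-alone sketch of the same idea, assuming a heavily reduced device struct; struct demo_dev and its helpers are hypothetical names used only for illustration. The second cleanup call becomes a no-op because the flag is cleared by the first one:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for struct vhost_dev. */
struct demo_dev {
    bool inited;
    int nvqs;
    int cleanups;   /* counts how often the real teardown ran */
};

static void demo_dev_init(struct demo_dev *dev, int nvqs)
{
    dev->nvqs = nvqs;
    dev->inited = true;     /* set only once init has fully succeeded */
}

static void demo_dev_cleanup(struct demo_dev *dev)
{
    int cleanups;

    if (!dev->inited) {
        return;             /* already cleaned up, or never initialised */
    }
    cleanups = dev->cleanups + 1;
    /* Wipe the whole struct, as vhost_dev_cleanup() does; this clears inited. */
    memset(dev, 0, sizeof(*dev));
    dev->cleanups = cleanups;
}
```

Calling demo_dev_cleanup() on a device that never finished init is equally harmless, which is the case the reconnect path runs into.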

Thanks.

> 
>> On Aug 4, 2023, at 1:29 AM, Li Feng  wrote:
>> 
>> When vhost-user is reconnecting to the backend and fails at
>> get_features in vhost_dev_init(), the reconnect fails and is never
>> retriggered.
>> 
>> The reason is:
>> When vhost-user fails at get_features, vhost_dev_cleanup() is called
>> immediately.
>> 
>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>> 
>> The reconnect path is:
>> vhost_user_blk_event
>>  vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>>qemu_chr_fe_set_handlers <- clear the notifier callback
>>  schedule vhost_user_async_close_bh
>> 
>> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
>> called, then the event fd callback will not be reinstalled.
>> 
>> With this patch, the vhost_user_blk_disconnect will call the
>> vhost_dev_cleanup() again, it's safe.
>> 
>> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>> 
>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>> 
>> Signed-off-by: Li Feng 
>> ---
>> hw/virtio/vhost-user.c | 10 +-
>> 1 file changed, 1 insertion(+), 9 deletions(-)
>> 
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 8dcf049d42..697b403fe2 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -2648,16 +2648,8 @@ typedef struct {
>> static void vhost_user_async_close_bh(void *opaque)
>> {
>>VhostAsyncCallback *data = opaque;
>> -struct vhost_dev *vhost = data->vhost;
>> 
>> -/*
>> - * If the vhost_dev has been cleared in the meantime there is
>> - * nothing left to do as some other path has completed the
>> - * cleanup.
>> - */
>> -if (vhost->vdev) {
>> -data->cb(data->dev);
>> -}
>> +data->cb(data->dev);
>> 
>>g_free(data);
>> }
>> -- 
>> 2.41.0
>> 
> 




Re: qemu-system-x86 dependencies

2023-08-16 Thread Paul Menzel

Dear Fourhundred,


On 17.08.23 at 07:10, Fourhundred Thecat wrote:

 > On 2023-08-16 15:02, Fourhundred Thecat wrote:

 > On 2023-08-16 14:52, Michael Tokarev wrote:

On 16.08.2023 at 15:37, Philippe Mathieu-Daudé wrote:

Cc'ing Michael


why does qemu depend on sound and gstreamer and wayland libraries?
After all, i am just trying to run VMs on my hypervisor.

If I remember correctly, my previous installation on Debian 10,
qemu-system-x86 had no such dependencies.

Seems to me like trying to install openssh-server, but it needs full
gnome environment libraries.


sorry if my question offended people.

Perhaps there is a good reason for these dependencies, which i don't see?

Also, I am told that Arch has split all these into separate packages:

https://archlinux.org/packages/?sort=&repo=Extra&q=qemu&maintainer=&flagged=

So it looks like my original question might be Debian specific?


Please tell us how you actually built QEMU in the past. If you build QEMU 
from upstream sources and configure it with minimal options, why do 
the dependencies of the Debian package matter at all?



Kind regards,

Paul



Re: [PATCH] qga: Start qemu-ga service after NetworkManager start

2023-08-16 Thread Konstantin Kostiuk
Hi, Efim

Thanks for your contribution.

I think your patch is only a partial solution, because other network
managers can be used, for example systemd-networkd or dhcpcd. Maybe a
better solution is After=network.target.

Do you have any other suggestions?

Best Regards,
Konstantin Kostiuk.


On Wed, Aug 16, 2023 at 11:20 PM Efim Shevrin 
wrote:

> From: Fima Shevrin 
>
> When the guest OS starts, qemu-ga sends an event to the host.
> This event allows services on the host to start configuring
> the already running guest OS. When configuring network settings,
> it is possible that an external service will receive a signal
> from qemu-ga about the start of guest OS, while NetworkManager
> may not be running yet. Therefore, network setting may not
> be available. With the current patch, we eliminate the described
> race condition between qemu-ga and NetworkManager for guest OS
> network setting cases.
>
> Signed-off-by: Fima Shevrin 
> ---
>  contrib/systemd/qemu-guest-agent.service | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/contrib/systemd/qemu-guest-agent.service
> b/contrib/systemd/qemu-guest-agent.service
> index 51cd7b37ff..6e2d059356 100644
> --- a/contrib/systemd/qemu-guest-agent.service
> +++ b/contrib/systemd/qemu-guest-agent.service
> @@ -2,6 +2,7 @@
>  Description=QEMU Guest Agent
>  BindTo=dev-virtio\x2dports-org.qemu.guest_agent.0.device
>  After=dev-virtio\x2dports-org.qemu.guest_agent.0.device
> +After=NetworkManager.service
>
>  [Service]
>  ExecStart=-/usr/bin/qemu-ga
> --
> 2.34.1
>
>


Re: QEMU Summit Minutes 2023

2023-08-16 Thread Thomas Huth

On 13/07/2023 15.21, Peter Maydell wrote:

QEMU Summit Minutes 2023


...

Topic 3: Should we split responsibility for managing CoC reports?
=================================================================

The QEMU project happily does not have to deal with many Code of
Conduct (CoC) reports, but we could do a better job with managing the
ones we do get.  At the moment CoC reports go to the QEMU Leadership
Committee; Paolo proposed that it would be better to decouple CoC
handling to a separate team: although the CoC itself seems good,
asking the Leadership Committee to deal with the reports has not been
working so well.  The model for this is that Linux also initially had
its tech advisory board be the contact for CoC reports before
switching to a dedicated team for them.

There was general consensus that we should try the separate-team
approach. We plan to ask on the mailing list for volunteers who would
be interested in helping out with this.


So who is going to drive this now? I haven't seen any mail on the mailing 
list with that question yet...


 Thomas





RE: [RFC PATCH v4 22/24] vfio/pci: Adapt vfio pci hot reset support with iommufd BE

2023-08-16 Thread Duan, Zhenzhong
>-Original Message-
>From: Nicolin Chen 
>Sent: Thursday, August 17, 2023 1:26 PM
>Subject: Re: [RFC PATCH v4 22/24] vfio/pci: Adapt vfio pci hot reset support
>with iommufd BE
>
>On Wed, Jul 12, 2023 at 03:25:26PM +0800, Zhenzhong Duan wrote:
>
>> +#ifdef CONFIG_IOMMUFD
>> +static VFIODevice *vfio_pci_iommufd_binded(__u32 devid)
>> +{
>> +VFIOAddressSpace *space;
>> +VFIOContainer *bcontainer;
>> +VFIOIOMMUFDContainer *container;
>> +VFIOIOASHwpt *hwpt;
>> +VFIODevice *vbasedev_iter;
>> +VFIOIOMMUBackendOpsClass *ops =
>VFIO_IOMMU_BACKEND_OPS_CLASS(
>> +
>object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS));
>> +
>> + QLIST_FOREACH(space, &vfio_address_spaces, list) {
>
>Indentation here doesn't seem to be aligned with the lines above.
>
>> +QLIST_FOREACH(bcontainer, &space->containers, next) {
>> +if (bcontainer->ops != ops) {
>> +continue;
>> +}
>> +container = container_of(bcontainer, VFIOIOMMUFDContainer,
>> + bcontainer);
>> +QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +QLIST_FOREACH(vbasedev_iter, &hwpt->device_list, next) {
>> +if (devid == vbasedev_iter->devid) {
>> +return vbasedev_iter;
>> +}
>> +}
>> +}
>> +}
>> +}
>> +return NULL;
>> +}
>
>By a quick look, the "binded" sounds a bit odd to me. And this
>function could be vfio_pci_find_by_iommufd_devid()?

Sorry about my poor English, yes, vfio_pci_find_by_iommufd_devid is better, 
I'll use it as function name.

Thanks
Zhenzhong



RE: [RFC PATCH v4 21/24] vfio/as: Add vfio device iterator callback for iommufd

2023-08-16 Thread Duan, Zhenzhong
>-Original Message-
>From: Nicolin Chen 
>Sent: Thursday, August 17, 2023 1:49 PM
>Subject: Re: [RFC PATCH v4 21/24] vfio/as: Add vfio device iterator callback 
>for
>iommufd
>
>On Wed, Jul 12, 2023 at 03:25:25PM +0800, Zhenzhong Duan wrote:
>
>> The way to get vfio device pointer is different between legacy
>> container and iommufd container, with iommufd backend support
>> added, it's time to add the iterator support for iommufd.
>>
>> In order to implement it, a pointer to hwpt is added in vbasedev.
>[...]
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>> index 6434a442fd..d596e802b0 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -133,6 +133,7 @@ typedef struct VFIODevice {
>>  #ifdef CONFIG_IOMMUFD
>>  int devid;
>>  IOMMUFDBackend *iommufd;
>> +VFIOIOASHwpt *hwpt;
>
>I don't feel quite confident about this, since a patch prior just
>added the following function:
>
>+static VFIOIOASHwpt *vfio_find_hwpt_for_dev(VFIOIOMMUFDContainer
>*container,
>+VFIODevice *vbasedev)
>
>This feels a bit of conflict in the same series. Mind elaborating?

Good finding, I'll move " VFIOIOASHwpt *hwpt" to the prior patch,
then vfio_find_hwpt_for_dev() could also use it.

Thanks
Zhenzhong



Re: [RFC PATCH v4 21/24] vfio/as: Add vfio device iterator callback for iommufd

2023-08-16 Thread Nicolin Chen
On Wed, Jul 12, 2023 at 03:25:25PM +0800, Zhenzhong Duan wrote:

> The way to get vfio device pointer is different between legacy
> container and iommufd container, with iommufd backend support
> added, it's time to add the iterator support for iommufd.
> 
> In order to implement it, a pointer to hwpt is added in vbasedev.
[...]
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 6434a442fd..d596e802b0 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -133,6 +133,7 @@ typedef struct VFIODevice {
>  #ifdef CONFIG_IOMMUFD
>  int devid;
>  IOMMUFDBackend *iommufd;
> +VFIOIOASHwpt *hwpt;

I don't feel quite confident about this, since a patch prior just
added the following function:

+static VFIOIOASHwpt *vfio_find_hwpt_for_dev(VFIOIOMMUFDContainer *container,
+VFIODevice *vbasedev)

This feels a bit of conflict in the same series. Mind elaborating?

Thanks
Nicolin



Re: [PATCH v2 4/5] vdpa: move vhost_vdpa_set_vrings_ready to the caller

2023-08-16 Thread Eugenio Perez Martin
On Mon, Aug 14, 2023 at 8:57 AM Jason Wang  wrote:
>
> On Thu, Aug 10, 2023 at 11:36 PM Eugenio Pérez  wrote:
> >
> > Doing that way allows CVQ to be enabled before the dataplane vqs,
> > restoring the state as MQ or MAC addresses properly in the case of a
> > migration.
> >
>
> A typo in the subject, should be vhost_vdpa_set_vring_ready.

I'll fix it in the next version, thanks!

>
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  hw/virtio/vdpa-dev.c   |  3 +++
> >  hw/virtio/vhost-vdpa.c |  3 ---
> >  net/vhost-vdpa.c   | 57 +-
> >  3 files changed, 42 insertions(+), 21 deletions(-)
> >
> > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > index 363b625243..f22d5d5bc0 100644
> > --- a/hw/virtio/vdpa-dev.c
> > +++ b/hw/virtio/vdpa-dev.c
> > @@ -255,6 +255,9 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, 
> > Error **errp)
> >  error_setg_errno(errp, -ret, "Error starting vhost");
> >  goto err_guest_notifiers;
> >  }
> > +for (i = 0; i < s->dev.nvqs; ++i) {
> > +vhost_vdpa_set_vring_ready(&s->vdpa, i);
> > +}
> >  s->started = true;
> >
> >  /*
> > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > index 0d9975b5b5..8ca2e3800c 100644
> > --- a/hw/virtio/vhost-vdpa.c
> > +++ b/hw/virtio/vhost-vdpa.c
> > @@ -1297,9 +1297,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
> > *dev, bool started)
> >  if (unlikely(!ok)) {
> >  return -1;
> >  }
> > -for (int i = 0; i < dev->nvqs; ++i) {
> > -vhost_vdpa_set_vring_ready(v, dev->vq_index + i);
> > -}
> >  } else {
> >  vhost_vdpa_suspend(dev);
> >  vhost_vdpa_svqs_stop(dev);
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 9251351b4b..3bf60f9431 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -371,6 +371,22 @@ static int vhost_vdpa_net_data_start(NetClientState 
> > *nc)
> >  return 0;
> >  }
> >
> > +static int vhost_vdpa_net_data_load(NetClientState *nc)
> > +{
> > +VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +struct vhost_vdpa *v = &s->vhost_vdpa;
> > +bool has_cvq = v->dev->vq_index_end % 2;
> > +
> > +if (has_cvq) {
> > +return 0;
> > +}
> > +
> > +for (int i = 0; i < v->dev->nvqs; ++i) {
> > +vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index);
> > +}
> > +return 0;
> > +}
> > +
> >  static void vhost_vdpa_net_client_stop(NetClientState *nc)
> >  {
> >  VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> > @@ -393,6 +409,7 @@ static NetClientInfo net_vhost_vdpa_info = {
> >  .size = sizeof(VhostVDPAState),
> >  .receive = vhost_vdpa_receive,
> >  .start = vhost_vdpa_net_data_start,
> > +.load = vhost_vdpa_net_data_load,
>
> This deserve an independent patch?
>

Do you mean adding another callback op? What name would work for it?

Thanks!

> Thanks
>
> >  .stop = vhost_vdpa_net_client_stop,
> >  .cleanup = vhost_vdpa_cleanup,
> >  .has_vnet_hdr = vhost_vdpa_has_vnet_hdr,
> > @@ -974,26 +991,30 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc)
> >
> >  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> >
> > -if (!v->shadow_vqs_enabled) {
> > -return 0;
> > -}
> > +vhost_vdpa_set_vring_ready(v, v->dev->vq_index);
> >
> > -n = VIRTIO_NET(v->dev->vdev);
> > -r = vhost_vdpa_net_load_mac(s, n);
> > -if (unlikely(r < 0)) {
> > -return r;
> > -}
> > -r = vhost_vdpa_net_load_mq(s, n);
> > -if (unlikely(r)) {
> > -return r;
> > -}
> > -r = vhost_vdpa_net_load_offloads(s, n);
> > -if (unlikely(r)) {
> > -return r;
> > +if (v->shadow_vqs_enabled) {
> > +n = VIRTIO_NET(v->dev->vdev);
> > +r = vhost_vdpa_net_load_mac(s, n);
> > +if (unlikely(r < 0)) {
> > +return r;
> > +}
> > +r = vhost_vdpa_net_load_mq(s, n);
> > +if (unlikely(r)) {
> > +return r;
> > +}
> > +r = vhost_vdpa_net_load_offloads(s, n);
> > +if (unlikely(r)) {
> > +return r;
> > +}
> > +r = vhost_vdpa_net_load_rx(s, n);
> > +if (unlikely(r)) {
> > +return r;
> > +}
> >  }
> > -r = vhost_vdpa_net_load_rx(s, n);
> > -if (unlikely(r)) {
> > -return r;
> > +
> > +for (int i = 0; i < v->dev->vq_index; ++i) {
> > +vhost_vdpa_set_vring_ready(v, i);
> >  }
> >
> >  return 0;
> > --
> > 2.39.3
> >
>




[PATCH v5 01/26] contrib/plugins: Use GRWLock in execlog

2023-08-16 Thread Akihiko Odaki
execlog had the following comment:
> As we could have multiple threads trying to do this we need to
> serialise the expansion under a lock. Threads accessing already
> created entries can continue without issue even if the ptr array
> gets reallocated during resize.

However, when the ptr array gets reallocated, the other threads may have
a stale reference to the old buffer. This results in use-after-free.

Use GRWLock to properly fix this issue.

Fixes: 3d7caf145e ("contrib/plugins: add execlog to log instruction execution 
and memory access")
Signed-off-by: Akihiko Odaki 
Reviewed-by: Alex Bennée 
---
 contrib/plugins/execlog.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
index 7129d526f8..82dc2f584e 100644
--- a/contrib/plugins/execlog.c
+++ b/contrib/plugins/execlog.c
@@ -19,7 +19,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_version = 
QEMU_PLUGIN_VERSION;
 
 /* Store last executed instruction on each vCPU as a GString */
 static GPtrArray *last_exec;
-static GMutex expand_array_lock;
+static GRWLock expand_array_lock;
 
 static GPtrArray *imatches;
 static GArray *amatches;
@@ -28,18 +28,16 @@ static GArray *amatches;
  * Expand last_exec array.
  *
  * As we could have multiple threads trying to do this we need to
- * serialise the expansion under a lock. Threads accessing already
- * created entries can continue without issue even if the ptr array
- * gets reallocated during resize.
+ * serialise the expansion under a lock.
  */
 static void expand_last_exec(int cpu_index)
 {
-g_mutex_lock(&expand_array_lock);
+g_rw_lock_writer_lock(&expand_array_lock);
 while (cpu_index >= last_exec->len) {
 GString *s = g_string_new(NULL);
 g_ptr_array_add(last_exec, s);
 }
-g_mutex_unlock(&expand_array_lock);
+g_rw_lock_writer_unlock(&expand_array_lock);
 }
 
 /**
@@ -51,8 +49,10 @@ static void vcpu_mem(unsigned int cpu_index, 
qemu_plugin_meminfo_t info,
 GString *s;
 
 /* Find vCPU in array */
+g_rw_lock_reader_lock(&expand_array_lock);
 g_assert(cpu_index < last_exec->len);
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Indicate type of memory access */
 if (qemu_plugin_mem_is_store(info)) {
@@ -80,10 +80,14 @@ static void vcpu_insn_exec(unsigned int cpu_index, void 
*udata)
 GString *s;
 
 /* Find or create vCPU in array */
+g_rw_lock_reader_lock(&expand_array_lock);
 if (cpu_index >= last_exec->len) {
+g_rw_lock_reader_unlock(&expand_array_lock);
 expand_last_exec(cpu_index);
+g_rw_lock_reader_lock(&expand_array_lock);
 }
 s = g_ptr_array_index(last_exec, cpu_index);
+g_rw_lock_reader_unlock(&expand_array_lock);
 
 /* Print previous instruction in cache */
 if (s->len) {
-- 
2.41.0




[PATCH v5 03/26] gdbstub: Add num_regs member to GDBFeature

2023-08-16 Thread Akihiko Odaki
Currently the number of registers exposed to GDB is written as magic
numbers in code. Derive the number of registers GDB actually sees from
XML files to replace the magic numbers in code later.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 include/exec/gdbstub.h  |  1 +
 scripts/feature_to_c.py | 46 +++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 3f08093321..9b484d7eef 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -13,6 +13,7 @@
 typedef struct GDBFeature {
 const char *xmlname;
 const char *xml;
+int num_regs;
 } GDBFeature;
 
 
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
index bcbcb83beb..e04d6b2df7 100755
--- a/scripts/feature_to_c.py
+++ b/scripts/feature_to_c.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: GPL-2.0-or-later
 
-import os, sys
+import os, sys, xml.etree.ElementTree
 
 def writeliteral(indent, bytes):
 sys.stdout.write(' ' * indent)
@@ -39,10 +39,52 @@ def writeliteral(indent, bytes):
 with open(input, 'rb') as file:
 read = file.read()
 
+parser = xml.etree.ElementTree.XMLPullParser(['start', 'end'])
+parser.feed(read)
+events = parser.read_events()
+event, element = next(events)
+if event != 'start':
+sys.stderr.write(f'unexpected event: {event}\n')
+exit(1)
+if element.tag != 'feature':
+sys.stderr.write(f'unexpected start tag: {element.tag}\n')
+exit(1)
+
+regnum = 0
+regnums = []
+tags = ['feature']
+for event, element in events:
+if event == 'end':
+if element.tag != tags[len(tags) - 1]:
+sys.stderr.write(f'unexpected end tag: {element.tag}\n')
+exit(1)
+
+tags.pop()
+if element.tag == 'feature':
+break
+elif event == 'start':
+if len(tags) < 2 and element.tag == 'reg':
+if 'regnum' in element.attrib:
+regnum = int(element.attrib['regnum'])
+
+regnums.append(regnum)
+regnum += 1
+
+tags.append(element.tag)
+else:
+raise Exception(f'unexpected event: {event}\n')
+
+if len(tags):
+sys.stderr.write('unterminated feature tag\n')
+exit(1)
+
+base_reg = min(regnums)
+num_regs = max(regnums) - base_reg + 1 if len(regnums) else 0
+
 sys.stdout.write('{\n')
 writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
 sys.stdout.write(',\n')
 writeliteral(8, read)
-sys.stdout.write('\n},\n')
+sys.stdout.write(f',\n{num_regs},\n}},\n')
 
 sys.stdout.write('{ NULL }\n};\n')
-- 
2.41.0
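
The register counting done by the script above can be sketched in C as follows, assuming the top-level reg elements have already been reduced to an array of their regnum attributes. REGNUM_UNSET and count_gdb_regs() are hypothetical names used only for this illustration:

```c
#include <stddef.h>

/* Hypothetical marker: the <reg> element has no regnum attribute. */
#define REGNUM_UNSET (-1)

/*
 * attrs[i] holds the regnum attribute of the i-th top-level <reg>, or
 * REGNUM_UNSET when the attribute is absent. A register without the
 * attribute gets the previous register's number plus one.
 */
static int count_gdb_regs(const int *attrs, size_t n)
{
    int regnum = 0, lo = 0, hi = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (attrs[i] != REGNUM_UNSET) {
            regnum = attrs[i];
        }
        if (i == 0 || regnum < lo) {
            lo = regnum;
        }
        if (i == 0 || regnum > hi) {
            hi = regnum;
        }
        regnum++;
    }
    /* Span between lowest and highest number, so gaps are counted too. */
    return n ? hi - lo + 1 : 0;
}
```

For example, four registers where only the third carries regnum="10" are numbered 0, 1, 10 and 11, which gives a count of 12 rather than 4.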




[PATCH v5 02/26] gdbstub: Introduce GDBFeature structure

2023-08-16 Thread Akihiko Odaki
Before this change, the information from an XML file was stored in an
array that is not descriptive. Introduce a dedicated structure type to
make it easier to understand and to extend with more fields.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 MAINTAINERS |  2 +-
 meson.build |  2 +-
 include/exec/gdbstub.h  |  9 --
 gdbstub/gdbstub.c   |  4 +--
 stubs/gdbstub.c |  6 ++--
 scripts/feature_to_c.py | 48 
 scripts/feature_to_c.sh | 69 -
 7 files changed, 62 insertions(+), 78 deletions(-)
 create mode 100755 scripts/feature_to_c.py
 delete mode 100644 scripts/feature_to_c.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 12e59b6b27..514ac74101 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2826,7 +2826,7 @@ F: include/exec/gdbstub.h
 F: include/gdbstub/*
 F: gdb-xml/
 F: tests/tcg/multiarch/gdbstub/
-F: scripts/feature_to_c.sh
+F: scripts/feature_to_c.py
 F: scripts/probe-gdb-support.py
 
 Memory API
diff --git a/meson.build b/meson.build
index 98e68ef0b1..5c633f7e01 100644
--- a/meson.build
+++ b/meson.build
@@ -3683,7 +3683,7 @@ common_all = static_library('common',
 dependencies: common_all.dependencies(),
 name_suffix: 'fa')
 
-feature_to_c = find_program('scripts/feature_to_c.sh')
+feature_to_c = find_program('scripts/feature_to_c.py')
 
 if targetos == 'darwin'
   entitlement = find_program('scripts/entitlement.sh')
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 7d743fe1e9..3f08093321 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -10,6 +10,11 @@
 #define GDB_WATCHPOINT_READ  3
 #define GDB_WATCHPOINT_ACCESS4
 
+typedef struct GDBFeature {
+const char *xmlname;
+const char *xml;
+} GDBFeature;
+
 
 /* Get or set a register.  Returns the size of the register.  */
 typedef int (*gdb_get_reg_cb)(CPUArchState *env, GByteArray *buf, int reg);
@@ -38,7 +43,7 @@ void gdb_set_stop_cpu(CPUState *cpu);
  */
 extern bool gdb_has_xml;
 
-/* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */
-extern const char *const xml_builtin[][2];
+/* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
+extern const GDBFeature gdb_static_features[];
 
 #endif
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 6911b73c07..2772f07bbe 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -407,11 +407,11 @@ static const char *get_feature_xml(const char *p, const 
char **newp,
 }
 }
 for (i = 0; ; i++) {
-name = xml_builtin[i][0];
+name = gdb_static_features[i].xmlname;
 if (!name || (strncmp(name, p, len) == 0 && strlen(name) == len))
 break;
 }
-return name ? xml_builtin[i][1] : NULL;
+return name ? gdb_static_features[i].xml : NULL;
 }
 
 static int gdb_read_register(CPUState *cpu, GByteArray *buf, int reg)
diff --git a/stubs/gdbstub.c b/stubs/gdbstub.c
index 2b7aee50d3..580e20702b 100644
--- a/stubs/gdbstub.c
+++ b/stubs/gdbstub.c
@@ -1,6 +1,6 @@
 #include "qemu/osdep.h"
-#include "exec/gdbstub.h"   /* xml_builtin */
+#include "exec/gdbstub.h"   /* gdb_static_features */
 
-const char *const xml_builtin[][2] = {
-  { NULL, NULL }
+const GDBFeature gdb_static_features[] = {
+  { NULL }
 };
diff --git a/scripts/feature_to_c.py b/scripts/feature_to_c.py
new file mode 100755
index 00..bcbcb83beb
--- /dev/null
+++ b/scripts/feature_to_c.py
@@ -0,0 +1,48 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os, sys
+
+def writeliteral(indent, bytes):
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+for c in bytes:
+if not quoted:
+sys.stdout.write('\n')
+sys.stdout.write(' ' * indent)
+sys.stdout.write('"')
+quoted = True
+
+if c == b'"'[0]:
+sys.stdout.write('\\"')
+elif c == b'\\'[0]:
+sys.stdout.write('\\\\')
+elif c == b'\n'[0]:
+sys.stdout.write('\\n"')
+quoted = False
+elif c >= 32 and c < 127:
+sys.stdout.write(c.to_bytes(1, 'big').decode())
+else:
+sys.stdout.write(f'\{c:03o}')
+
+if quoted:
+sys.stdout.write('"')
+
+sys.stdout.write('#include "qemu/osdep.h"\n' \
+ '#include "exec/gdbstub.h"\n' \
+ '\n'
+ 'const GDBFeature gdb_static_features[] = {\n')
+
+for input in sys.argv[1:]:
+with open(input, 'rb') as file:
+read = file.read()
+
+sys.stdout.write('{\n')
+writeliteral(8, bytes(os.path.basename(input), 'utf-8'))
+sys.stdout.write(',\n')
+writeliteral(8, read)
+sys.stdout.write('\n},\n')
+
+sys.stdout.write('{ NULL }\n};\n')
diff --git a/scripts/feature_to_c.sh b/scripts/feature_to_c.sh
d

[PATCH v5 00/26] plugins: Allow to read registers

2023-08-16 Thread Akihiko Odaki
Other people at the University of Tokyo, where I research processor
design, and I have found TCG plugins very useful for processor design
exploration.

The feature we find missing is the capability to read registers from
plugins. In this series, I propose to add such a capability by reusing
gdbstub code.

The reuse of gdbstub code ensures the long-term stability of the TCG plugin
interface for register access without incurring a burden to maintain yet
another interface for register access.

Adding this capability to TCG plugins involves four major changes. The
first one is to add a GDBFeature structure that represents a GDB feature,
which usually includes registers. GDBFeature can be generated from static
XML files or dynamically by architecture-specific code. In fact, this is a
refactoring independent of the feature this series adds, and it is
potentially beneficial even without the plugin feature. The plugin feature
will utilize this new structure to describe registers exposed to plugins.

The second one is to make gdb_read_register/gdb_write_register usable
outside of gdbstub context.

The third one is to actually make registers readable for plugins.

The last one is to allow QEMU plugins to be implemented in C++. A plugin
that I'll describe later is written in C++.

Below is a summary of the patches:
Patch 01 fixes a bug in execlog plugin.
Patch [02, 16] introduce GDBFeature.
Patch 17 adds information useful for plugins to GDBFeature.
Patch [18, 21] make registers readable outside of gdbstub context.
Patch [22, 24] add the feature to read registers from plugins.
Patch [25, 26] make it possible to write plugins in C++.

The execlog plugin will have new options to demonstrate the new feature.
I also have a plugin that uses this new feature to generate execution
traces for Sniper processor simulator, which is available at:
https://github.com/shioya-lab/sniper/tree/akihikodaki/bb

V4 -> V5:
  Corrected g_rw_lock_writer_lock() call. (Richard Henderson)
  Replaced abort() with g_assert_not_reached(). (Richard Henderson)
  Fixed CSR name leak in target/riscv. (Richard Henderson)
  Removed gdb_has_xml variable.

V3 -> V4:
  Added execlog changes I forgot to include in the last version.

V2 -> V3:
  Added patch "hw/core/cpu: Return static value with gdb_arch_name()".
  Added patch "gdbstub: Dynamically allocate target.xml buffer".
  (Alex Bennée)
  Added patch "gdbstub: Introduce GDBFeatureBuilder". (Alex Bennée)
  Dropped Reviewed-by tags for "target/*: Use GDBFeature for dynamic XML".
  Changed gdb_find_static_feature() to abort on failure. (Alex Bennée)
  Changed the execlog plugin to log the register value only when changed.
  (Alex Bennée)
  Dropped 0x prefixes for register value logs for conciseness.

V1 -> V2:
  Added SPDX-License-Identifier: GPL-2.0-or-later. (Philippe Mathieu-Daudé)
  Split long lines. (Philippe Mathieu-Daudé)
  Renamed gdb_features to gdb_static_features (Philippe Mathieu-Daudé)
  Dropped RFC.

Akihiko Odaki (26):
  contrib/plugins: Use GRWLock in execlog
  gdbstub: Introduce GDBFeature structure
  gdbstub: Add num_regs member to GDBFeature
  gdbstub: Introduce gdb_find_static_feature()
  target/arm: Move the reference to arm-core.xml
  hw/core/cpu: Replace gdb_core_xml_file with gdb_core_feature
  gdbstub: Introduce GDBFeatureBuilder
  target/arm: Use GDBFeature for dynamic XML
  target/ppc: Use GDBFeature for dynamic XML
  target/riscv: Use GDBFeature for dynamic XML
  gdbstub: Use GDBFeature for gdb_register_coprocessor
  gdbstub: Use GDBFeature for GDBRegisterState
  hw/core/cpu: Return static value with gdb_arch_name()
  gdbstub: Dynamically allocate target.xml buffer
  gdbstub: Simplify XML lookup
  hw/core/cpu: Remove gdb_get_dynamic_xml member
  gdbstub: Add members to identify registers to GDBFeature
  target/arm: Remove references to gdb_has_xml
  target/ppc: Remove references to gdb_has_xml
  gdbstub: Remove gdb_has_xml variable
  gdbstub: Expose functions to read registers
  cpu: Call plugin hooks only when ready
  plugins: Allow to read registers
  contrib/plugins: Allow to log registers
  plugins: Support C++
  contrib/plugins: Add cc plugin

 MAINTAINERS                |   2 +-
 docs/devel/tcg-plugins.rst |  18 +++-
 configure                  |  15 ++-
 meson.build                |   2 +-
 gdbstub/internals.h        |   2 +-
 include/exec/gdbstub.h     |  51 +++--
 include/hw/core/cpu.h      |  11 +-
 include/qemu/qemu-plugin.h |  69 +++-
 target/arm/cpu.h           |  26 ++---
 target/arm/internals.h     |   2 +-
 target/ppc/cpu-qom.h       |   3 +-
 target/ppc/cpu.h           |   3 +-
 target/ppc/internal.h      |   2 +-
 target/riscv/cpu.h         |   4 +-
 target/s390x/cpu.h         |   2 -
 contrib/plugins/execlog.c  | 150 --
 cpu.c                      |  11 --
 gdbstub/gdbstub.c          | 198 +++---
 gdbstub/softmmu.c          |   3 +-
 gdbstub/user.c   

Re: [RFC PATCH v4 22/24] vfio/pci: Adapt vfio pci hot reset support with iommufd BE

2023-08-16 Thread Nicolin Chen
On Wed, Jul 12, 2023 at 03:25:26PM +0800, Zhenzhong Duan wrote:
 
> +#ifdef CONFIG_IOMMUFD
> +static VFIODevice *vfio_pci_iommufd_binded(__u32 devid)
> +{
> +VFIOAddressSpace *space;
> +VFIOContainer *bcontainer;
> +VFIOIOMMUFDContainer *container;
> +VFIOIOASHwpt *hwpt;
> +VFIODevice *vbasedev_iter;
> +VFIOIOMMUBackendOpsClass *ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
> +object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS));
> +
> + QLIST_FOREACH(space, &vfio_address_spaces, list) {

Indentation here doesn't seem to be aligned with the lines above. 

> +QLIST_FOREACH(bcontainer, &space->containers, next) {
> +if (bcontainer->ops != ops) {
> +continue;
> +}
> +container = container_of(bcontainer, VFIOIOMMUFDContainer,
> + bcontainer);
> +QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +QLIST_FOREACH(vbasedev_iter, &hwpt->device_list, next) {
> +if (devid == vbasedev_iter->devid) {
> +return vbasedev_iter;
> +}
> +}
> +}
> +}
> +}
> +return NULL;
> +}

At a quick look, "binded" sounds a bit odd to me. Could this function
be named vfio_pci_find_by_iommufd_devid() instead?

Thanks
Nicolin



Re: [PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-16 Thread Akihiko Odaki

On 2023/08/17 11:23, Gurchetan Singh wrote:

From: Gurchetan Singh 

This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v2: - Incorporated suggestions by Akihiko Odaki
 - Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

v5: - Removed pci suffix from examples
 - Verified that -device virtio-gpu-rutabaga works.  Strangely
   enough, I don't remember changing anything, and I remember
   it not working.  I did rebase to top of tree though.
 - Fixed meson examples in crosvm docs

  docs/system/device-emulation.rst   |   1 +
  docs/system/devices/virtio-gpu.rst | 113 +
  2 files changed, 114 insertions(+)
  create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
 devices/nvme.rst
 devices/usb.rst
 devices/vhost-user.rst
+   devices/virtio-gpu.rst
 devices/virtio-pmem.rst
 devices/vhost-user-rng.rst
 devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..8c5c708272
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,113 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==========
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+--------------------
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+------------------------
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-------------
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+------------------------
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+-------------------
+
+virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. For Android 

Re: qemu-system-x86 dependencies

2023-08-16 Thread Fourhundred Thecat

> On 2023-08-16 15:02, Fourhundred Thecat wrote:

 > On 2023-08-16 14:52, Michael Tokarev wrote:

16.08.2023 15:37, Philippe Mathieu-Daudé wrote:

Cc'ing Michael


why does qemu depend on sound and gstreamer and wayland libraries?
After all, i am just trying to run VMs on my hypervisor.

If I remember correctly, my previous installation on Debian 10,
qemu-system-x86 had no such dependencies.

Seems to me like trying to install openssh-server, but it needs full
gnome environment libraries.


sorry if my question offended people.

Perhaps there is a good reason for these dependencies, which i don't see?

Also, I am told that Arch has split all these into separate packages:

https://archlinux.org/packages/?sort=&repo=Extra&q=qemu&maintainer=&flagged=

So it looks like my original question might be Debian specific?



Re: [PATCH 01/10] hw/arm/virt-acpi-build.c: Move fw_cfg and virtio to common location

2023-08-16 Thread Sunil V L
On Wed, Aug 16, 2023 at 03:51:58PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 7/26/23 05:25, Igor Mammedov wrote:
> > On Tue, 25 Jul 2023 22:20:36 +0530
> > Sunil V L  wrote:
> > 
> > > On Mon, Jul 24, 2023 at 05:18:59PM +0200, Igor Mammedov wrote:
> > > > On Wed, 12 Jul 2023 22:09:34 +0530
> > > > Sunil V L  wrote:
> > > > > The functions which add fw_cfg and virtio to DSDT are same for ARM
> > > > > and RISC-V. So, instead of duplicating in RISC-V, move them from
> > > > > hw/arm/virt-acpi-build.c to common aml-build.c.
> > > > > 
> > > > > Signed-off-by: Sunil V L 
> > > > > ---
> > > > >   hw/acpi/aml-build.c | 41 
> > > > > 
> > > > >   hw/arm/virt-acpi-build.c| 42 
> > > > > -
> > > > >   hw/riscv/virt-acpi-build.c  | 16 --
> > > > >   include/hw/acpi/aml-build.h |  6 ++
> > > > >   4 files changed, 47 insertions(+), 58 deletions(-)
> > > > > 
> > > > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > > > 
> > > > patch looks fine modulo,
> > > > I'd put these into respective device files instead of generic
> > > > aml-build.c which was intended for basic AML primitives
> > > > (it's got polluted over time with device specific functions
> > > > but that's not the reason to continue doing that).
> > > > 
> > > > Also having those functions along with devices models
> > > > goes along with self enumerating ACPI devices (currently
> > > > it works for x86 PCI/ISA device but there is no reason
> > > > that it can't work with other types as well when
> > > > I get there)
> > > Thanks!, Igor. Let me add them to device specific files as per your
> > > recommendation.
> > just be careful and build-test other targets (while disabling the rest),
> > at least so as not to regress them due to build deps. (I'd pick 2 with ACPI
> > support, one that uses the affected code and one that does not, and 1 that
> > uses the device model but doesn't use ACPI at all, if such exists)
> 
> Sunil is already aware of it but I'll also mention it here since it seems
> relevant to Igor's point.
> 
Thanks, Daniel! Yes, I am aware of the issue and will fix it along with
Igor's suggestion. I need to fix this irrespective of the approach.

Thanks,
Sunil
> 
> This patch breaks i386-softmmu build:
> 
> 
> FAILED: libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o
> cc -m64 -mcx16 -Ilibqemu-i386-softmmu.fa.p -I. -I.. -Itarget/i386 
> -I../target/i386 -Iqapi -Itrace -Iui -Iui/shader -I/usr/include/pixman-1 
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
> -I/usr/include/sysprof-4 -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror 
> -std=gnu11 -O2 -g -fstack-protector-strong -U_FORTIFY_SOURCE 
> -D_FORTIFY_SOURCE=2 -Wundef -Wwrite-strings -Wmissing-prototypes 
> -Wstrict-prototypes -Wredundant-decls -Wold-style-declaration 
> -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k 
> -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
> -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wmissing-format-attribute 
> -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -isystem 
> /home/danielhb/work/qemu/linux-headers -isystem linux-headers -iquote . 
> -iquote /home/danielhb/work/qemu -iquote /home/danielhb/work/qemu/include 
> -iquote /home/danielhb/work/qemu/host/include/x86_64 -iquote 
> /home/danielhb/work/qemu/host/include/generic -iquote 
> /home/danielhb/work/qemu/tcg/i386 -pthread -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common 
> -fwrapv -fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H 
> '-DCONFIG_TARGET="i386-softmmu-config-target.h"' 
> '-DCONFIG_DEVICES="i386-softmmu-config-devices.h"' -MD -MQ 
> libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o -MF 
> libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o.d -o 
> libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o -c 
> ../hw/i386/acpi-microvm.c
> ../hw/i386/acpi-microvm.c:48:13: error: conflicting types for 
> ‘acpi_dsdt_add_virtio’; have ‘void(Aml *, MicrovmMachineState *)’
>48 | static void acpi_dsdt_add_virtio(Aml *scope,
>   | ^~~~
> In file included from 
> /home/danielhb/work/qemu/include/hw/acpi/acpi_aml_interface.h:5,
>  from ../hw/i386/acpi-microvm.c:29:
> /home/danielhb/work/qemu/include/hw/acpi/aml-build.h:503:6: note: previous 
> declaration of ‘acpi_dsdt_add_virtio’ with type ‘void(Aml *, const 
> MemMapEntry *, uint32_t,  int)’ {aka ‘void(Aml *, const MemMapEntry *, 
> unsigned int,  int)’}
>   503 | void acpi_dsdt_add_virtio(Aml *scope, const MemMapEntry 
> *virtio_mmio_memmap,
>   |  ^~~~
> [5/714] Compiling C object libqemu-i386-softmmu.fa.p/hw_i386_kvm_clock.c.o
> 
> This happens because the common 'acpi_dsdt_add_virtio' function matches a local
> function with the same name in hw/i386/acpi-microvm.c. We would need to either
> rename the shared helper or rename the local acpi-

Re: [PATCH v2 02/19] ppc/vof: Fix missed fields in VOF cleanup

2023-08-16 Thread Alexey Kardashevskiy




On 08/08/2023 14:19, Nicholas Piggin wrote:

Failing to reset of_instance_last makes ihandle allocation continue
to increase, which causes a record-replay replay to fail to match the
recorded trace.

Not resetting claimed_base makes VOF eventually run out of memory after
some resets.
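
To make the two failure modes concrete, here is a heavily simplified
sketch, under the assumption that ihandles are handed out from a
monotonically increasing counter and that claimed memory grows upward
from claimed_base. The SketchVof type and the sketch_* helpers are
hypothetical and are not the VOF code; only the two field names are
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced model of the two fields touched by the patch. */
typedef struct SketchVof {
    uint32_t of_instance_last; /* last ihandle handed out */
    uint64_t claimed_base;     /* start of the next region to claim */
} SketchVof;

/* Hand out the next ihandle (assumed: a simple increasing counter). */
uint32_t sketch_new_ihandle(SketchVof *vof)
{
    return ++vof->of_instance_last;
}

/* Claim size bytes (assumed: a simple bump allocator). */
uint64_t sketch_claim(SketchVof *vof, uint64_t size)
{
    uint64_t addr = vof->claimed_base;

    vof->claimed_base += size;
    return addr;
}

/* Reset both fields, as the patch does in vof_cleanup(). */
void sketch_cleanup(SketchVof *vof)
{
    vof->of_instance_last = 0;
    vof->claimed_base = 0;
}
```

In this model, a reset that skips sketch_cleanup() would return a
different ihandle for the first open after reset than the recorded run
did, and every reset would push claimed_base further up until memory is
exhausted.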

Cc: Alexey Kardashevskiy 
Fixes: fc8c745d501 ("spapr: Implement Open Firmware client interface")
Signed-off-by: Nicholas Piggin 



Reviewed-by: Alexey Kardashevskiy 

Cool to see it still in use :)



---
  hw/ppc/vof.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index 18c3f92317..e3b430a81f 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -1024,6 +1024,8 @@ void vof_cleanup(Vof *vof)
  }
  vof->claimed = NULL;
  vof->of_instances = NULL;
+vof->of_instance_last = 0;
+vof->claimed_base = 0;
  }
  
  void vof_build_dt(void *fdt, Vof *vof)


--
Alexey



[PATCH v7 5/9] gfxstream + rutabaga prep: added need defintions, fields, and options

2023-08-16 Thread Gurchetan Singh
From: Gurchetan Singh 

This modifies the common virtio-gpu.h file to have the fields and
definitions needed by gfxstream/rutabaga (used by VirtIOGPURutabaga).

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: void *rutabaga --> struct rutabaga *rutabaga (Akihiko)
have a separate rutabaga device instead of using GL device (Bernard)

v2: VirtioGpuRutabaga --> VirtIOGPURutabaga (Akihiko)
move MemoryRegionInfo into VirtIOGPURutabaga (Akihiko)
remove 'ctx' field (Akihiko)
remove 'rutabaga_active'

v6: remove command from commit message, refer to docs instead (Manos)

 include/hw/virtio/virtio-gpu.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 55973e112f..e2a07e68d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
 #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
 OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
 
+#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA)
+
 struct virtio_gpu_simple_resource {
 uint32_t resource_id;
 uint32_t width;
@@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
 
@@ -232,6 +238,28 @@ struct VhostUserGPU {
 bool backend_blocked;
 };
 
+#define MAX_SLOTS 4096
+
+struct MemoryRegionInfo {
+int used;
+MemoryRegion mr;
+uint32_t resource_id;
+};
+
+struct rutabaga;
+
+struct VirtIOGPURutabaga {
+struct VirtIOGPU parent_obj;
+
+struct MemoryRegionInfo memory_regions[MAX_SLOTS];
+char *capset_names;
+char *wayland_socket_path;
+char *wsi;
+bool headless;
+uint32_t num_capsets;
+struct rutabaga *rutabaga;
+};
+
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
 size_t s;   \
 s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v7 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-16 Thread Gurchetan Singh
From: Gurchetan Singh 

This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator for more than a decade, which is the main developer
platform.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or macOS environments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 and 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS and especially TTM so
a simple guest to host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and an upstream Linux
kernel.  gfxstream will then be packaged properly, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
  - Very simple initial display integration: just used Pixman
  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
calls

Next steps for Android VMs:
  - The next step would be improving display integration and UI interfaces
with the goal of the QEMU upstream graphics being in an emulator
release [d].

Next steps for Linux VMs for display virtualization:
  - For widespread distribution, someone needs to package Sommelier or
wayland-proxy-virtwl [e], ideally into Debian main. In addition, newer
versions of the Linux kernel come with the DRM_VIRTIO_GPU_KMS option,
which allows disabling KMS hypercalls.  If anyone cares enough, it'll
probably be possible to build a custom VM variant that uses this display
virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
- Used error_report(..)
- Used g_autofree to fix leaks on error paths
- Removed unnecessary casts
- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorporated various suggestions by Akihiko Odaki, Marc-André Lureau and
Bernard Berschow:
- Parenthesis in CHECK macro
- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
- delay until g->parent_obj.enable = 1
- Additional cast fixes
- initialize directly in virtio_gpu_rutabaga_realize(..)
- add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
- Autodetect Wayland socket when not explicitly specified
- Fix map_blob error paths
- Add comment why we need both `res` and `resource` in create blob
- Cast and whitespace fixes
- Big endian check comes before virtio_gpu_rutabaga_init().
- VirtIOVGARUTABAGA --> VirtIOVGARutabaga

v4: Incorporated feedback from Akihiko Oda

[PATCH v7 1/9] virtio: Add shared memory capability

2023-08-16 Thread Gurchetan Singh
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.
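
The core of the capability is that 64-bit offsets and lengths are
carried as two 32-bit halves each. A minimal standalone sketch of that
split is shown below; the SketchCap64 layout and the sketch_* helpers
are assumptions for illustration only, and the little-endian conversion
that the real code performs with cpu_to_le32() is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of a 64-bit shared memory capability. */
typedef struct SketchCap64 {
    uint8_t bar;        /* BAR that holds the region */
    uint8_t id;         /* device-specific region id */
    uint32_t offset_lo; /* low 32 bits of the offset within the BAR */
    uint32_t offset_hi; /* high 32 bits of the offset */
    uint32_t length_lo; /* low 32 bits of the region length */
    uint32_t length_hi; /* high 32 bits of the region length */
} SketchCap64;

/* Split the 64-bit offset and length into 32-bit halves. */
SketchCap64 sketch_make_shm_cap(uint8_t bar, uint64_t offset,
                                uint64_t length, uint8_t id)
{
    SketchCap64 cap;

    cap.bar = bar;
    cap.id = id;
    cap.offset_lo = (uint32_t)offset;         /* truncates to the low half */
    cap.offset_hi = (uint32_t)(offset >> 32); /* keeps the high half */
    cap.length_lo = (uint32_t)length;
    cap.length_hi = (uint32_t)(length >> 32);
    return cap;
}

/* Reassemble the 64-bit length, as a driver reading the fields would. */
uint64_t sketch_cap_length(const SketchCap64 *cap)
{
    return ((uint64_t)cap->length_hi << 32) | cap->length_lo;
}

/* Reassemble the 64-bit offset in the same way. */
uint64_t sketch_cap_offset(const SketchCap64 *cap)
{
    return ((uint64_t)cap->offset_hi << 32) | cap->offset_lo;
}
```

For instance, a region at offset 4 GiB with a length of 8 GiB + 4 KiB
ends up with offset_hi = 1, offset_lo = 0, length_hi = 2 and
length_lo = 0x1000.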

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Reviewed-by: Gurchetan Singh 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Acked-by: Huang Rui 
Tested-by: Huang Rui 
Reviewed-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue *vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-16 Thread Gurchetan Singh
From: Gurchetan Singh 

This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v2: - Incorporated suggestions by Akihiko Odaki
- Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

v5: - Removed pci suffix from examples
- Verified that -device virtio-gpu-rutabaga works.  Strangely
  enough, I don't remember changing anything, and I remember
  it not working.  I did rebase to top of tree though.
- Fixed meson examples in crosvm docs

 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 113 +
 2 files changed, 114 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..8c5c708272
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,113 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==========
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+--------------------
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+------------------------
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-------------
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+------------------------
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+-------------------
+
+virtio-gpu can also leverage `rutabaga_gfx`_ to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. For Android guests, ``gfxstream-gles`` is also supported.
+
+The device w

[PATCH v7 7/9] gfxstream + rutabaga: meson support

2023-08-16 Thread Gurchetan Singh
From: Gurchetan Singh 

- Add meson detection of rutabaga_gfx
- Build virtio-gpu-rutabaga.c + associated vga/pci files when
  present

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Fix alignment issues (Akihiko)

 hw/display/meson.build| 22 ++
 meson.build   |  7 +++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/hw/display/meson.build b/hw/display/meson.build
index 413ba4ab24..e362d625dd 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -79,6 +79,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
  if_true: [files('virtio-gpu-gl.c', 
'virtio-gpu-virgl.c'), pixman, virgl])
 hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss}
   endif
+
+  if rutabaga.found()
+virtio_gpu_rutabaga_ss = ss.source_set()
+virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga],
+   if_true: [files('virtio-gpu-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
@@ -95,6 +102,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
  if_true: [files('virtio-gpu-pci-gl.c'), pixman])
 hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss}
   endif
+  if rutabaga.found()
+virtio_gpu_pci_rutabaga_ss = ss.source_set()
+virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 'CONFIG_VIRTIO_PCI', rutabaga],
+   if_true: [files('virtio-gpu-pci-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-pci-rutabaga': virtio_gpu_pci_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
@@ -113,6 +126,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
   virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
 if_false: files('acpi-vga-stub.c'))
   hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss}
+
+  if rutabaga.found()
+virtio_vga_rutabaga_ss = ss.source_set()
+virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga],
+   if_true: [files('virtio-vga-rutabaga.c'), 
pixman])
+virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: 
files('acpi-vga.c'),
+if_false: 
files('acpi-vga-stub.c'))
+hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss}
+  endif
 endif
 
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c'))
diff --git a/meson.build b/meson.build
index 98e68ef0b1..293f388e53 100644
--- a/meson.build
+++ b/meson.build
@@ -1069,6 +1069,12 @@ if not get_option('virglrenderer').auto() or have_system 
or have_vhost_user_gpu
dependencies: virgl))
   endif
 endif
+rutabaga = not_found
+if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu
+  rutabaga = dependency('rutabaga_gfx_ffi',
+ method: 'pkg-config',
+ required: get_option('rutabaga_gfx'))
+endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
   blkio = dependency('blkio',
@@ -4272,6 +4278,7 @@ summary_info += {'libtasn1':  tasn1}
 summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
+summary_info += {'rutabaga support':  rutabaga}
 summary_info += {'blkio support': blkio}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
diff --git a/meson_options.txt b/meson_options.txt
index aaea5ddd77..dea3bf7d9c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -224,6 +224,8 @@ option('vmnet', type : 'feature', value : 'auto',
description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
+option('rutabaga_gfx', type : 'feature', value : 'auto',
+   description: 'rutabaga_gfx support')
 option('png', type : 'feature', value : 'auto',
description: 'PNG support with libpng')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9da3fe299b..9a95b4f782 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -154,6 +154,7 @@ meson_options_help() {
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
+  printf "%s\n" '  rutabaga-gfxrutabaga_gfx support'
   printf "%s\n" '  sdl SDL user interface'
   printf

[PATCH v7 3/9] virtio-gpu: hostmem

2023-08-16 Thread Gurchetan Singh
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bbd5c6561a..48ef0d9fad 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1509,6 +1509,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v7 8/9] gfxstream + rutabaga: enable rutabaga

2023-08-16 Thread Gurchetan Singh
From: Gurchetan Singh 

This change enables rutabaga to receive virtio-gpu-3d hypercalls
when it is active.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Whitespace fix (Akihiko)

 hw/display/virtio-gpu-base.c | 3 ++-
 hw/display/virtio-gpu.c  | 5 +++--
 softmmu/qdev-monitor.c   | 3 +++
 softmmu/vl.c | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..50c5373b65 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 {
 VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
 
-if (virtio_gpu_virgl_enabled(g->conf)) {
+if (virtio_gpu_virgl_enabled(g->conf) ||
+virtio_gpu_rutabaga_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_VIRGL);
 }
 if (virtio_gpu_edid_enabled(g->conf)) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..08e170e029 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1361,8 +1361,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-if (!virtio_gpu_have_udmabuf()) {
-error_setg(errp, "cannot enable blob resources without udmabuf");
+if (!virtio_gpu_have_udmabuf() &&
+!virtio_gpu_rutabaga_enabled(g->parent_obj.conf)) {
+error_setg(errp, "need udmabuf or rutabaga for blob resources");
 return;
 }
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..1b8005ae55 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = {
 { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI },
+{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga",
+  QEMU_ARCH_VIRTIO_MMIO },
+{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW },
 { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..2f98eefdf3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -216,6 +216,7 @@ static struct {
 { .driver = "ati-vga",  .flag = &default_vga   },
 { .driver = "vhost-user-vga",   .flag = &default_vga   },
 { .driver = "virtio-vga-gl",.flag = &default_vga   },
+{ .driver = "virtio-vga-rutabaga",  .flag = &default_vga   },
 };
 
 static QemuOptsList qemu_rtc_opts = {
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v7 2/9] virtio-gpu: CONTEXT_INIT feature

2023-08-16 Thread Gurchetan Singh
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.42.0.rc1.204.g551eb34607-goog
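
For readers following the series, the flag-to-feature mapping that this patch extends can be sketched in isolation. This is a minimal standalone illustration, not the QEMU code: every demo_* name is a hypothetical stand-in, and the feature bit positions (3 for RESOURCE_BLOB, 4 for CONTEXT_INIT) are assumed from the virtio-gpu specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical config flag positions; QEMU's real enum has more entries. */
enum demo_conf_flag {
    DEMO_FLAG_BLOB_ENABLED,
    DEMO_FLAG_CONTEXT_INIT_ENABLED,
};

/* Feature bit numbers, assumed from the virtio-gpu specification. */
enum demo_feature {
    DEMO_F_RESOURCE_BLOB = 3,
    DEMO_F_CONTEXT_INIT = 4,
};

struct demo_conf {
    uint32_t flags;
};

/* Same shape as the virtio_gpu_*_enabled() macros: test one flag bit. */
static int demo_flag_enabled(const struct demo_conf *conf,
                             enum demo_conf_flag flag)
{
    return (conf->flags & (1u << flag)) != 0;
}

/* Advertise a feature bit only when the backend set the matching flag. */
static uint64_t demo_get_features(const struct demo_conf *conf,
                                  uint64_t features)
{
    if (demo_flag_enabled(conf, DEMO_FLAG_BLOB_ENABLED)) {
        features |= UINT64_C(1) << DEMO_F_RESOURCE_BLOB;
    }
    if (demo_flag_enabled(conf, DEMO_FLAG_CONTEXT_INIT_ENABLED)) {
        features |= UINT64_C(1) << DEMO_F_CONTEXT_INIT;
    }
    return features;
}
```

A backend that wants the feature would set the CONTEXT_INIT flag in its configuration before features are negotiated; with no flag set, the feature word passes through unchanged.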




[PATCH v7 0/9] gfxstream + rutabaga_gfx

2023-08-16 Thread Gurchetan Singh
Prior versions:

v6:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02520.html

v5:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02339.html

v4:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg01566.html

v3:
https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg00565.html

v2:
https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg05801.html

v1:
https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02341.html

RFC:
https://patchew.org/QEMU/20230421011223.718-1-gurchetansi...@chromium.org/

Changes since v6:
- Incorporated review feedback

How to build both rutabaga and gfxstream guest/host libs:

https://crosvm.dev/book/appendix/rutabaga_gfx.html

Branch containing this patch series:

https://gitlab.freedesktop.org/gurchetansingh/qemu-gfxstream/-/commits/qemu-gfxstream-v7

Antonio Caggiano (2):
  virtio-gpu: CONTEXT_INIT feature
  virtio-gpu: blob prep

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem

Gurchetan Singh (5):
  gfxstream + rutabaga prep: added need defintions, fields, and options
  gfxstream + rutabaga: add initial support for gfxstream
  gfxstream + rutabaga: meson support
  gfxstream + rutabaga: enable rutabaga
  docs/system: add basic virtio-gpu documentation

 docs/system/device-emulation.rst |1 +
 docs/system/devices/virtio-gpu.rst   |  113 +++
 hw/display/meson.build   |   22 +
 hw/display/virtio-gpu-base.c |6 +-
 hw/display/virtio-gpu-pci-rutabaga.c |   48 ++
 hw/display/virtio-gpu-pci.c  |   14 +
 hw/display/virtio-gpu-rutabaga.c | 1115 ++
 hw/display/virtio-gpu.c  |   16 +-
 hw/display/virtio-vga-rutabaga.c |   51 ++
 hw/display/virtio-vga.c  |   33 +-
 hw/virtio/virtio-pci.c   |   18 +
 include/hw/virtio/virtio-gpu-bswap.h |   18 +
 include/hw/virtio/virtio-gpu.h   |   41 +
 include/hw/virtio/virtio-pci.h   |4 +
 meson.build  |7 +
 meson_options.txt|2 +
 scripts/meson-buildoptions.sh|3 +
 softmmu/qdev-monitor.c   |3 +
 softmmu/vl.c |1 +
 19 files changed, 1497 insertions(+), 19 deletions(-)
 create mode 100644 docs/system/devices/virtio-gpu.rst
 create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
 create mode 100644 hw/display/virtio-gpu-rutabaga.c
 create mode 100644 hw/display/virtio-vga-rutabaga.c

-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v7 4/9] virtio-gpu: blob prep

2023-08-16 Thread Gurchetan Singh
From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - track iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 18 ++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 48ef0d9fad..3e658f1fef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct 
virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
index 9124108485..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -63,10 +63,28 @@ virtio_gpu_create_blob_bswap(struct 
virtio_gpu_resource_create_blob *cblob)
 {
 virtio_gpu_ctrl_hdr_bswap(&cblob->hdr);
 le32_to_cpus(&cblob->resource_id);
+le32_to_cpus(&cblob->blob_mem);
 le32_to_cpus(&cblob->blob_flags);
+le32_to_cpus(&cblob->nr_entries);
+le64_to_cpus(&cblob->blob_id);
 le64_to_cpus(&cblob->size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(&mblob->hdr);
+le32_to_cpus(&mblob->resource_id);
+le64_to_cpus(&mblob->offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(&ublob->hdr);
+le32_to_cpus(&ublob->resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.42.0.rc1.204.g551eb34607-goog
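
For context, the bswap helpers in this patch convert little-endian wire fields to host byte order in place. A portable sketch of that idea follows. The demo_* names and the reduced struct (no control header) are assumptions for illustration only; QEMU's own le32_to_cpus()/le64_to_cpus() are not reimplemented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Interpret the in-memory bytes of 'wire' as a little-endian 32-bit value. */
static uint32_t demo_le32_to_cpu(uint32_t wire)
{
    uint8_t b[4];

    memcpy(b, &wire, sizeof(b));
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Same for 64 bits, assembling from the most significant byte down. */
static uint64_t demo_le64_to_cpu(uint64_t wire)
{
    uint8_t b[8];
    uint64_t v = 0;

    memcpy(b, &wire, sizeof(b));
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Hypothetical, trimmed-down stand-in for a map-blob request. */
struct demo_map_blob {
    uint32_t resource_id;
    uint32_t padding;
    uint64_t offset;
};

/* Convert every multi-byte field that the device is going to read. */
static void demo_map_blob_bswap(struct demo_map_blob *m)
{
    m->resource_id = demo_le32_to_cpu(m->resource_id);
    m->offset = demo_le64_to_cpu(m->offset);
}
```

Because the conversion works on bytes rather than on the host's integer layout, the same code gives the right answer on little-endian and big-endian hosts.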




Re: [PATCH] migrate/ram: let ram_save_target_page_legacy() return if qemu file got error

2023-08-16 Thread Guoyi Tu




On 2023/8/16 23:15, [External account] Fabiano Rosas wrote:

Peter Xu  writes:


On Tue, Aug 15, 2023 at 07:42:24PM -0300, Fabiano Rosas wrote:

Yep, I see that. I meant explicitly move the code into the loop. Feels a
bit weird to check the QEMUFile for errors first thing inside the
function when nothing around it should have touched the QEMUFile.


Valid point.  This reminded me that we now have one indirection into
->ram_save_target_page(), which is a hook.  Putting the check in the caller
will work for all hooks, even though no others exist yet.

But since we don't have any other hooks yet, it'll be the same for now.

Acked-by: Peter Xu 

For the long term: there's one more reason to rework qemu_put_byte()/... to
return error codes.  Then things like save_normal_page() can simply return
negative values when they hit an error.

Fabiano - I see that you've done quite a few patches in reworking migration
code.  I had that for a long time in my todo, but if you're interested feel
free to look into it.

IIUC the idea is to introduce another, similar layer of API for QEMUFile (I'd
call it qemu_put_1|2|4|8(), or anything better you can come up with), then
let migration switch over to it, with the return value reflecting errors.
Then we should be able to drop this patch along with most of the explicit
error checks for the QEMUFile spread all over.
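
A rough sketch of what such a layer could look like, assuming a plain memory buffer in place of QEMUFile. All names here (out_buf, out_put_u8, out_put_be32, save_record) are hypothetical and are not part of the existing qemu_put_*() API; the point is only that each writer returns 0 or a negative errno, so callers can propagate failures directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output sink: a fixed-size buffer instead of a QEMUFile. */
struct out_buf {
    uint8_t *data;
    size_t cap;
    size_t len;
};

/* Write one byte; return 0 on success or a negative errno on failure. */
static int out_put_u8(struct out_buf *b, uint8_t v)
{
    if (b->len >= b->cap) {
        return -ENOSPC;
    }
    b->data[b->len++] = v;
    return 0;
}

/* Write a 32-bit value in big-endian order, stopping at the first error. */
static int out_put_be32(struct out_buf *b, uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        int ret = out_put_u8(b, (uint8_t)(v >> shift));
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* A caller simply forwards the error instead of polling the stream later. */
static int save_record(struct out_buf *b, uint8_t tag, uint32_t value)
{
    int ret = out_put_u8(b, tag);
    if (ret < 0) {
        return ret;
    }
    return out_put_be32(b, value);
}
```

With this shape, a function like save_record() needs no separate "did the stream fail?" check after the fact, which is the kind of explicit check the paragraph above hopes to drop.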


I was just ranting about this situation in another thread! Yes, we need
something like that. QEMUFile errors should only be set by code doing
actual IO and if we want to store the error for other parts of the code
to use, that should be another interface.

While reviewing this patch I noticed we have stuff like this:

pages = ram_find_and_save_block()
...
if (pages < 0) {
 qemu_file_set_error(f, pages);
 break;
}

So the low-level code sets the error, ram_save_target_page_legacy() sees
it and returns -1, and this^ code loses all track of the initial error
and inadvertently turns it into -EPERM!

I'll try to find some time to start cleaning this up
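
The loss described above can be reproduced in a few lines. This is a sketch that assumes EPERM has the value 1 (true on Linux and other common platforms); the function names are made up for illustration and only mimic the shape of the real call chain.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for low-level I/O that fails with a meaningful error code. */
static int demo_write_page(void)
{
    return -EIO;
}

/* Lossy middle layer: reports every failure as a bare -1. */
static int demo_save_page_lossy(void)
{
    return demo_write_page() < 0 ? -1 : 1;
}

/* Preserving middle layer: passes the negative errno through unchanged. */
static int demo_save_page_preserving(void)
{
    int ret = demo_write_page();

    return ret < 0 ? ret : 1;
}
```

A caller that stores the lossy return value as a negative errno ends up recording -EPERM, because -1 and -EPERM are the same number on those platforms, while the preserving variant still reports -EIO.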


It sounds very reasonable. The return values of the QEMUFile interfaces
cannot accurately reflect the actual situation, and the way these
interfaces are called during the migration process is also a
little bit weird.

I'm glad to see that you have plans to improve these interfaces. If you
need any assistance, I'd be more than happy to be involved.



Re: [PATCH v3 2/2] target/i386: Avoid overflow of the cache parameter enumerated by leaf 4

2023-08-16 Thread Xiaoyao Li

On 8/16/2023 4:06 PM, Qian Wen wrote:

According to the SDM, CPUID.0x4:EAX[31:26] indicates the maximum number of
addressable IDs for processor cores in the physical package. If we
launch a VM with more than 64 cores, the 6-bit field overflows and the
wrong core_id number is reported.

Since the HW reports 0x3f when the Intel processor has more than 64 cores,
limit the max value written to EAX[31:26] to 63, so max num_cores should
be 64.

Signed-off-by: Qian Wen 
Reviewed-by: Zhao Liu 


Reviewed-by: Xiaoyao Li 


---
  target/i386/cpu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 5c008b9d7e..3b6854300a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -248,7 +248,7 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
  *eax = CACHE_TYPE(cache->type) |
 CACHE_LEVEL(cache->level) |
 (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
-   ((num_cores - 1) << 26) |
+   ((MIN(num_cores, 64) - 1) << 26) |
 ((num_apic_ids - 1) << 14);
  
  assert(cache->line_size > 0);
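
To make the overflow concrete, here is a small standalone sketch of the EAX[31:26] encoding with and without the clamp. The helper names are hypothetical and the arithmetic uses uint32_t so that the wrap-around is well defined; it mirrors the idea of the change above, not the exact QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: place (num_cores - 1) into bits 31:26 of EAX.
 * With 'clamp' set, the count is capped at 64 so the 6-bit field
 * saturates at 0x3f instead of wrapping.
 */
static uint32_t demo_encode_cores(uint32_t num_cores, int clamp)
{
    assert(num_cores >= 1);
    if (clamp && num_cores > 64) {
        num_cores = 64;
    }
    return (num_cores - 1) << 26;
}

/* Read the 6-bit field back, as a guest would. */
static uint32_t demo_decode_cores(uint32_t eax)
{
    return (eax >> 26) & 0x3f;
}
```

Without the clamp, 65 cores encode as 64 << 26, which no longer fits in 32 bits and leaves the field at 0; with the clamp the field reads 0x3f.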





Re: [PATCH v3 1/2] target/i386: Avoid cpu number overflow in legacy topology

2023-08-16 Thread Xiaoyao Li

On 8/16/2023 4:06 PM, Qian Wen wrote:

The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
Vol2:

Bits 23-16: Maximum number of addressable IDs for logical processors in
this physical package.

When threads_per_socket > 255, it will 1) overwrite bits [31:24], which hold
the apic_id, and 2) truncate bits [23:16].

Specifically, if launching the VM with -smp 256, the value written to
EBX[23:16] is 0 because of data overflow. If the guest only supports
legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
cpu_smt_allowed() in the guest kernel returns false and the APs
(application processors) fail to come up. Then only CPU 0 is online,
and others are offline.

For example, launch VM via:
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
 -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
 -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
 -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic

The guest shows:
 CPU(s):   256
 On-line CPU(s) list:  0
 Off-line CPU(s) list: 1-255

To avoid this issue caused by overflow, limit the max value written to
EBX[23:16] to 255 as the HW does.

Signed-off-by: Qian Wen 
Reviewed-by: Zhao Liu 


Reviewed-by: Xiaoyao Li 


---
  target/i386/cpu.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97ad229d8b..5c008b9d7e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6008,6 +6008,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  uint32_t die_offset;
  uint32_t limit;
  uint32_t signature[3];
+uint32_t threads_per_socket;
  X86CPUTopoInfo topo_info;
  
  topo_info.dies_per_pkg = env->nr_dies;

@@ -6049,8 +6050,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
  *ecx |= CPUID_EXT_OSXSAVE;
  }
  *edx = env->features[FEAT_1_EDX];
-if (cs->nr_cores * cs->nr_threads > 1) {
-*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
+threads_per_socket = cs->nr_cores * cs->nr_threads;
+if (threads_per_socket > 1) {
+*ebx |= MIN(threads_per_socket, 255) << 16;
  *edx |= CPUID_HT;
  }
  if (!cpu->enable_pmu) {
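
The effect on EBX can be checked with a short standalone sketch. The helper names below are hypothetical; the bit positions follow the commit message (bits 23:16 for the logical processor count, bits 31:24 for the initial APIC ID).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: OR the logical processor count into EBX[23:16].
 * With 'clamp' set, the count is capped at 255 so it cannot spill into
 * the APIC ID byte above it.
 */
static uint32_t demo_set_logical_count(uint32_t ebx, uint32_t threads,
                                       int clamp)
{
    if (clamp && threads > 255) {
        threads = 255;
    }
    return ebx | (threads << 16);
}

/* Bits 23:16: maximum number of addressable logical processor IDs. */
static uint32_t demo_logical_count(uint32_t ebx)
{
    return (ebx >> 16) & 0xff;
}

/* Bits 31:24: initial APIC ID. */
static uint32_t demo_initial_apic_id(uint32_t ebx)
{
    return ebx >> 24;
}
```

With 256 threads and no clamp, the count field reads 0 and the APIC ID byte is corrupted to 1; with the clamp the count reads 255 and the APIC ID is left alone.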





Re: [PATCH v4 05/18] linux-user: Use ImageSource in load_symbols

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 20:03, Richard Henderson wrote:

Aside from the section headers, we're unlikely to hit the
ImageSource cache on guest executables.  But the interface
for imgsrc_read_* is better.

Signed-off-by: Richard Henderson 
---
  linux-user/elfload.c | 87 
  1 file changed, 48 insertions(+), 39 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 01/18] linux-user: Introduce imgsrc_read, imgsrc_read_alloc

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 20:03, Richard Henderson wrote:

Introduced and initialized, but not yet really used.
These will tidy the current tests vs BPRM_BUF_SIZE.

Signed-off-by: Richard Henderson 
---
  linux-user/loader.h| 61 +++-
  linux-user/linuxload.c | 90 ++
  2 files changed, 142 insertions(+), 9 deletions(-)




+/**
+ * imgsrc_read: Read from ImageSource
+ * @dst: destination for read
+ * @offset: offset within file for read
+ * @len: size of the read
+ * @img: ImageSource to read from
+ * @errp: Error details.
+ *
+ * Read into @dst, using the cache when possible.
+ */
+bool imgsrc_read(void *dst, off_t offset, size_t len,
+ const ImageSource *img, Error **errp);
+
+/**
+ * imgsrc_read_alloc: Read from ImageSource
+ * @offset: offset within file for read
+ * @len: size of the read
+ * @img: ImageSource to read from
+ * @errp: Error details.
+ *
+ * Read into newly allocated memory, using the cache when possible.
+ */
+void *imgsrc_read_alloc(off_t offset, size_t len,
+const ImageSource *img, Error **errp);
+
+/**
+ * imgsrc_mmap: Map from ImageSource
+ *
+ * If @src has a file descriptor, pass on to target_mmap.  Otherwise,
+ * this is "mapping" from a host buffer, which resolves to memcpy.
+ * Therefore, flags must be MAP_PRIVATE | MAP_FIXED; the argument is
+ * retained for clarity.
+ */
+abi_long imgsrc_mmap(abi_ulong start, abi_ulong len, int prot,
+ int flags, const ImageSource *src, abi_ulong offset);


Nitpicking, having imgsrc_mmap() in another patch would ease review
(in case you ever respin). Otherwise:

Reviewed-by: Philippe Mathieu-Daudé 
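
As a reading aid, the cache-then-file behaviour described in the imgsrc_read() comment can be sketched roughly as follows. This illustration rests on several assumptions: struct demo_image_source and demo_image_read() are hypothetical, the Error **errp reporting is left out, and the real ImageSource layout may differ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical image source: an optional fd plus a cached file prefix. */
struct demo_image_source {
    int fd;              /* backing file, or -1 when only the cache exists */
    const void *cache;   /* bytes [0, cache_size) of the image */
    size_t cache_size;
};

/* Read 'len' bytes at 'offset' into 'dst', preferring the cache. */
static bool demo_image_read(void *dst, off_t offset, size_t len,
                            const struct demo_image_source *img)
{
    char *out = dst;

    if (offset < 0) {
        return false;
    }
    /* Fast path: the whole range lies inside the cached prefix. */
    if ((unsigned long long)offset <= img->cache_size &&
        len <= img->cache_size - (size_t)offset) {
        memcpy(dst, (const char *)img->cache + offset, len);
        return true;
    }
    if (img->fd < 0) {
        return false;
    }
    /* Slow path: read from the file, tolerating short reads and EINTR. */
    while (len > 0) {
        ssize_t n = pread(img->fd, out, len, offset);

        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return false;   /* I/O error or unexpected end of file */
        }
        out += n;
        offset += n;
        len -= (size_t)n;
    }
    return true;
}
```

A range that only partly overlaps the cache falls through to the file path in this sketch, which keeps the logic simple at the cost of rereading a few cached bytes.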




Re: [PATCH for-8.1] vfio/display: Fix missing update to set backing fields

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 23:55, Alex Williamson wrote:

The below referenced commit renames scanout_width/height to
backing_width/height, but also promotes these fields in various portions
of the egl interface.  Meanwhile vfio dmabuf support has never used the
previous scanout fields and is therefore missed in the update.  This
results in a black screen when transitioning from ramfb to dmabuf display
when using Intel vGPU with these features.


Referenced commit isn't trivial. Maybe because it is too late here.
I'd have tried to split it. Anyhow, too late (again).

Is vhost-user-gpu also affected? (see VHOST_USER_GPU_DMABUF_SCANOUT
in vhost_user_gpu_handle_display()).


Link: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02726.html
Fixes: 9ac06df8b684 ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size 
properties")
Signed-off-by: Alex Williamson 
---

This fixes a regression in dmabuf/EGL support for Intel GVT-g and
potentially the mbochs mdev driver as well.  Once validated by those
that understand dmabuf/EGL integration, I'd welcome QEMU maintainers to
take this directly for v8.1 or queue it as soon as possible for v8.1.1.

  hw/vfio/display.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index bec864f482f4..837d9e6a309e 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -243,6 +243,8 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice 
*vdev,
  dmabuf->dmabuf_id  = plane.dmabuf_id;
  dmabuf->buf.width  = plane.width;
  dmabuf->buf.height = plane.height;
+dmabuf->buf.backing_width = plane.width;
+dmabuf->buf.backing_height = plane.height;
  dmabuf->buf.stride = plane.stride;
  dmabuf->buf.fourcc = plane.drm_format;
  dmabuf->buf.modifier = plane.drm_format_mod;





Re: [PATCH 0/6] linux-user: Rewrite open_self_maps

2023-08-16 Thread Helge Deller

Hi Richard,

On 8/16/23 20:14, Richard Henderson wrote:

Based-on: 20230816180338.572576-1-richard.hender...@linaro.org
("[PATCH v4 00/18] linux-user: Implement VDSOs")

As promised, a rewrite of /proc/self/{maps,smaps} emulation
using interval trees.

Incorporate Helge's change to mark [heap], and also mark [vdso].


Series looks good, so you may add

Tested-by: Helge Deller 

to this series and the previous one (linux-user: Implement VDSOs).


The only thing I noticed is that mips64el doesn't seem to have a heap?

mips64el-chroot
Linux p100 6.4.10-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 11 12:20:29 UTC 2023 mips64 GNU/Linux
6000-7000 ---p  00:00 0
7000-55d57000 rwxp  00:00 0  [stack]
55d57000-55d84000 r-xp  fd:00 806056 /usr/lib/mips64el-linux-gnuabi64/ld.so.1
55d84000-55d96000 ---p  00:00 0
55d96000-55d97000 r--p 0002f000 fd:00 806056 /usr/lib/mips64el-linux-gnuabi64/ld.so.1
55d97000-55d99000 rw-p 0003 fd:00 806056 /usr/lib/mips64el-linux-gnuabi64/ld.so.1
55d99000-55d9a000 r-xp  00:00 0
55d9a000-55d9c000 rw-p  00:00 0
55da-55f8a000 r-xp  fd:00 806059 /usr/lib/mips64el-linux-gnuabi64/libc.so.6
55f8a000-55f9a000 ---p 001ea000 fd:00 806059 /usr/lib/mips64el-linux-gnuabi64/libc.so.6
55f9a000-55fa r--p 001ea000 fd:00 806059 /usr/lib/mips64el-linux-gnuabi64/libc.so.6
55fa-55fa5000 rw-p 001f fd:00 806059 /usr/lib/mips64el-linux-gnuabi64/libc.so.6
55fa5000-55fb2000 rw-p  00:00 0
55fbe000-560c rw-p  00:00 0
7f9bc9987000-7f9bc9992000 r-xp  fd:00 811277 /usr/bin/cat
7f9bc9992000-7f9bc99a6000 ---p  00:00 0
7f9bc99a6000-7f9bc99a7000 r--p f000 fd:00 811277 /usr/bin/cat
7f9bc99a7000-7f9bc99a8000 rw-p 0001 fd:00 811277 /usr/bin/cat

Helge



Re: [PATCH 4/4] tcg: Map code_gen_buffer with PROT_BTI

2023-08-16 Thread Philippe Mathieu-Daudé

(Cc'ing Joelle)

On 16/8/23 16:25, Richard Henderson wrote:

For linux aarch64 host supporting BTI, map the buffer
to require BTI instructions at branch landing pads.

Signed-off-by: Richard Henderson 
---
  tcg/region.c | 39 ---
  1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/tcg/region.c b/tcg/region.c
index 2b28ed3556..58cb68c6c8 100644
--- a/tcg/region.c
+++ b/tcg/region.c
@@ -33,8 +33,19 @@
  #include "tcg/tcg.h"
  #include "exec/translation-block.h"
  #include "tcg-internal.h"
+#include "host/cpuinfo.h"
  
  
+/*

+ * Local source-level compatibility with Unix.
+ * Used by tcg_region_init below.
+ */
+#if defined(_WIN32)
+#define PROT_READ   1
+#define PROT_WRITE  2
+#define PROT_EXEC   4
+#endif
+
  struct tcg_region_tree {
  QemuMutex lock;
  QTree *tree;
@@ -83,6 +94,16 @@ bool in_code_gen_buffer(const void *p)
  return (size_t)(p - region.start_aligned) <= region.total_size;
  }
  
+static int host_prot_read_exec(void)

+{
+#if defined(CONFIG_LINUX) && defined(HOST_AARCH64) && defined(PROT_BTI)
+if (cpuinfo & CPUINFO_BTI) {
+return PROT_READ | PROT_EXEC | PROT_BTI;
+}
+#endif
+return PROT_READ | PROT_EXEC;
+}
+
  #ifdef CONFIG_DEBUG_TCG
  const void *tcg_splitwx_to_rx(void *rw)
  {
@@ -505,14 +526,6 @@ static int alloc_code_gen_buffer(size_t tb_size, int 
splitwx, Error **errp)
  return PROT_READ | PROT_WRITE;
  }
  #elif defined(_WIN32)
-/*
- * Local source-level compatibility with Unix.
- * Used by tcg_region_init below.
- */
-#define PROT_READ   1
-#define PROT_WRITE  2
-#define PROT_EXEC   4
-
  static int alloc_code_gen_buffer(size_t size, int splitwx, Error **errp)
  {
  void *buf;
@@ -567,7 +580,7 @@ static int alloc_code_gen_buffer_splitwx_memfd(size_t size, 
Error **errp)
  goto fail;
  }
  
-buf_rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);

+buf_rx = mmap(NULL, size, host_prot_read_exec(), MAP_SHARED, fd, 0);
  if (buf_rx == MAP_FAILED) {
  goto fail_rx;
  }
@@ -642,7 +655,7 @@ static int alloc_code_gen_buffer_splitwx_vmremap(size_t 
size, Error **errp)
  return -1;
  }
  
-if (mprotect((void *)buf_rx, size, PROT_READ | PROT_EXEC) != 0) {
+if (mprotect((void *)buf_rx, size, host_prot_read_exec()) != 0) {
  error_setg_errno(errp, errno, "mprotect for jit splitwx");
  munmap((void *)buf_rx, size);
  munmap((void *)buf_rw, size);
@@ -805,7 +818,7 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned 
max_cpus)
  need_prot = PROT_READ | PROT_WRITE;
  #ifndef CONFIG_TCG_INTERPRETER
  if (tcg_splitwx_diff == 0) {
-need_prot |= PROT_EXEC;
+need_prot |= host_prot_read_exec();
  }
  #endif
  for (size_t i = 0, n = region.n; i < n; i++) {
@@ -820,7 +833,11 @@ void tcg_region_init(size_t tb_size, int splitwx, unsigned 
max_cpus)
  } else if (need_prot == (PROT_READ | PROT_WRITE)) {
  rc = qemu_mprotect_rw(start, end - start);
  } else {
+#ifdef CONFIG_POSIX
+rc = mprotect(start, end - start, need_prot);

Hmm, this bypasses the qemu_real_host_page_mask() checks in
qemu_mprotect__osdep(), but I guess this is acceptable.

Reviewed-by: Philippe Mathieu-Daudé 

+#else
  g_assert_not_reached();
+#endif
  }
  if (rc) {
  error_setg_errno(&error_fatal, errno,





[PATCH for-8.1] vfio/display: Fix missing update to set backing fields

2023-08-16 Thread Alex Williamson
The below referenced commit renames scanout_width/height to
backing_width/height, but also promotes these fields in various portions
of the egl interface.  Meanwhile vfio dmabuf support has never used the
previous scanout fields and is therefore missed in the update.  This
results in a black screen when transitioning from ramfb to dmabuf display
when using Intel vGPU with these features.

Link: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg02726.html
Fixes: 9ac06df8b684 ("virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties")
Signed-off-by: Alex Williamson 
---

This fixes a regression in dmabuf/EGL support for Intel GVT-g and
potentially the mbochs mdev driver as well.  Once validated by those
that understand dmabuf/EGL integration, I'd welcome QEMU maintainers to
take this directly for v8.1 or queue it as soon as possible for v8.1.1.

 hw/vfio/display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index bec864f482f4..837d9e6a309e 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -243,6 +243,8 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice 
*vdev,
 dmabuf->dmabuf_id  = plane.dmabuf_id;
 dmabuf->buf.width  = plane.width;
 dmabuf->buf.height = plane.height;
+dmabuf->buf.backing_width = plane.width;
+dmabuf->buf.backing_height = plane.height;
 dmabuf->buf.stride = plane.stride;
 dmabuf->buf.fourcc = plane.drm_format;
 dmabuf->buf.modifier = plane.drm_format_mod;
-- 
2.40.1
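
To make the failure mode concrete, here is a minimal sketch, assuming a reduced stand-in for the size fields of QemuDmaBuf. The type DmaBufSize and both helpers are hypothetical illustrations, not QEMU API. A producer that fills in only width/height leaves the backing size at zero, which an importer that reads backing_* presumably cannot use.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for QemuDmaBuf: size fields only. */
typedef struct DmaBufSize {
    uint32_t width;           /* visible sub-region */
    uint32_t height;
    uint32_t backing_width;   /* whole surface */
    uint32_t backing_height;
} DmaBufSize;

/* A producer without sub-regions: backing size equals visible size. */
static void dmabuf_set_full_surface(DmaBufSize *b, uint32_t w, uint32_t h)
{
    b->width = w;
    b->height = h;
    b->backing_width = w;
    b->backing_height = h;
}

/* Hypothetical import-side sanity check on the backing size. */
static int dmabuf_backing_valid(const DmaBufSize *b)
{
    return b->backing_width != 0 && b->backing_height != 0;
}
```

With only width and height assigned, dmabuf_backing_valid() reports 0; after dmabuf_set_full_surface() it reports 1, which mirrors what the two added lines do for the vfio case.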




Re: [PATCH v4 8/8] migration: Add a wrapper to cleanup migration files

2023-08-16 Thread Peter Xu
On Wed, Aug 16, 2023 at 06:20:58PM -0300, Fabiano Rosas wrote:
> > One more thing to mention is, now I kind of agree probably we should
> > register yank over each qemufile, as you raised the concern in the other
> > thread that otherwise qmp_yank() won't set error for the qemufile, which
> > seems to be unexpected.
> 
> I haven't made up my mind yet, but I think I'd rather stop setting that
> error instead of doing it from other places. A shutdown() is mostly a
> benign operation intended to end the connection. The fact that we use it
> in some cases to kick the thread out of a possible hang doesn't seem
> compelling enough to set -EIO.
> 
> Of course we currently have no other way to indicate that the file was
> shutdown, so the -EIO will have to stay and that's a discussion for
> another day.

Yes, if we can avoid setting -EIO at all on shutdown, that would also be
good, and may even make more sense.  Thanks,

-- 
Peter Xu




Re: [PATCH 1/4] tcg: Add tcg_out_tb_start backend hook

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 16:25, Richard Henderson wrote:

This hook may emit code at the beginning of the TB.

Suggested-by: Jordan Niethe 
Signed-off-by: Richard Henderson 
---
  tcg/tcg.c| 3 +++
  tcg/aarch64/tcg-target.c.inc | 5 +
  tcg/arm/tcg-target.c.inc | 5 +
  tcg/i386/tcg-target.c.inc| 5 +
  tcg/loongarch64/tcg-target.c.inc | 5 +
  tcg/mips/tcg-target.c.inc| 5 +
  tcg/ppc/tcg-target.c.inc | 5 +
  tcg/riscv/tcg-target.c.inc   | 5 +
  tcg/s390x/tcg-target.c.inc   | 5 +
  tcg/sparc64/tcg-target.c.inc | 5 +
  tcg/tci/tcg-target.c.inc | 5 +
  11 files changed, 53 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 2/4] util/cpuinfo-aarch64: Add CPUINFO_BTI

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 16:25, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  host/include/aarch64/host/cpuinfo.h | 1 +
  util/cpuinfo-aarch64.c  | 4 
  2 files changed, 5 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [8.1 regression] Re: [PULL 05/19] virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties

2023-08-16 Thread Alex Williamson
On Wed, 16 Aug 2023 15:08:10 -0600
Alex Williamson  wrote:
> > diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
> > index 8f9fbf583e..3d19dbe382 100644
> > --- a/ui/egl-helpers.c
> > +++ b/ui/egl-helpers.c
> > @@ -314,9 +314,9 @@ void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf)
> >  }
> >  
> >  attrs[i++] = EGL_WIDTH;
> > -attrs[i++] = dmabuf->width;
> > +attrs[i++] = dmabuf->backing_width;
> >  attrs[i++] = EGL_HEIGHT;
> > -attrs[i++] = dmabuf->height;
> > +attrs[i++] = dmabuf->backing_height;
> >  attrs[i++] = EGL_LINUX_DRM_FOURCC_EXT;
> >  attrs[i++] = dmabuf->fourcc;
> >  
> > diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
> > index 42db1bb6cf..eee821d73a 100644
> > --- a/ui/gtk-egl.c
> > +++ b/ui/gtk-egl.c
> > @@ -262,9 +262,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
> >  }
> >  
> >  gd_egl_scanout_texture(dcl, dmabuf->texture,
> > -   dmabuf->y0_top, dmabuf->width, dmabuf->height,
> > -   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
> > -   dmabuf->scanout_height, NULL);
> > +   dmabuf->y0_top,
> > +   dmabuf->backing_width, dmabuf->backing_height,
> > +   dmabuf->x, dmabuf->y, dmabuf->width,
> > +   dmabuf->height, NULL);
> >  
> >  if (dmabuf->allow_fences) {
> >  vc->gfx.guest_fb.dmabuf = dmabuf;
> > @@ -284,7 +285,8 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener *dcl,
> >  if (!dmabuf->texture) {
> >  return;
> >  }
> > -egl_fb_setup_for_tex(&vc->gfx.cursor_fb, dmabuf->width, 
> > dmabuf->height,
> > +egl_fb_setup_for_tex(&vc->gfx.cursor_fb,
> > + dmabuf->backing_width, dmabuf->backing_height,
> >   dmabuf->texture, false);
> >  } else {
> >  egl_fb_destroy(&vc->gfx.cursor_fb);
> > diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
> > index a9a7fdf50c..4513d3d059 100644
> > --- a/ui/gtk-gl-area.c
> > +++ b/ui/gtk-gl-area.c
> > @@ -301,9 +301,10 @@ void gd_gl_area_scanout_dmabuf(DisplayChangeListener 
> > *dcl,
> >  }
> >  
> >  gd_gl_area_scanout_texture(dcl, dmabuf->texture,
> > -   dmabuf->y0_top, dmabuf->width, 
> > dmabuf->height,
> > -   dmabuf->x, dmabuf->y, dmabuf->scanout_width,
> > -   dmabuf->scanout_height, NULL);
> > +   dmabuf->y0_top,
> > +   dmabuf->backing_width, 
> > dmabuf->backing_height,
> > +   dmabuf->x, dmabuf->y, dmabuf->width,
> > +   dmabuf->height, NULL);
> >  
> >  if (dmabuf->allow_fences) {
> >  vc->gfx.guest_fb.dmabuf = dmabuf;  
> 

I suspect the issue is in these last few chunks, where width and height
are replaced with backing_width and backing_height, but
hw/vfio/display.c never sets backing_*.  It appears that the following
resolves the issue:

diff --git a/hw/vfio/display.c b/hw/vfio/display.c
index bec864f482f4..837d9e6a309e 100644
--- a/hw/vfio/display.c
+++ b/hw/vfio/display.c
@@ -243,6 +243,8 @@ static VFIODMABuf *vfio_display_get_dmabuf(VFIOPCIDevice 
*vdev,
 dmabuf->dmabuf_id  = plane.dmabuf_id;
 dmabuf->buf.width  = plane.width;
 dmabuf->buf.height = plane.height;
+dmabuf->buf.backing_width = plane.width;
+dmabuf->buf.backing_height = plane.height;
 dmabuf->buf.stride = plane.stride;
 dmabuf->buf.fourcc = plane.drm_format;
 dmabuf->buf.modifier = plane.drm_format_mod;

I'll post that formally, but I really have no idea how dmabuf display
works, so confirmation would be appreciated.  Thanks,

Alex




Re: [PATCH] target/riscv: Allocate itrigger timers only once

2023-08-16 Thread Philippe Mathieu-Daudé

On 16/8/23 18:27, Akihiko Odaki wrote:

riscv_trigger_init() had been called on reset events that can happen
several times for a CPU and it allocated timers for itrigger. If old
timers were present, they were simply overwritten by the new timers,
resulting in a memory leak.

Divide riscv_trigger_init() into two functions, namely
riscv_trigger_realize() and riscv_trigger_reset(), and call them at the
appropriate times. The timer allocation will happen only once for a
CPU in riscv_trigger_realize().

Fixes: 5a4ae64cac ("target/riscv: Add itrigger support when icount is enabled")
Signed-off-by: Akihiko Odaki 
---
  target/riscv/debug.h |  3 ++-
  target/riscv/cpu.c   |  8 +++-
  target/riscv/debug.c | 15 ---
  3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/target/riscv/debug.h b/target/riscv/debug.h
index c471748d5a..7edc31e7cc 100644
--- a/target/riscv/debug.h
+++ b/target/riscv/debug.h
@@ -143,7 +143,8 @@ void riscv_cpu_debug_excp_handler(CPUState *cs);
  bool riscv_cpu_debug_check_breakpoint(CPUState *cs);
  bool riscv_cpu_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp);
  
-void riscv_trigger_init(CPURISCVState *env);
+void riscv_trigger_realize(CPURISCVState *env);
+void riscv_trigger_reset(CPURISCVState *env);
  
  bool riscv_itrigger_enabled(CPURISCVState *env);
  void riscv_itrigger_update_priv(CPURISCVState *env);
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e12b6ef7f6..3bc3f96a58 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -904,7 +904,7 @@ static void riscv_cpu_reset_hold(Object *obj)
  
  #ifndef CONFIG_USER_ONLY
  if (cpu->cfg.debug) {
-riscv_trigger_init(env);
+riscv_trigger_reset(env);

Maybe name it _reset_hold()? Otherwise:

Reviewed-by: Philippe Mathieu-Daudé 

  }
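
The realize/reset split described in the commit message can be sketched in isolation. This is a minimal sketch with hypothetical types (Timer, Env) and a hypothetical trigger count, not the QEMU code: allocation happens once in the realize step, and the reset step only rewinds state on objects that already exist.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TRIGGERS 2          /* hypothetical trigger count */

typedef struct Timer {          /* stand-in for a QEMU timer */
    long expire;
} Timer;

typedef struct Env {            /* stand-in for the CPU state */
    Timer *itrigger_timer[NUM_TRIGGERS];
    long tdata[NUM_TRIGGERS];
    int allocs;                 /* counts allocations, for illustration */
} Env;

/* Runs once per CPU: the only place that allocates. */
static void trigger_realize(Env *env)
{
    for (int i = 0; i < NUM_TRIGGERS; i++) {
        env->itrigger_timer[i] = calloc(1, sizeof(Timer));
        assert(env->itrigger_timer[i] != NULL);
        env->allocs++;
    }
}

/* May run many times: resets state, never allocates. */
static void trigger_reset(Env *env)
{
    for (int i = 0; i < NUM_TRIGGERS; i++) {
        env->tdata[i] = 0;
        env->itrigger_timer[i]->expire = -1;
    }
}

/* Counterpart of realize: frees what realize allocated. */
static void trigger_unrealize(Env *env)
{
    for (int i = 0; i < NUM_TRIGGERS; i++) {
        free(env->itrigger_timer[i]);
        env->itrigger_timer[i] = NULL;
    }
}
```

However many resets occur, the number of allocations stays at NUM_TRIGGERS, which is the property the patch restores.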





Re: [PATCH v4 8/8] migration: Add a wrapper to cleanup migration files

2023-08-16 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, Aug 16, 2023 at 03:35:24PM -0400, Peter Xu wrote:
>> On Wed, Aug 16, 2023 at 03:47:26PM -0300, Fabiano Rosas wrote:
>> > Peter Xu  writes:
>> > 
>> > > On Wed, Aug 16, 2023 at 11:25:10AM -0300, Fabiano Rosas wrote:
>> > >> @@ -2003,6 +1980,8 @@ static int 
>> > >> open_return_path_on_source(MigrationState *ms)
>> > >>  return -1;
>> > >>  }
>> > >>  
>> > >> +
>> > >> migration_ioc_register_yank(qemu_file_get_ioc(ms->rp_state.from_dst_file));
>> > >
>> > > I think I didn't really get why it wasn't paired before yesterday.  My
>> > > fault.
>> > >
>> > > Registering from_dst_file, afaict, will register two identical yank 
>> > > objects
>> > > because the ioc is the same.
>> > >
>> > 
>> > Why do we have two QEMUFiles for the same fd again?
>> 
>> Because qemufile has a "direction" (either read / write)?
>> 
>> > 
>> > We're bound to crash at some point by trying to qemu_fclose() the two
>> > QEMUFiles at the same time.
>> 
>> Even with each qemufile holding a reference on the ioc object?  I thought
>> it won't crash, but if it will please point that out; or fix it would be
>> even better.

You're right, it wouldn't crash. But it's still the same ioc object. If
qio_channel_close() is called twice, then we could potentially close the
fd twice. Which would either error out or close a reused fd. The window
is small though, so probably unlikely to ever happen.
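
The "close a reused fd" case can be reproduced outside QEMU. This is a minimal sketch using plain POSIX calls, with a hypothetical function name. It relies on open() returning the lowest free descriptor number, so in a single-threaded program the second open() gets the number that was just closed.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a stale second close() on an already closed descriptor
 * ended up closing an unrelated, newly opened descriptor.
 */
static int stale_close_hits_new_fd(void)
{
    int first, second, hit = 0;

    first = open("/dev/null", O_RDONLY);
    assert(first >= 0);
    close(first);                       /* legitimate close */

    second = open("/dev/null", O_RDONLY);
    assert(second >= 0);

    if (second == first) {
        close(first);                   /* stale second close */
        /* F_GETFD fails once the descriptor is no longer open. */
        hit = (fcntl(second, F_GETFD) == -1);
    } else {
        close(second);
    }
    return hit;
}
```

In a multi-threaded process the reuse depends on another thread opening a file between the two closes, which is why the window is small.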

>> 
>> > 
>> > > Should we make migration_file_release() not handle the unregister of
>> > > yank(), but leave that to callers?  Then we keep the rule of only 
>> > > register
>> > > yank for each ioc once.
>> > >
>> > 
>> > We need the unregister to be at migration_file_release() so that it
>> > takes benefit of the locking while checking the file for NULL. If it
>> > moves out then the caller will have to do locking as well. Which
>> > defeats the purpose of the patch.
>> > 
>> > I don't understand why you moved the unregister out of channel_close in
>> > commit 39675b ("migration: Move the yank unregister of channel_close
>> > out"). You called it a "hack" at the time, but looking at the current
>> > situation, it seems like a reasonable thing to do: unregister the yank
>> > when the ioc refcount drops to 1.
>> > 
>> > I would go even further and say that qemu_fclose should also avoid
>> > calling qio_channel_close if the ioc refcnt is elevated.
>> 
>> I'd rather not; I still think it's a hack, always open to be corrected.

It's hard to figure out what you mean by hack at times. Even more when
reading a years-old commit message.

>> 
>> I think the problem is yank can register anything so it's separate from
>> iochannels.  If one would like to have ioc close() automatically
>> unregister, then one should also register yank transparently without the
>> ioc user even being aware of yank's existence.

Ok, fair point.

>> 
>> Right now the caller registers the yank itself, so I think the caller
>> should also unregister it, not the iochannel itself, silently.

I think the issue is that we're linking the yank with the QEMUFile for
no reason. The migration_yank_iochannel() performs a
qio_channel_shutdown() which is an operation on the fd. The QEMUFile
just happens to hold a pointer to the ioc.

>
> I just noticed this is not really copying the list.. let me add the cc list
> back, assuming it was just forgotten.

I'm sorry, I hit the wrong key while replying.

> One more thing to mention is, now I kind of agree probably we should
> register yank over each qemufile, as you raised the concern in the other
> thread that otherwise qmp_yank() won't set error for the qemufile, which
> seems to be unexpected.

I haven't made up my mind yet, but I think I'd rather stop setting that
error instead of doing it from other places. A shutdown() is mostly a
benign operation intended to end the connection. The fact that we use it
in some cases to kick the thread out of a possible hang doesn't seem
compelling enough to set -EIO.

Of course we currently have no other way to indicate that the file was
shutdown, so the -EIO will have to stay and that's a discussion for
another day.





Re: [PATCH 1/2] virtio: use blk_io_plug_call() in virtio_irqfd_notify()

2023-08-16 Thread Stefan Hajnoczi
On Wed, Aug 16, 2023 at 08:30:58PM +0200, Ilya Maximets wrote:
> On 8/16/23 17:30, Stefan Hajnoczi wrote:
> > On Wed, Aug 16, 2023 at 03:36:32PM +0200, Ilya Maximets wrote:
> >> On 8/15/23 14:08, Stefan Hajnoczi wrote:
> >>> virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
> >>> Buffer Notifications from an IOThread. This involves an eventfd
> >>> write(2) syscall. Calling this repeatedly when completing multiple I/O
> >>> requests in a row is wasteful.
> >>
> >> Hi, Stefan.  This is an interesting change!
> >>
> >> There is more or less exactly the same problem with fast network backends
> >> and I was playing around with similar ideas in this area while working on
> >> af-xdp network backend recently.  Primarily, implementation of the Rx BH
> >> for virtio-net device and locking the RCU before passing packets from the
> >> backend to the device one by one.
> >>
> >>>
> >>> Use the blk_io_plug_call() API to batch together virtio_irqfd_notify()
> >>> calls made during Linux AIO (aio=native) or io_uring (aio=io_uring)
> >>> completion processing. Do not modify the thread pool (aio=threads) to
> >>> avoid introducing a dependency from util/ onto the block layer.
> >>
> >> But you're introducing a dependency from generic virtio code onto the
> >> block layer in this patch.  This seems to break the module abstraction.
> >>
> >> It looks like there are 2 options to resolve the semantics issue here:
> > 
> > Yes, it's a layering violation.
> > 
> >>
> >> 1. Move virtio_notify_irqfd() from virtio.c down to the block layer.
> >>Block layer is the only user, so that may be justified, but it
> >>doesn't seem like a particularly good solution.  (I'm actually not
> >>sure why block devices are the only ones using this function...)
> > 
> > Yes, this is the easiest way to avoid the layering violation for now.
> > 
> > The virtio_notify_irqfd() API is necessary when running in an IOThread
> > because the normal QEMU irq API must run under the Big QEMU Lock. Block
> > devices are the only ones that raise interrupts from IOThreads at the
> > moment.
> 
> Ack.  Thanks for explanation!
> 
> > 
> >>
> >> 2. Move and rename the block/plug library somewhere generic.  The plug
> >>library seems to not have any dependencies on a block layer, other
> >>than a name, so it should not be hard to generalize it (well, the
> >>naming might be hard).
> > 
> > Yes, it should be possible to make it generic quite easily. I will give
> > this a try in the next version of the patch.
> 
> OK.  Sounds good to me.
> 
> > 
> >> In general, while looking at the plug library, it seems to do what is
> >> typically implemented via RCU frameworks - the delayed function call.
> >> The only difference is that RCU doesn't check for duplicates and the
> >> callbacks are global.  Should not be hard to add some new functionality
> >> to RCU framework in order to address these, e.g. rcu_call_local() for
> >> calls that should be executed once the current thread exits its own
> >> critical section.
> > 
> > This rcu_call_local() idea is unrelated to Read Copy Update, so I don't
> > think it should be part of the RCU API.
> 
> Agreed.
> 
> > Another deferred function call mechanism is QEMUBH. It already supports
> > coalescing. However, BHs are invoked once per AioContext event loop
> > iteration and there is no way to invoke the BH earlier. Also the BH pointer
> > needs to be passed to every function that wishes to schedule a deferred
> > call, which can be tedious (e.g. block/linux-aio.c should defer the
> > io_submit(2) syscall until the end of virtio-blk request processing -
> > there are a lot of layers of code between those two components).
> > 
> >>
> >> Using RCU for non-RCU-protected things might be considered as an abuse.
> >> However, we might solve two issues in one shot if instead of entering
> >> blk_io_plug/unplug section we will enter an RCU critical section and
> >> call callbacks at the exit.  The first issue is the notification batching
> >> that this patch is trying to fix, the second is an excessive number of
> >> thread fences on RCU exits every time virtio_notify_irqfd() and other
> >> virtio functions are invoked.  The second issue can be avoided by using
> >> RCU_READ_LOCK_GUARD() in completion functions.  Not sure if that will
> >> improve performance, but it definitely removes a lot of noise from the
> >> perf top for network backends.  This makes the code a bit less explicit
> >> though, the lock guard will definitely need a comment.  Though, the reason
> >> for blk_io_plug() calls is not fully clear for a module code alone either.
> > 
> > util/aio-posix.c:run_poll_handlers() has a top-level
> > RCU_READ_LOCK_GUARD() for this reason.
> 
> Nice, didn't know that.
> 
> > Maybe we should do the same
> > around aio_bh_poll() + aio_dispatch_ready_handlers() in
> > util/aio-posix.c:aio_poll()? The downside is that latency-sensitive
> > call_rcu() callbacks perform worse.
> 
> "latency-sensitive ca
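
For readers unfamiliar with the plug/unplug pattern discussed in this thread, here is a minimal sketch of a coalescing deferred-call mechanism. All names (defer_plug, defer_call, defer_unplug, notify_once) are hypothetical; this is not the blk_io_plug_call() implementation, and it uses plain globals where a real implementation would presumably keep per-thread state.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 16

typedef void (*DeferFn)(void *opaque);

typedef struct Deferred {
    DeferFn fn;
    void *opaque;
} Deferred;

static Deferred pending[MAX_DEFERRED];
static size_t n_pending;
static unsigned plug_depth;

/* Open a batching section; sections may nest. */
static void defer_plug(void)
{
    plug_depth++;
}

/* Inside a section: queue (fn, opaque) once. Outside: call right away. */
static void defer_call(DeferFn fn, void *opaque)
{
    if (plug_depth == 0) {
        fn(opaque);
        return;
    }
    for (size_t i = 0; i < n_pending; i++) {
        if (pending[i].fn == fn && pending[i].opaque == opaque) {
            return;                     /* duplicate: coalesce */
        }
    }
    assert(n_pending < MAX_DEFERRED);
    pending[n_pending].fn = fn;
    pending[n_pending].opaque = opaque;
    n_pending++;
}

/* Close a section; the outermost close runs each queued call once. */
static void defer_unplug(void)
{
    size_t n;

    assert(plug_depth > 0);
    if (--plug_depth > 0) {
        return;
    }
    n = n_pending;
    n_pending = 0;
    for (size_t i = 0; i < n; i++) {
        pending[i].fn(pending[i].opaque);
    }
}

/* Stand-in for an eventfd write: just counts invocations. */
static void notify_once(void *opaque)
{
    ++*(int *)opaque;
}
```

Three defer_call(notify_once, &count) calls between defer_plug() and defer_unplug() result in a single notification, which is the saving the patch is after for repeated eventfd writes.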

[PATCH] qga: Start qemu-ga service after NetworkManager start

2023-08-16 Thread Efim Shevrin via
From: Fima Shevrin 

When the guest OS starts, qemu-ga sends an event to the host.
This event allows services on the host to start configuring
the already running guest OS. When configuring network settings,
it is possible that an external service will receive a signal
from qemu-ga about the start of guest OS, while NetworkManager
may not be running yet. Therefore, network setting may not
be available. With the current patch, we eliminate the described
race condition between qemu-ga and NetworkManager for guest OS
network setting cases.

Signed-off-by: Fima Shevrin 
---
 contrib/systemd/qemu-guest-agent.service | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/systemd/qemu-guest-agent.service 
b/contrib/systemd/qemu-guest-agent.service
index 51cd7b37ff..6e2d059356 100644
--- a/contrib/systemd/qemu-guest-agent.service
+++ b/contrib/systemd/qemu-guest-agent.service
@@ -2,6 +2,7 @@
 Description=QEMU Guest Agent
 BindTo=dev-virtio\x2dports-org.qemu.guest_agent.0.device
 After=dev-virtio\x2dports-org.qemu.guest_agent.0.device
+After=NetworkManager.service
 
 [Service]
 ExecStart=-/usr/bin/qemu-ga
-- 
2.34.1




[8.1 regression] Re: [PULL 05/19] virtio-gpu-udmabuf: correct naming of QemuDmaBuf size properties

2023-08-16 Thread Alex Williamson
This commit introduces a regression when using GVT-g with the display
and ramfb options.  At the point where it appears the guest is switching
from ramfb to the vGPU display, the display goes black and QEMU reports:

qemu: eglCreateImageKHR failed

This message occurs repeatedly.

VM command line:

/usr/local/bin/qemu-system-x86_64 \
-M pc \
-machine kernel_irqchip=on \
-accel kvm \
-cpu host \
-m 6G \
-smp 4,sockets=1,dies=1,cores=4,threads=1 \
-smbios 'type=0,vendor=HP,version=P02 Ver. 02.44,date=09/13/2022,release=2.44' \
-smbios 'type=1,manufacturer=HP,product=HP ProDesk 600 G3 MT,serial=MXL745130H,sku=Y3E02AV,family=103C_53307F HP ProDesk' \
-smbios 'type=2,manufacturer=HP,product=829D,version=KBC Version 06.29,serial=PFYUT0FCY9731K' \
-smbios type=3,manufacturer=HP,serial=MXL745130H,asset=MXL745130H \
-acpitable sig=SLIC,file=/var/lib/libvirt/images/slic.bin \
-acpitable sig=MSDM,file=/var/lib/libvirt/images/msdm.bin \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-rtc base=localtime,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-user-config \
-nodefaults \
-monitor stdio \
-serial none \
-parallel none \
-no-hpet \
-net nic,macaddr=5c:25:8e:a6:34:6e,model=virtio \
-net user \
-usb \
-usbdevice tablet \
-vga none \
-display gtk,gl=on \
-hda /var/lib/libvirt/images/win10-gvtg.qcow2 \
-device vfio-pci-nohotplug,sysfsdev=/sys/bus/mdev/devices/b1338b2d-a709-4c23-b766-cc436c36cdf0,display=on,ramfb=on,x-igd-opregion=on \
-snapshot

Thanks,
Alex

On Mon, 17 Jul 2023 16:45:30 +0400
marcandre.lur...@redhat.com wrote:

> From: Dongwon Kim 
> 
> Replace 'width' and 'height' in QemuDmaBuf with 'backing_width'
> and 'backing_height' as these commonly indicate the size of the
> whole surface (e.g. guest's Xorg extended display). Then use
> 'width' and 'height' for sub region in there (e.g. guest's
> scanouts).
> 
> Cc: Gerd Hoffmann 
> Cc: Marc-André Lureau 
> Cc: Vivek Kasireddy 
> Signed-off-by: Dongwon Kim 
> Reviewed-by: Marc-André Lureau 
> Message-ID: <20230713040444.32267-1-dongwon@intel.com>
> ---
>  include/ui/console.h|  4 ++--
>  hw/display/virtio-gpu-udmabuf.c | 12 ++--
>  ui/dbus-listener.c  |  8 
>  ui/egl-helpers.c|  8 
>  ui/gtk-egl.c| 10 ++
>  ui/gtk-gl-area.c|  7 ---
>  6 files changed, 26 insertions(+), 23 deletions(-)
> 
> diff --git a/include/ui/console.h b/include/ui/console.h
> index f27b2aad4f..3e8b22d6c6 100644
> --- a/include/ui/console.h
> +++ b/include/ui/console.h
> @@ -201,8 +201,8 @@ typedef struct QemuDmaBuf {
>  uint32_t  texture;
>  uint32_t  x;
>  uint32_t  y;
> -uint32_t  scanout_width;
> -uint32_t  scanout_height;
> +uint32_t  backing_width;
> +uint32_t  backing_height;
>  bool  y0_top;
>  void  *sync;
>  int   fence_fd;
> diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
> index ef1a740de5..d51184d658 100644
> --- a/hw/display/virtio-gpu-udmabuf.c
> +++ b/hw/display/virtio-gpu-udmabuf.c
> @@ -181,13 +181,13 @@ static VGPUDMABuf
>  }
>  
>  dmabuf = g_new0(VGPUDMABuf, 1);
> -dmabuf->buf.width = fb->width;
> -dmabuf->buf.height = fb->height;
> +dmabuf->buf.width = r->width;
> +dmabuf->buf.height = r->height;
>  dmabuf->buf.stride = fb->stride;
>  dmabuf->buf.x = r->x;
>  dmabuf->buf.y = r->y;
> -dmabuf->buf.scanout_width = r->width;
> -dmabuf->buf.scanout_height = r->height;
> +dmabuf->buf.backing_width = fb->width;
> +dmabuf->buf.backing_height = fb->height;
>  dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
>  dmabuf->buf.fd = res->dmabuf_fd;
>  dmabuf->buf.allow_fences = true;
> @@ -218,8 +218,8 @@ int virtio_gpu_update_dmabuf(VirtIOGPU *g,
>  
>  g->dmabuf.primary[scanout_id] = new_primary;
>  qemu_console_resize(scanout->con,
> -new_primary->buf.scanout_width,
> -new_primary->buf.scanout_height);
> +new_primary->buf.width,
> +new_primary->buf.height);
>  dpy_gl_scanout_dmabuf(scanout->con, &new_primary->buf);
>  
>  if (old_primary) {
> diff --git a/ui/dbus-listener.c b/ui/dbus-listener.c
> index 0240c39510..68ff343799 100644
> --- a/ui/dbus-listener.c
> +++ b/ui/dbus-listener.c
> @@ -415,13 +415,13 @@ static void dbus_scanout_texture(DisplayChangeListener 
> *dcl,
> backing_width, backing_height, x, y, w, h);
>  #ifdef CONFIG_GBM
>  QemuDmaBuf dmabuf = {
> -.width = backing_width,
> -.height = backing_height,
> +.width = w,
> +.height = h,
>  .y0_top = backing_y_0_top,
>  .x = x,
>  .y = y,
> -.scanout_width = w,
> -.scanout_height = h,
> +.backing_width = backing_width,
> +.backing_height = backing_height,
>  };
>  
>  assert(tex

Re: [PATCH v2 3/4] qcow2: add zoned emulation capability

2023-08-16 Thread Stefan Hajnoczi
On Mon, Aug 14, 2023 at 04:58:01PM +0800, Sam Li wrote:
> By adding zone operations and zoned metadata, the zoned emulation
> capability enables full emulation support of zoned device using
> a qcow2 file. The zoned device metadata includes zone type,
> zoned device state and write pointer of each zone, which is stored
> to an array of unsigned integers.
> 
> Each zone of a zoned device makes state transitions following
> the zone state machine. The zone state machine mainly describes
> five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> READ ONLY and OFFLINE states will generally be affected by device
> internal events. Operations on zones cause the corresponding state
> changes.
> 
> Zoned devices have a limit on zone resources, which puts constraints on
> write operations into zones.
> 
> Signed-off-by: Sam Li 
> ---
>  block/qcow2.c  | 676 -
>  block/qcow2.h  |   2 +
>  docs/interop/qcow2.txt |   2 +
>  3 files changed, 678 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c1077c4a4a..5ccf79cbe7 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
> *fmt, Error **errp)
>  return cryptoopts_qdict;
>  }
>  
> +#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
> +
> +static inline int qcow2_get_wp(uint64_t wp)
> +{
> +/* clear state and type information */
> +return ((wp << 5) >> 5);
> +}
> +
> +static inline int qcow2_get_zs(uint64_t wp)
> +{
> +return (wp >> 60);
> +}
> +
> +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
> +{
> +uint64_t addr = qcow2_get_wp(*wp);
> +addr |= ((uint64_t)zs << 60);
> +*wp = addr;
> +}

Although the function is called qcow2_set_wp() it seems to actually be
qcow2_set_zs() since it only changes the zone state, not the write
pointer. Want to rename it?
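
A minimal sketch of what separate setters could look like, assuming the layout implied by the quoted helpers: zone state in bits 63..60, the conventional-zone flag in bit 59, and the write pointer in the remaining low bits. The zoned_* names and the masks are hypothetical, and unlike the quoted code this sketch keeps everything in 64-bit types and preserves the type bit when the state changes.

```c
#include <assert.h>
#include <stdint.h>

#define ZS_SHIFT     60
#define ZS_MASK      (0xFULL << ZS_SHIFT)   /* bits 63..60: zone state */
#define ZT_CONV_BIT  (1ULL << 59)           /* bit 59: conventional zone */
#define WP_MASK      (ZT_CONV_BIT - 1)      /* bits 58..0: write pointer */

static inline uint64_t zoned_get_wp(uint64_t v)
{
    return v & WP_MASK;
}

static inline unsigned zoned_get_zs(uint64_t v)
{
    return (unsigned)(v >> ZS_SHIFT);
}

/* Changes only the zone state; write pointer and type bit are kept. */
static inline void zoned_set_zs(uint64_t *v, unsigned zs)
{
    assert(zs < 16);
    *v = (*v & ~ZS_MASK) | ((uint64_t)zs << ZS_SHIFT);
}

/* Changes only the write pointer; zone state and type bit are kept. */
static inline void zoned_set_wp(uint64_t *v, uint64_t wp)
{
    assert((wp & ~WP_MASK) == 0);
    *v = (*v & ~WP_MASK) | wp;
}
```

With the two setters separated, each name says which field is touched, which is the point of the rename suggestion.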

> +
> +/*
> + * File wp tracking: reset zone, finish zone and append zone can
> + * change the value of write pointer. All zone operations will change
> + * the state of that/those zone.
> + * */
> +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
> +/* format: operations, the wp. */
> +printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
> +}

This looks like debugging code that shouldn't go into qemu.git. Please
use tracing (docs/devel/tracing.rst) to capture internal information in
production code.

I will review more of this patch series another time because I need to
get going.

Stefan




[PATCH] chardev/char-pty: Avoid losing bytes when the other side just (re-)connected

2023-08-16 Thread Thomas Huth
When starting a guest via libvirt with "virsh start --console ...",
the first second of the console output is missing. This is especially
annoying on s390x that only has a text console by default and no graphical
output - if the bios fails to boot here, the information about what went
wrong is completely lost.

One part of the problem (there are also some things to be done on the
libvirt side) is that QEMU only checks with a 1 second timer whether
the other side of the pty is already connected, so the first second of
the console output is always lost.

This likely used to work better in the past, since the code once checked
for a re-connection during write, but this has been removed in commit
f8278c7d74 ("char-pty: remove the check for connection on write") to avoid
some locking.

To ease the situation here at least a little bit, let's check with g_poll()
whether we could send out the data anyway, even if the connection has not
been marked as "connected" yet. The file descriptor is marked as non-blocking
anyway since commit fac6688a18 ("Do not hang on full PTY"), so this should
not cause any trouble if the other side is not ready for receiving yet.

With this patch applied, I can now successfully see the bios output of
a s390x guest when running it with "virsh start --console" (with a patched
version of virsh that fixes the remaining issues there, too).

Reported-by: Marc Hartmayer 
Signed-off-by: Thomas Huth 
---
 chardev/char-pty.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/chardev/char-pty.c b/chardev/char-pty.c
index 4e5deac18a..fad12dfef3 100644
--- a/chardev/char-pty.c
+++ b/chardev/char-pty.c
@@ -106,11 +106,27 @@ static void pty_chr_update_read_handler(Chardev *chr)
 static int char_pty_chr_write(Chardev *chr, const uint8_t *buf, int len)
 {
 PtyChardev *s = PTY_CHARDEV(chr);
+GPollFD pfd;
+int rc;
 
-if (!s->connected) {
-return len;
+if (s->connected) {
+return io_channel_send(s->ioc, buf, len);
 }
-return io_channel_send(s->ioc, buf, len);
+
+/*
+ * The other side might already be re-connected, but the timer might
+ * not have fired yet. So let's check here whether we can write again:
+ */
+pfd.fd = QIO_CHANNEL_FILE(s->ioc)->fd;
+pfd.events = G_IO_OUT;
+pfd.revents = 0;
+rc = RETRY_ON_EINTR(g_poll(&pfd, 1, 0));
+g_assert(rc >= 0);
+if (!(pfd.revents & G_IO_HUP) && (pfd.revents & G_IO_OUT)) {
+io_channel_send(s->ioc, buf, len);
+}
+
+return len;
 }
 
 static GSource *pty_chr_add_watch(Chardev *chr, GIOCondition cond)
-- 
2.39.3
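
The zero-timeout writability probe used by the patch can be tried on its own. This is a minimal sketch using POSIX poll() rather than GLib's g_poll(), with hypothetical helper names. As in the patch, the caller reports the full length as consumed whether or not the data could be sent.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if fd accepts a write right now and has not hung up. */
static int fd_writable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, 0);          /* timeout 0: never blocks */
    } while (rc < 0 && errno == EINTR);
    assert(rc >= 0);

    return !(pfd.revents & POLLHUP) && (pfd.revents & POLLOUT);
}

/* Best-effort write: data is dropped if the peer is not ready. */
static size_t write_if_ready(int fd, const void *buf, size_t len)
{
    if (fd_writable_now(fd)) {
        ssize_t ignored = write(fd, buf, len);
        (void)ignored;
    }
    return len;
}
```

On a freshly created pipe the write end is reported writable and the bytes arrive at the read end; on a descriptor whose peer has hung up the data is silently discarded, matching the behaviour described in the commit message.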




Re: [PATCH v4 8/8] migration: Add a wrapper to cleanup migration files

2023-08-16 Thread Peter Xu


On Wed, Aug 16, 2023 at 03:35:24PM -0400, Peter Xu wrote:
> On Wed, Aug 16, 2023 at 03:47:26PM -0300, Fabiano Rosas wrote:
> > Peter Xu  writes:
> > 
> > > On Wed, Aug 16, 2023 at 11:25:10AM -0300, Fabiano Rosas wrote:
> > >> @@ -2003,6 +1980,8 @@ static int 
> > >> open_return_path_on_source(MigrationState *ms)
> > >>  return -1;
> > >>  }
> > >>  
> > >> +
> > >> migration_ioc_register_yank(qemu_file_get_ioc(ms->rp_state.from_dst_file));
> > >
> > > I think I didn't really get why it wasn't paired before yesterday.  My
> > > fault.
> > >
> > > Registering from_dst_file, afaict, will register two identical yank 
> > > objects
> > > because the ioc is the same.
> > >
> > 
> > Why do we have two QEMUFiles for the same fd again?
> 
> Because qemufile has a "direction" (either read / write)?
> 
> > 
> > We're bound to crash at some point by trying to qemu_fclose() the two
> > QEMUFiles at the same time.
> 
> Even with each qemufile holding a reference on the ioc object?  I thought
> it won't crash, but if it will please point that out; or fix it would be
> even better.
> 
> > 
> > > Should we make migration_file_release() not handle the unregister of
> > > yank(), but leave that to callers?  Then we keep the rule of only register
> > > yank for each ioc once.
> > >
> > 
> > We need the unregister to be in migration_file_release() so that it
> > benefits from the locking while checking the file for NULL. If it
> > moves out, then the caller will have to do the locking as well, which
> > defeats the purpose of the patch.
> > 
> > I don't understand why you moved the unregister out of channel_close in
> > commit 39675b ("migration: Move the yank unregister of channel_close
> > out"). You called it a "hack" at the time, but looking at the current
> > situation, it seems like a reasonable thing to do: unregister the yank
> > when the ioc refcount drops to 1.
> > 
> > I would go even further and say that qemu_fclose should also avoid
> > calling qio_channel_close if the ioc refcnt is elevated.
> 
> I'd rather not; I still think it's a hack, always open to be corrected.
> 
> I think the problem is that yank can register anything, so it's separate
> from iochannels.  If one would like ioc close() to unregister
> automatically, then one should also register yank transparently, without
> the ioc user even being aware of yank's existence.
> 
> As things stand, the caller registers yank itself, so I think the caller
> should unregister it, not the iochannel itself, silently.

I just noticed this thread is not copying the list; let me add the cc list
back, assuming it was just forgotten.

One more thing to mention: I now tend to agree that we should probably
register yank for each qemufile, since, as you pointed out in the other
thread, qmp_yank() otherwise won't set an error for the qemufile, which
seems unexpected.

-- 
Peter Xu




Re: [PATCH v2 2/4] qcow2: add configurations for zoned format extension

2023-08-16 Thread Stefan Hajnoczi
On Mon, Aug 14, 2023 at 04:58:00PM +0800, Sam Li wrote:
> Configuring the zoned format feature on the qcow2 driver requires
> the following arguments: the device size, zoned profile, zoned
> model, zone size, zone capacity, number of conventional zones, and
> limits on zone resources (max append sectors, max open zones, and
> max_active_zones). The zoned profile option is set to zns when
> using the qcow2 file as a ZNS drive.
> 
> To create a qcow2 file with zoned format, use a command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
>  -o zoned_profile=zbc/zns
> 
> Signed-off-by: Sam Li 
> ---
>  block/qcow2.c| 125 +++
>  block/qcow2.h|  21 ++
>  docs/interop/qcow2.txt   |  24 ++
>  include/block/block-common.h |   5 ++
>  include/block/block_int-common.h |  16 
>  qapi/block-core.json |  46 
>  6 files changed, 223 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c51388e99d..c1077c4a4a 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
>  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
>  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
>  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>  
>  static int coroutine_fn
>  qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> start_offset,
>  uint64_t offset;
>  int ret;
>  Qcow2BitmapHeaderExt bitmaps_ext;
> +Qcow2ZonedHeaderExtension zoned_ext;
>  
>  if (need_update_header != NULL) {
>  *need_update_header = false;
> @@ -431,6 +433,38 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> start_offset,
>  break;
>  }
>  
> +case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> +{
> +if (ext.len != sizeof(zoned_ext)) {
> +error_setg_errno(errp, -ret, "zoned_ext: "

ret does not contain a useful value. I suggest calling error_setg()
instead.

> + "Invalid extension length");
> +return -EINVAL;
> +}
> +ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> +return ret;
> +}
> +
> +zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> +zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> +zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> +zoned_ext.zone_nr_conv = be32_to_cpu(zoned_ext.zone_nr_conv);
> +zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> +zoned_ext.max_active_zones =
> +be32_to_cpu(zoned_ext.max_active_zones);
> +zoned_ext.max_append_sectors =
> +be32_to_cpu(zoned_ext.max_append_sectors);
> +s->zoned_header = zoned_ext;

I suggest adding checks here and refusing to open broken images:

  if (zone_size == 0) {
  error_setg(errp, "Zoned extension header zone_size field cannot be 0");
  return -EINVAL;
  }
  if (zone_capacity > zone_size) { ... }
  if (nr_zones != DIV_ROUND_UP(bs->total_size, zone_size)) { ... }
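
To make the three checks concrete, here is a rough standalone sketch. The
struct is a hypothetical stand-in holding only the fields involved (it is
not Qcow2ZonedHeaderExtension), div_round_up() mirrors what DIV_ROUND_UP()
does, and total_size is assumed to be in the same unit as zone_size. In the
driver each failure would go through error_setg() rather than just
returning -EINVAL.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the header fields used by the checks. */
typedef struct {
    uint32_t zone_size;
    uint32_t zone_capacity;
    uint32_t nr_zones;
} ZonedHeaderSketch;

/* Round-up integer division; 'd' must be non-zero. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Return 0 if the header is self-consistent, -EINVAL otherwise. */
static int zoned_header_check(const ZonedHeaderSketch *h, uint64_t total_size)
{
    if (h->zone_size == 0) {
        return -EINVAL;     /* also guards the division below */
    }
    if (h->zone_capacity > h->zone_size) {
        return -EINVAL;     /* capacity cannot exceed the zone itself */
    }
    if (h->nr_zones != div_round_up(total_size, h->zone_size)) {
        return -EINVAL;     /* zone count must cover the whole image */
    }
    return 0;
}
```

Checking zone_size first matters: the nr_zones check divides by it.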

> +
> +#ifdef DEBUG_EXT
> +printf("Qcow2: Got zoned format extension: "
> +   "offset=%" PRIu32 "\n", offset);
> +#endif
> +break;
> +}
> +
>  default:
>  /* unknown magic - save it in case we need to rewrite the header 
> */
>  /* If you add a new feature, make sure to also update the fast
> @@ -3089,6 +3123,31 @@ int qcow2_update_header(BlockDriverState *bs)
>  buflen -= ret;
>  }
>  
> +/* Zoned devices header extension */
> +if (s->zoned_header.zoned == BLK_Z_HM) {
> +Qcow2ZonedHeaderExtension zoned_header = {
> +.zoned_profile  = s->zoned_header.zoned_profile,
> +.zoned  = s->zoned_header.zoned,
> +.nr_zones   = cpu_to_be32(s->zoned_header.nr_zones),
> +.zone_size  = cpu_to_be32(s->zoned_header.zone_size),
> +.zone_capacity  = cpu_to_be32(s->zoned_header.zone_capacity),
> +.zone_nr_conv   = cpu_to_be32(s->zoned_header.zone_nr_conv),
> +.max_open_zones = 
> cpu_to_be32(s->zoned_header.max_open_zones),
> +.max_active_zones   =
> +cpu_to_be32(s->zoned_header.max_active_zones),
> +.max_append_sectors =
> +cpu_to_be32(s->zoned_header.max_append_sectors)
> +};
> +r

Re: [PATCH 01/10] hw/arm/virt-acpi-build.c: Move fw_cfg and virtio to common location

2023-08-16 Thread Daniel Henrique Barboza



On 7/26/23 05:25, Igor Mammedov wrote:

> On Tue, 25 Jul 2023 22:20:36 +0530
> Sunil V L  wrote:
> 
>> On Mon, Jul 24, 2023 at 05:18:59PM +0200, Igor Mammedov wrote:
>>> On Wed, 12 Jul 2023 22:09:34 +0530
>>> Sunil V L  wrote:
>>> 
>>>> The functions which add fw_cfg and virtio to DSDT are the same for ARM
>>>> and RISC-V. So, instead of duplicating them in RISC-V, move them from
>>>> hw/arm/virt-acpi-build.c to common aml-build.c.
>>>> 
>>>> Signed-off-by: Sunil V L 
>>>> ---
>>>>   hw/acpi/aml-build.c | 41 
>>>>   hw/arm/virt-acpi-build.c| 42 -
>>>>   hw/riscv/virt-acpi-build.c  | 16 --
>>>>   include/hw/acpi/aml-build.h |  6 ++
>>>>   4 files changed, 47 insertions(+), 58 deletions(-)
>>>> 
>>>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>>> 
>>> The patch looks fine, modulo one thing:
>>> I'd put these into the respective device files instead of the generic
>>> aml-build.c, which was intended for basic AML primitives
>>> (it has been polluted over time with device-specific functions,
>>> but that's no reason to continue doing so).
>>> 
>>> Also, having those functions alongside the device models
>>> fits with self-enumerating ACPI devices (currently
>>> it works for x86 PCI/ISA devices, but there is no reason
>>> that it can't work with other types as well when
>>> I get there).
>> 
>> Thanks, Igor! Let me add them to device-specific files as per your
>> recommendation.
> 
> Just be careful and build-test other targets (while disabling the rest),
> at least so as not to regress them due to build deps. I'd pick 2 with ACPI
> support (one that uses the affected code and one that doesn't) and 1 that
> uses the device model but doesn't use ACPI at all (if such exists).


Sunil is already aware of it but I'll also mention here since it seems relevant
to Igor's point.


This patch breaks i386-softmmu build:


FAILED: libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o
cc -m64 -mcx16 -Ilibqemu-i386-softmmu.fa.p -I. -I.. -Itarget/i386 -I../target/i386 -Iqapi -Itrace 
-Iui -Iui/shader -I/usr/include/pixman-1 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-I/usr/include/sysprof-4 -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g 
-fstack-protector-strong -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wundef -Wwrite-strings 
-Wmissing-prototypes -Wstrict-prototypes -Wredundant-decls -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined 
-Wimplicit-fallthrough=2 -Wmissing-format-attribute -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -isystem /home/danielhb/work/qemu/linux-headers -isystem 
linux-headers -iquote . -iquote /home/danielhb/work/qemu -iquote /home/danielhb/work/qemu/include 
-iquote /home/danielhb/work/qemu/host/include/x86_64 -iquote 
/home/danielhb/work/qemu/host/include/generic -iquote /home/danielhb/work/qemu/tcg/i386 -pthread 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing -fno-common -fwrapv 
-fPIE -isystem../linux-headers -isystemlinux-headers -DNEED_CPU_H 
'-DCONFIG_TARGET="i386-softmmu-config-target.h"' 
'-DCONFIG_DEVICES="i386-softmmu-config-devices.h"' -MD -MQ 
libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o -MF 
libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o.d -o 
libqemu-i386-softmmu.fa.p/hw_i386_acpi-microvm.c.o -c ../hw/i386/acpi-microvm.c
../hw/i386/acpi-microvm.c:48:13: error: conflicting types for 
‘acpi_dsdt_add_virtio’; have ‘void(Aml *, MicrovmMachineState *)’
   48 | static void acpi_dsdt_add_virtio(Aml *scope,
  | ^~~~
In file included from 
/home/danielhb/work/qemu/include/hw/acpi/acpi_aml_interface.h:5,
 from ../hw/i386/acpi-microvm.c:29:
/home/danielhb/work/qemu/include/hw/acpi/aml-build.h:503:6: note: previous 
declaration of ‘acpi_dsdt_add_virtio’ with type ‘void(Aml *, const MemMapEntry 
*, uint32_t,  int)’ {aka ‘void(Aml *, const MemMapEntry *, unsigned int,  int)’}
  503 | void acpi_dsdt_add_virtio(Aml *scope, const MemMapEntry 
*virtio_mmio_memmap,
  |  ^~~~
[5/714] Compiling C object libqemu-i386-softmmu.fa.p/hw_i386_kvm_clock.c.o

This happens because the common 'acpi_dsdt_add_virtio' function collides with a
local function of the same name in hw/i386/acpi-microvm.c. We would need to
rename the shared helper, rename the local acpi-microvm function, or do
something like what Igor suggested to avoid this name collision.


Thanks,

Daniel












>> Thanks!
>> Sunil





Re: [PATCH v2 4/4] migration/ram: Merge save_zero_page functions

2023-08-16 Thread Peter Xu
On Wed, Aug 16, 2023 at 03:28:17PM -0300, Fabiano Rosas wrote:
> We don't need to do this in two pieces. One single function makes it
> easier to grasp, especially since it removes the indirection on the
> return value handling.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 1/2] virtio: use blk_io_plug_call() in virtio_irqfd_notify()

2023-08-16 Thread Ilya Maximets
On 8/16/23 17:30, Stefan Hajnoczi wrote:
> On Wed, Aug 16, 2023 at 03:36:32PM +0200, Ilya Maximets wrote:
>> On 8/15/23 14:08, Stefan Hajnoczi wrote:
>>> virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
>>> Buffer Notifications from an IOThread. This involves an eventfd
>>> write(2) syscall. Calling this repeatedly when completing multiple I/O
>>> requests in a row is wasteful.
>>
>> Hi, Stefan.  This is an interesting change!
>>
>> There is more or less exactly the same problem with fast network backends
>> and I was playing around with similar ideas in this area while working on
>> af-xdp network backend recently.  Primarily, implementation of the Rx BH
>> for virtio-net device and locking the RCU before passing packets from the
>> backend to the device one by one.
>>
>>>
>>> Use the blk_io_plug_call() API to batch together virtio_irqfd_notify()
>>> calls made during Linux AIO (aio=native) or io_uring (aio=io_uring)
>>> completion processing. Do not modify the thread pool (aio=threads) to
>>> avoid introducing a dependency from util/ onto the block layer.
>>
>> But you're introducing a dependency from generic virtio code onto the
>> block layer in this patch.  This seems to break the module abstraction.
>>
>> It looks like there are 2 options to resolve the semantics issue here:
> 
> Yes, it's a layering violation.
> 
>>
>> 1. Move virtio_notify_irqfd() from virtio.c down to the block layer.
>>Block layer is the only user, so that may be justified, but it
>>doesn't seem like a particularly good solution.  (I'm actually not
>>sure why block devices are the only ones using this function...)
> 
> Yes, this is the easiest way to avoid the layering violation for now.
> 
> The virtio_notify_irqfd() API is necessary when running in an IOThread
> because the normal QEMU irq API must run under the Big QEMU Lock. Block
> devices are the only ones that raise interrupts from IOThreads at the
> moment.

Ack.  Thanks for the explanation!

> 
>>
>> 2. Move and rename the block/plug library somewhere generic.  The plug
>>library seems to not have any dependencies on a block layer, other
>>than a name, so it should not be hard to generalize it (well, the
>>naming might be hard).
> 
> Yes, it should be possible to make it generic quite easily. I will give
> this a try in the next version of the patch.

OK.  Sounds good to me.
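
For anyone skimming the thread, the batching idea being discussed can be
sketched in a few lines. This is a rough standalone illustration, not the
actual block/plug code: the names plug(), plug_call() and unplug() are
placeholders, the fixed-size array is arbitrary, and the state is a plain
global where a real implementation would keep it per thread.

```c
#include <assert.h>
#include <stddef.h>

typedef void DeferredFn(void *opaque);

#define MAX_DEFERRED 16     /* arbitrary bound for this sketch */

typedef struct {
    DeferredFn *fn;
    void *opaque;
} DeferredCall;

/* Single-threaded sketch state (assumption: one thread only). */
static unsigned plug_depth;
static DeferredCall deferred[MAX_DEFERRED];
static size_t nr_deferred;

/* Enter a batching section; sections may nest. */
static void plug(void)
{
    plug_depth++;
}

/* Defer fn(opaque) to the outermost unplug(), dropping duplicates. */
static void plug_call(DeferredFn *fn, void *opaque)
{
    size_t i;

    if (plug_depth == 0) {
        fn(opaque);                 /* not batching: call right away */
        return;
    }
    for (i = 0; i < nr_deferred; i++) {
        if (deferred[i].fn == fn && deferred[i].opaque == opaque) {
            return;                 /* already queued once */
        }
    }
    assert(nr_deferred < MAX_DEFERRED);
    deferred[nr_deferred].fn = fn;
    deferred[nr_deferred].opaque = opaque;
    nr_deferred++;
}

/* Leave a batching section; the outermost exit runs the queued calls. */
static void unplug(void)
{
    size_t i, n;

    assert(plug_depth > 0);
    if (--plug_depth > 0) {
        return;
    }
    n = nr_deferred;
    nr_deferred = 0;
    for (i = 0; i < n; i++) {
        deferred[i].fn(deferred[i].opaque);
    }
}

/* Example callback standing in for an expensive notification. */
static void count_notify(void *opaque)
{
    ++*(int *)opaque;
}
```

With this, three plug_call(count_notify, &n) calls inside one plug()/unplug()
pair result in a single count_notify() invocation at unplug() time, which is
the coalescing the patch is after for the eventfd write.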

> 
>> In general, while looking at the plug library, it seems to do what is
>> typically implemented via RCU frameworks - the delayed function call.
>> The only difference is that RCU doesn't check for duplicates and the
>> callbacks are global.  Should not be hard to add some new functionality
>> to RCU framework in order to address these, e.g. rcu_call_local() for
>> calls that should be executed once the current thread exits its own
>> critical section.
> 
> This rcu_call_local() idea is unrelated to Read Copy Update, so I don't
> think it should be part of the RCU API.

Agreed.

> Another deferred function call mechanism is QEMUBH. It already supports
> coalescing. However, BHs are invoked once per AioContext event loop
> iteration and there is no way invoke the BH earlier. Also the BH pointer
> needs to be passed to every function that wishes to schedule a deferred
> call, which can be tedious (e.g. block/linux-aio.c should defer the
> io_submit(2) syscall until the end of virtio-blk request processing -
> there are a lot of layers of code between those two components).
> 
>>
>> Using RCU for non-RCU-protected things might be considered as an abuse.
>> However, we might solve two issues in one shot if instead of entering
>> blk_io_plug/unplug section we will enter an RCU critical section and
>> call callbacks at the exit.  The first issue is the notification batching
>> that this patch is trying to fix, the second is an excessive number of
>> thread fences on RCU exits every time virtio_notify_irqfd() and other
>> virtio functions are invoked.  The second issue can be avoided by using
>> RCU_READ_LOCK_GUARD() in completion functions.  Not sure if that will
>> improve performance, but it definitely removes a lot of noise from the
>> perf top for network backends.  This makes the code a bit less explicit
>> though, the lock guard will definitely need a comment.  Then again, the
>> reason for the blk_io_plug() calls is not fully clear from the module code
>> alone either.
> 
> util/aio-posix.c:run_poll_handlers() has a top-level
> RCU_READ_LOCK_GUARD() for this reason.

Nice, didn't know that.

> Maybe we should do the same
> around aio_bh_poll() + aio_dispatch_ready_handlers() in
> util/aio-posix.c:aio_poll()? The downside is that latency-sensitive
> call_rcu() callbacks perform worse.

"latency-sensitive call_rcu() callback" is a bit of an oxymoron.
There is no real way to tell when the other thread will exit the
critical section.  But I'm not familiar enough with the code to
make a decision here.

>>
>> I'm not sure what is the best way forward.  I'm trying to figure ou

[PATCH v2 0/4] migration/ram: Merge zero page handling

2023-08-16 Thread Fabiano Rosas
For v2 I fixed patch 3 which had a hunk belonging to patch 5.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/969706915

v1:
https://lore.kernel.org/r/20230815143828.15436-1-faro...@suse.de

Hi,

This is another small series that I extracted from my fixed-ram series
and that could be already considered for merging.

This is just code movement, no functional change. The objective is to
consolidate the zero page handling in the same routine that saves the
page header and does accounting. Then in the future I'll be able to
just return early because fixed-ram ignores zero pages.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/968300062

Fabiano Rosas (4):
  migration/ram: Remove RAMState from xbzrle_cache_zero_page
  migration/ram: Stop passing QEMUFile around in save_zero_page
  migration/ram: Move xbzrle zero page handling into save_zero_page
  migration/ram: Merge save_zero_page functions

 migration/ram.c | 75 -
 1 file changed, 30 insertions(+), 45 deletions(-)

-- 
2.35.3




[PATCH v2 4/4] migration/ram: Merge save_zero_page functions

2023-08-16 Thread Fabiano Rosas
We don't need to do this in two pieces. One single function makes it
easier to grasp, especially since it removes the indirection on the
return value handling.

Signed-off-by: Fabiano Rosas 
---
 migration/ram.c | 46 +-
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 82ff53beec..13935ead1c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1128,32 +1128,6 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
 ram_discard_range(rbname, offset, TARGET_PAGE_SIZE);
 }
 
-/**
- * save_zero_page_to_file: send the zero page to the file
- *
- * Returns the size of data written to the file, 0 means the page is not
- * a zero page
- *
- * @pss: current PSS channel
- * @block: block that contains the page we want to send
- * @offset: offset inside the block for the page
- */
-static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block,
-  ram_addr_t offset)
-{
-uint8_t *p = block->host + offset;
-QEMUFile *file = pss->pss_channel;
-int len = 0;
-
-if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
-len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
-qemu_put_byte(file, 0);
-len += 1;
-ram_release_page(block->idstr, offset);
-}
-return len;
-}
-
 /**
  * save_zero_page: send the zero page to the stream
  *
@@ -1167,12 +1141,19 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, RAMBlock *block,
 static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
-int len = save_zero_page_to_file(pss, block, offset);
+uint8_t *p = block->host + offset;
+QEMUFile *file = pss->pss_channel;
+int len = 0;
 
-if (!len) {
-return -1;
+if (!buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+return 0;
 }
 
+len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
+qemu_put_byte(file, 0);
+len += 1;
+ram_release_page(block->idstr, offset);
+
 stat64_add(&mig_stats.zero_pages, 1);
 ram_transferred_add(len);
 
@@ -1186,7 +1167,7 @@ static int save_zero_page(RAMState *rs, PageSearchStatus 
*pss, RAMBlock *block,
 XBZRLE_cache_unlock();
 }
 
-return 1;
+return len;
 }
 
 /*
@@ -2154,9 +2135,8 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(rs, pss, block, offset);
-if (res > 0) {
-return res;
+if (save_zero_page(rs, pss, block, offset)) {
+return 1;
 }
 
 /*
-- 
2.35.3




[PATCH v2 3/4] migration/ram: Move xbzrle zero page handling into save_zero_page

2023-08-16 Thread Fabiano Rosas
It makes a bit more sense to have the zero page handling of xbzrle
right where we save the zero page.

Also invert the exit condition to remove one level of indentation
which makes the next patch easier to grasp.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/ram.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 761f43dc34..82ff53beec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1159,21 +1159,34 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, RAMBlock *block,
  *
  * Returns the number of pages written.
  *
+ * @rs: current RAM state
  * @pss: current PSS channel
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
+static int save_zero_page(RAMState *rs, PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
 int len = save_zero_page_to_file(pss, block, offset);
 
-if (len) {
-stat64_add(&mig_stats.zero_pages, 1);
-ram_transferred_add(len);
-return 1;
+if (!len) {
+return -1;
 }
-return -1;
+
+stat64_add(&mig_stats.zero_pages, 1);
+ram_transferred_add(len);
+
+/*
+ * Must let xbzrle know, otherwise a previous (now 0'd) cached
+ * page would be stale.
+ */
+if (rs->xbzrle_started) {
+XBZRLE_cache_lock();
+xbzrle_cache_zero_page(block->offset + offset);
+XBZRLE_cache_unlock();
+}
+
+return 1;
 }
 
 /*
@@ -2141,16 +2154,8 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(pss, block, offset);
+res = save_zero_page(rs, pss, block, offset);
 if (res > 0) {
-/* Must let xbzrle know, otherwise a previous (now 0'd) cached
- * page would be stale
- */
-if (rs->xbzrle_started) {
-XBZRLE_cache_lock();
-xbzrle_cache_zero_page(block->offset + offset);
-XBZRLE_cache_unlock();
-}
 return res;
 }
 
-- 
2.35.3




[PATCH v2 1/4] migration/ram: Remove RAMState from xbzrle_cache_zero_page

2023-08-16 Thread Fabiano Rosas
'rs' is not used in that function. It's a leftover from commit
9360447d34 ("ram: Use MigrationStats for statistics").

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/ram.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 9040d66e61..87efab72e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -561,7 +561,6 @@ void mig_throttle_counter_reset(void)
 /**
  * xbzrle_cache_zero_page: insert a zero page in the XBZRLE cache
  *
- * @rs: current RAM state
  * @current_addr: address for the zero page
  *
  * Update the xbzrle cache to reflect a page that's been sent as all 0.
@@ -570,7 +569,7 @@ void mig_throttle_counter_reset(void)
  * As a bonus, if the page wasn't in the cache it gets added so that
  * when a small write is made into the 0'd page it gets XBZRLE sent.
  */
-static void xbzrle_cache_zero_page(RAMState *rs, ram_addr_t current_addr)
+static void xbzrle_cache_zero_page(ram_addr_t current_addr)
 {
 /* We don't care if this fails to allocate a new cache page
  * as long as it updated an old one */
@@ -2148,7 +2147,7 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
  */
 if (rs->xbzrle_started) {
 XBZRLE_cache_lock();
-xbzrle_cache_zero_page(rs, block->offset + offset);
+xbzrle_cache_zero_page(block->offset + offset);
 XBZRLE_cache_unlock();
 }
 return res;
-- 
2.35.3




[PATCH v2 2/4] migration/ram: Stop passing QEMUFile around in save_zero_page

2023-08-16 Thread Fabiano Rosas
We don't need the QEMUFile when we're already passing the
PageSearchStatus.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/ram.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 87efab72e8..761f43dc34 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1138,10 +1138,11 @@ void ram_release_page(const char *rbname, uint64_t 
offset)
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
-  RAMBlock *block, ram_addr_t offset)
+static int save_zero_page_to_file(PageSearchStatus *pss, RAMBlock *block,
+  ram_addr_t offset)
 {
 uint8_t *p = block->host + offset;
+QEMUFile *file = pss->pss_channel;
 int len = 0;
 
 if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
@@ -1162,10 +1163,10 @@ static int save_zero_page_to_file(PageSearchStatus 
*pss, QEMUFile *file,
  * @block: block that contains the page we want to send
  * @offset: offset inside the block for the page
  */
-static int save_zero_page(PageSearchStatus *pss, QEMUFile *f, RAMBlock *block,
+static int save_zero_page(PageSearchStatus *pss, RAMBlock *block,
   ram_addr_t offset)
 {
-int len = save_zero_page_to_file(pss, f, block, offset);
+int len = save_zero_page_to_file(pss, block, offset);
 
 if (len) {
 stat64_add(&mig_stats.zero_pages, 1);
@@ -2140,7 +2141,7 @@ static int ram_save_target_page_legacy(RAMState *rs, 
PageSearchStatus *pss)
 return 1;
 }
 
-res = save_zero_page(pss, pss->pss_channel, block, offset);
+res = save_zero_page(pss, block, offset);
 if (res > 0) {
 /* Must let xbzrle know, otherwise a previous (now 0'd) cached
  * page would be stale
-- 
2.35.3




Re: How to synchronize CPUs on MMIO read?

2023-08-16 Thread Alex Bennée


Igor Lesik  writes:

> Hi.
>
> I need to model some custom HW that synchronizes CPUs when they read MMIO
> register N: the MMIO read does not return until another CPU writes to MMIO
> register M. I modeled this behavior as follows: a) on an MMIO read of N,
> save the CPU into a list of waiting CPUs and put it to sleep with
> cpu_interrupt(current_cpu, CPU_INTERRUPT_HALT), and b) on an MMIO write to
> M, wake all waiting CPUs with cpu->halted = 0; qemu_cpu_kick(cpu). It seems
> to work fine. However, this HW has a twist: the MMIO read of N returns the
> value that was written by the MMIO write to M. Can anyone please advise how
> this could be done?

Under TCG all MMIO accesses should be serialised by the BQL so no other
MMIO access can be taking place until you finish the operation.

>
>  
>
> Thanks!
>
> Igor


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PATCH 1/6] util/selfmap: Use dev_t and ino_t in MapInfo

2023-08-16 Thread Richard Henderson
Use dev_t instead of a string, and ino_t instead of uint64_t.
The latter is likely to be identical on modern systems but is
more type-correct for usage.

Signed-off-by: Richard Henderson 
---
 include/qemu/selfmap.h |  4 ++--
 linux-user/syscall.c   |  6 --
 util/selfmap.c | 12 +++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/qemu/selfmap.h b/include/qemu/selfmap.h
index 7d938945cb..1690a74f4b 100644
--- a/include/qemu/selfmap.h
+++ b/include/qemu/selfmap.h
@@ -20,10 +20,10 @@ typedef struct {
 bool is_exec;
 bool is_priv;
 
+dev_t dev;
+ino_t inode;
 uint64_t offset;
-uint64_t inode;
 const char *path;
-char dev[];
 } MapInfo;
 
 /**
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 9353268cc1..074262b3ac 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8160,13 +8160,15 @@ static int open_self_maps_1(CPUArchState *cpu_env, int 
fd, bool smaps)
 }
 
 count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
-" %c%c%c%c %08" PRIx64 " %s %"PRId64,
+" %c%c%c%c %08" PRIx64 " %02x:%02x %"PRId64,
 h2g(min), h2g(max - 1) + 1,
 (flags & PAGE_READ) ? 'r' : '-',
 (flags & PAGE_WRITE_ORG) ? 'w' : '-',
 (flags & PAGE_EXEC) ? 'x' : '-',
 e->is_priv ? 'p' : 's',
-(uint64_t) e->offset, e->dev, e->inode);
+(uint64_t)e->offset,
+major(e->dev), minor(e->dev),
+(uint64_t)e->inode);
 if (path) {
 dprintf(fd, "%*s%s\n", 73 - count, "", path);
 } else {
diff --git a/util/selfmap.c b/util/selfmap.c
index 4db5b42651..483cb617e2 100644
--- a/util/selfmap.c
+++ b/util/selfmap.c
@@ -30,19 +30,21 @@ IntervalTreeRoot *read_self_maps(void)
 
 if (nfields > 4) {
 uint64_t start, end, offset, inode;
+unsigned dev_maj, dev_min;
 int errors = 0;
 const char *p;
 
 errors |= qemu_strtou64(fields[0], &p, 16, &start);
 errors |= qemu_strtou64(p + 1, NULL, 16, &end);
 errors |= qemu_strtou64(fields[2], NULL, 16, &offset);
+errors |= qemu_strtoui(fields[3], &p, 16, &dev_maj);
+errors |= qemu_strtoui(p + 1, NULL, 16, &dev_min);
 errors |= qemu_strtou64(fields[4], NULL, 10, &inode);
 
 if (!errors) {
-size_t dev_len, path_len;
+size_t path_len;
 MapInfo *e;
 
-dev_len = strlen(fields[3]) + 1;
 if (nfields == 6) {
 p = fields[5];
 p += strspn(p, " ");
@@ -52,11 +54,12 @@ IntervalTreeRoot *read_self_maps(void)
 path_len = 0;
 }
 
-e = g_malloc0(sizeof(*e) + dev_len + path_len);
+e = g_malloc0(sizeof(*e) + path_len);
 
 e->itree.start = start;
 e->itree.last = end - 1;
 e->offset = offset;
+e->dev = makedev(dev_maj, dev_min);
 e->inode = inode;
 
 e->is_read  = fields[1][0] == 'r';
@@ -64,9 +67,8 @@ IntervalTreeRoot *read_self_maps(void)
 e->is_exec  = fields[1][2] == 'x';
 e->is_priv  = fields[1][3] == 'p';
 
-memcpy(e->dev, fields[3], dev_len);
 if (path_len) {
-e->path = memcpy(e->dev + dev_len, p, path_len);
+e->path = memcpy(e + 1, p, path_len);
 }
 
 interval_tree_insert(&e->itree, root);
-- 
2.34.1
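
As a side note on the dev_t conversion above, the round trip between the
"MAJ:MIN" text field of /proc/self/maps and a dev_t can be sketched
standalone. The helper names below are invented for illustration; the patch
itself uses qemu_strtoui() rather than strtoul(). The sketch assumes a Linux
libc that provides makedev(), major() and minor() in <sys/sysmacros.h>.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Hypothetical helper: parse a hexadecimal "MAJ:MIN" field into a dev_t.
 * Returns 0 on success, -1 if the field is malformed.
 */
static int parse_maps_dev(const char *field, dev_t *dev)
{
    char *end;
    const char *p;
    unsigned long dev_maj, dev_min;

    dev_maj = strtoul(field, &end, 16);
    if (end == field || *end != ':') {
        return -1;
    }
    p = end + 1;
    dev_min = strtoul(p, &end, 16);
    if (end == p || *end != '\0') {
        return -1;
    }
    *dev = makedev(dev_maj, dev_min);
    return 0;
}

/* Hypothetical helper: format a dev_t back into the "MAJ:MIN" text form. */
static int format_maps_dev(char *buf, size_t len, dev_t dev)
{
    return snprintf(buf, len, "%02x:%02x",
                    (unsigned)major(dev), (unsigned)minor(dev));
}
```

Storing the parsed dev_t means the flexible array member for the device
string is no longer needed, and the formatting decision moves to the code
that prints the map.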




[PATCH 0/6] linux-user: Rewrite open_self_maps

2023-08-16 Thread Richard Henderson
Based-on: 20230816180338.572576-1-richard.hender...@linaro.org
("[PATCH v4 00/18] linux-user: Implement VDSOs")

As promised, a rewrite of /proc/self/{maps,smaps} emulation
using interval trees.

Incorporate Helge's change to mark [heap], and also mark [vdso].


r~


Richard Henderson (6):
  util/selfmap: Use dev_t and ino_t in MapInfo
  linux-user: Use walk_memory_regions for open_self_maps
  linux-user: Adjust brk for load_bias
  linux-user: Show heap address in /proc/pid/maps
  linux-user: Remove ELF_START_MMAP and image_info.start_mmap
  linux-user: Show vdso address in /proc/pid/maps

 include/qemu/selfmap.h |   4 +-
 linux-user/qemu.h  |   2 +-
 linux-user/elfload.c   |  41 +
 linux-user/syscall.c   | 194 +
 util/selfmap.c |  12 +--
 5 files changed, 131 insertions(+), 122 deletions(-)

-- 
2.34.1




[PATCH 4/6] linux-user: Show heap address in /proc/pid/maps

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 658c276e39..5c0fb20e19 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8125,6 +8125,8 @@ static void open_self_maps_4(const struct 
open_self_maps_data *d,
 
 if (test_stack(start, end, info->stack_limit)) {
 path = "[stack]";
+} else if (start == info->brk) {
+path = "[heap]";
 }
 
 /* Except null device (MAP_ANON), adjust offset for this fragment. */
-- 
2.34.1




[PATCH 5/6] linux-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-16 Thread Richard Henderson
The start_mmap value is write-only.
Remove the field and the defines that populated it.
Logically, this has been replaced by task_unmapped_base.

Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h|  1 -
 linux-user/elfload.c | 38 --
 2 files changed, 39 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 4f8b55e2fb..12f638336a 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -30,7 +30,6 @@ struct image_info {
 abi_ulong   start_data;
 abi_ulong   end_data;
 abi_ulong   brk;
-abi_ulong   start_mmap;
 abi_ulong   start_stack;
 abi_ulong   stack_limit;
 abi_ulong   entry;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ab11f141c3..a670a7817a 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -156,8 +156,6 @@ static uint32_t get_elf_hwcap(void)
 }
 
 #ifdef TARGET_X86_64
-#define ELF_START_MMAP 0x2ab000ULL
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_ARCH   EM_X86_64
 
@@ -234,8 +232,6 @@ static bool init_guest_commpage(void)
 #endif
 #else
 
-#define ELF_START_MMAP 0x8000
-
 /*
  * This is used to ensure we don't load something for the wrong architecture.
  */
@@ -333,8 +329,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, 
const CPUX86State *en
 #ifndef TARGET_AARCH64
 /* 32 bit ARM definitions */
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_ARCHEM_ARM
 #define ELF_CLASS   ELFCLASS32
 #define EXSTACK_DEFAULT true
@@ -606,7 +600,6 @@ static const VdsoImageInfo *vdso_image_info(void)
 
 #else
 /* 64 bit ARM definitions */
-#define ELF_START_MMAP 0x8000
 
 #define ELF_ARCHEM_AARCH64
 #define ELF_CLASS   ELFCLASS64
@@ -802,7 +795,6 @@ static uint32_t get_elf_hwcap2(void)
 #ifdef TARGET_SPARC
 #ifdef TARGET_SPARC64
 
-#define ELF_START_MMAP 0x8000
 #define ELF_HWCAP  (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
 | HWCAP_SPARC_MULDIV | HWCAP_SPARC_V9)
 #ifndef TARGET_ABI32
@@ -814,7 +806,6 @@ static uint32_t get_elf_hwcap2(void)
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCH    EM_SPARCV9
 #else
-#define ELF_START_MMAP 0x8000
 #define ELF_HWCAP  (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
 | HWCAP_SPARC_MULDIV)
 #define ELF_CLASS   ELFCLASS32
@@ -836,7 +827,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 #ifdef TARGET_PPC
 
 #define ELF_MACHINE PPC_ELF_MACHINE
-#define ELF_START_MMAP 0x8000
 
 #if defined(TARGET_PPC64)
 
@@ -1048,8 +1038,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
 
 #ifdef TARGET_LOONGARCH64
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCH    EM_LOONGARCH
 #define EXSTACK_DEFAULT true
@@ -1144,8 +1132,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_MIPS
 
-#define ELF_START_MMAP 0x8000
-
 #ifdef TARGET_MIPS64
 #define ELF_CLASS   ELFCLASS64
 #else
@@ -1303,8 +1289,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_MICROBLAZE
 
-#define ELF_START_MMAP 0x8000
-
 #define elf_check_arch(x) ( (x) == EM_MICROBLAZE || (x) == EM_MICROBLAZE_OLD)
 
 #define ELF_CLASS   ELFCLASS32
@@ -1345,8 +1329,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUMBState *env
 
 #ifdef TARGET_NIOS2
 
-#define ELF_START_MMAP 0x8000
-
 #define elf_check_arch(x) ((x) == EM_ALTERA_NIOS2)
 
 #define ELF_CLASS   ELFCLASS32
@@ -1442,8 +1424,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_OPENRISC
 
-#define ELF_START_MMAP 0x0800
-
 #define ELF_ARCH EM_OPENRISC
 #define ELF_CLASS ELFCLASS32
 #define ELF_DATA  ELFDATA2MSB
@@ -1480,8 +1460,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_SH4
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS ELFCLASS32
 #define ELF_ARCH  EM_SH
 
@@ -1562,8 +1540,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_CRIS
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS ELFCLASS32
 #define ELF_ARCH  EM_CRIS
 
@@ -1579,8 +1555,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 
 #ifdef TARGET_M68K
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS   ELFCLASS32
 #define ELF_ARCH    EM_68K
 
@@ -1630,8 +1604,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUM68KState *e
 
 #ifdef TARGET_ALPHA
 
-#define ELF_START_MMAP (0x300ULL)
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_ARCH   EM_ALPHA
 
@@ -1649,8 +1621,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 
 #ifdef TARGET_S390X
 
-#define ELF_START_MMAP (0x200ULL)
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_DATA   ELFDATA2MSB
 #define ELF_ARCH   EM_S390
@@ -1763,7 +1733,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_RISCV
 
-#define ELF_ST

[PATCH 2/6] linux-user: Use walk_memory_regions for open_self_maps

2023-08-16 Thread Richard Henderson
Replace the by-hand method of region identification with
the official user-exec interface.  Cross-check the region
provided to the callback with the interval tree from
read_self_maps().
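
A minimal sketch of a callback-driven region walk, assuming a toy page table with one flags byte per page. All sk_* names are hypothetical; this only illustrates the shape of the interface (one callback per contiguous run of identically flagged pages) and is not QEMU's walk_memory_regions():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical page flags for this sketch (not QEMU's PAGE_* values). */
enum { SK_READ = 1, SK_WRITE = 2, SK_EXEC = 4 };

#define SK_PAGE_SIZE 4096UL

/* Callback type: one call per contiguous run of pages with equal flags. */
typedef int (*sk_region_fn)(void *opaque, unsigned long start,
                            unsigned long end, unsigned long flags);

/*
 * Walk a toy page table (one flags entry per page, 0 = unmapped) and
 * report each maximal run of identically flagged pages to the callback.
 * A nonzero return from the callback stops the walk and is propagated.
 */
static int sk_walk_regions(const unsigned char *page_flags, size_t npages,
                           void *opaque, sk_region_fn fn)
{
    size_t i = 0;

    assert(fn != NULL);
    while (i < npages) {
        size_t first = i;
        unsigned char flags = page_flags[i];

        while (i < npages && page_flags[i] == flags) {
            i++;
        }
        if (flags != 0) {
            int rc = fn(opaque, first * SK_PAGE_SIZE, i * SK_PAGE_SIZE, flags);
            if (rc != 0) {
                return rc;
            }
        }
    }
    return 0;
}

/* Example callback: print one maps-style line and count the regions. */
static int sk_print_region(void *opaque, unsigned long start,
                           unsigned long end, unsigned long flags)
{
    int *count = opaque;

    printf("%08lx-%08lx %c%c%c\n", start, end,
           (flags & SK_READ) ? 'r' : '-',
           (flags & SK_WRITE) ? 'w' : '-',
           (flags & SK_EXEC) ? 'x' : '-');
    (*count)++;
    return 0;
}
```

The opaque pointer plays the same role as the open_self_maps_data structure in the patch: it carries per-walk state into each callback invocation.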

Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 192 ++-
 1 file changed, 115 insertions(+), 77 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 074262b3ac..658c276e39 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8095,12 +8095,66 @@ static int open_self_cmdline(CPUArchState *cpu_env, int fd)
 return 0;
 }
 
-static void show_smaps(int fd, unsigned long size)
-{
-unsigned long page_size_kb = TARGET_PAGE_SIZE >> 10;
-unsigned long size_kb = size >> 10;
+struct open_self_maps_data {
+TaskState *ts;
+IntervalTreeRoot *host_maps;
+int fd;
+bool smaps;
+};
 
-dprintf(fd, "Size:  %lu kB\n"
+/*
+ * Subroutine to output one line of /proc/self/maps,
+ * or one region of /proc/self/smaps.
+ */
+
+#ifdef TARGET_HPPA
+# define test_stack(S, E, L)  (E == L)
+#else
+# define test_stack(S, E, L)  (S == L)
+#endif
+
+static void open_self_maps_4(const struct open_self_maps_data *d,
+ const MapInfo *mi, abi_ptr start,
+ abi_ptr end, unsigned flags)
+{
+const struct image_info *info = d->ts->info;
+const char *path = mi->path;
+uint64_t offset;
+int fd = d->fd;
+int count;
+
+if (test_stack(start, end, info->stack_limit)) {
+path = "[stack]";
+}
+
+/* Except null device (MAP_ANON), adjust offset for this fragment. */
+offset = mi->offset;
+if (mi->dev) {
+uintptr_t hstart = (uintptr_t)g2h_untagged(start);
+offset += hstart - mi->itree.start;
+}
+
+count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
+" %c%c%c%c %08" PRIx64 " %02x:%02x %"PRId64,
+start, end,
+(flags & PAGE_READ) ? 'r' : '-',
+(flags & PAGE_WRITE_ORG) ? 'w' : '-',
+(flags & PAGE_EXEC) ? 'x' : '-',
+mi->is_priv ? 'p' : 's',
+offset, major(mi->dev), minor(mi->dev),
+(uint64_t)mi->inode);
+if (path) {
+dprintf(fd, "%*s%s\n", 73 - count, "", path);
+} else {
+dprintf(fd, "\n");
+}
+
+if (d->smaps) {
+unsigned long size = end - start;
+unsigned long page_size_kb = TARGET_PAGE_SIZE >> 10;
+unsigned long size_kb = size >> 10;
+
+dprintf(fd, "Size:  %lu kB\n"
 "KernelPageSize:%lu kB\n"
 "MMUPageSize:   %lu kB\n"
 "Rss:   0 kB\n"
@@ -8121,91 +8175,75 @@ static void show_smaps(int fd, unsigned long size)
 "Swap:  0 kB\n"
 "SwapPss:   0 kB\n"
 "Locked:0 kB\n"
-"THPeligible:0\n", size_kb, page_size_kb, page_size_kb);
+"THPeligible:0\n"
+"VmFlags:%s%s%s%s%s%s%s%s\n",
+size_kb, page_size_kb, page_size_kb,
+(flags & PAGE_READ) ? " rd" : "",
+(flags & PAGE_WRITE_ORG) ? " wr" : "",
+(flags & PAGE_EXEC) ? " ex" : "",
+mi->is_priv ? "" : " sh",
+(flags & PAGE_READ) ? " mr" : "",
+(flags & PAGE_WRITE_ORG) ? " mw" : "",
+(flags & PAGE_EXEC) ? " me" : "",
+mi->is_priv ? "" : " ms");
+}
 }
 
-static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
+/*
+ * Callback for walk_memory_regions, when read_self_maps() fails.
+ * Proceed without the benefit of host /proc/self/maps cross-check.
+ */
+static int open_self_maps_3(void *opaque, target_ulong guest_start,
+target_ulong guest_end, unsigned long flags)
 {
-CPUState *cpu = env_cpu(cpu_env);
-TaskState *ts = cpu->opaque;
-IntervalTreeRoot *map_info = read_self_maps();
-IntervalTreeNode *s;
-int count;
+static const MapInfo mi = { .is_priv = true };
 
-for (s = interval_tree_iter_first(map_info, 0, -1); s;
- s = interval_tree_iter_next(s, 0, -1)) {
-MapInfo *e = container_of(s, MapInfo, itree);
+open_self_maps_4(opaque, &mi, guest_start, guest_end, flags);
+return 0;
+}
 
-if (h2g_valid(e->itree.start)) {
-unsigned long min = e->itree.start;
-unsigned long max = e->itree.last + 1;
-int flags = page_get_flags(h2g(min));
-const char *path;
+/*
+ * Callback for walk_memory_regions, when read_self_maps() succeeds.
+ */
+static int open_self_maps_2(void *opaque, target_ulong guest_start,
+target_ulong guest_end, unsigned long flags)
+{
+const struct open_self_maps_data *d = opaque;
+uintp

[PATCH 6/6] linux-user: Show vdso address in /proc/pid/maps

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h| 1 +
 linux-user/elfload.c | 1 +
 linux-user/syscall.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 12f638336a..4de9ec783f 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -32,6 +32,7 @@ struct image_info {
 abi_ulong   brk;
 abi_ulong   start_stack;
 abi_ulong   stack_limit;
+abi_ulong   vdso;
 abi_ulong   entry;
 abi_ulong   code_offset;
 abi_ulong   data_offset;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index a670a7817a..12285eae82 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3726,6 +3726,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 const VdsoImageInfo *vdso = vdso_image_info();
 if (vdso) {
 load_elf_vdso(&vdso_info, vdso);
+info->vdso = vdso_info.load_bias;
 } else if (TARGET_ARCH_HAS_SIGTRAMP_PAGE) {
 abi_long tramp_page = target_mmap(0, TARGET_PAGE_SIZE,
   PROT_READ | PROT_WRITE,
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 5c0fb20e19..c85cf6ffb9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8127,6 +8127,8 @@ static void open_self_maps_4(const struct open_self_maps_data *d,
 path = "[stack]";
 } else if (start == info->brk) {
 path = "[heap]";
+} else if (start == info->vdso) {
+path = "[vdso]";
 }
 
 /* Except null device (MAP_ANON), adjust offset for this fragment. */
-- 
2.34.1




[PATCH 3/6] linux-user: Adjust brk for load_bias

2023-08-16 Thread Richard Henderson
PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.
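
A minimal sketch of the arithmetic, assuming a power-of-two page size. The sk_* helpers are hypothetical stand-ins, not the macros used in elfload.c:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of page_size (a power of two). */
static uint64_t sk_page_align(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr + page_size - 1) & ~(page_size - 1);
}

/*
 * hiaddr is the highest link-time address of the image; load_bias is the
 * offset at which the loader actually placed it.  The initial brk must be
 * computed from the run-time address, i.e. hiaddr + load_bias.
 */
static uint64_t sk_initial_brk(uint64_t hiaddr, uint64_t load_bias,
                               uint64_t page_size)
{
    return sk_page_align(hiaddr + load_bias, page_size);
}
```

For a non-PIE executable load_bias is typically 0 and the result is unchanged; for a PIE linked at 0, omitting the bias would place brk near address 0 rather than next to the relocated image.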

Cc: qemu-sta...@nongnu.org
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to executable")
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ccfbf82836..ab11f141c3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3278,7 +3278,7 @@ static void load_elf_image(const char *image_name, const ImageSource *src,
 info->start_data = -1;
 info->end_data = 0;
 /* Usual start for brk is after all sections of the main executable. */
-info->brk = TARGET_PAGE_ALIGN(hiaddr);
+info->brk = TARGET_PAGE_ALIGN(hiaddr + load_bias);
 info->elf_flags = ehdr->e_flags;
 
 prot_exec = PROT_EXEC;
-- 
2.34.1




Re: [PATCH v3 00/15] linux-user: Implement VDSOs

2023-08-16 Thread Richard Henderson

On 8/14/23 02:52, Alex Bennée wrote:
> Richard Henderson writes:
>
>> It's time for another round on implementing the VDSO for linux-user.
>> We are now seeing applications built that absolutely require it,
>> and have no fallback for the VDSO to be absent.
>
> Something broke configure for me:
>
>    ../../configure --disable-docs --disable-system
>
> Gave:
>
>    Dependency glib-2.0 found: YES 2.74.6 (overridden)
>    Program indent found: NO
>
>    ../../linux-user/hppa/meson.build:7:0: ERROR: File vdso.so does not exist.
>
>    A full log can be found at /home/alex/lsrc/qemu.git/builds/user/meson-logs/meson-log.txt
>    FAILED: build.ninja
>    /home/alex/lsrc/qemu.git/builds/user/pyvenv/bin/meson --internal regenerate /home/alex/lsrc/qemu.git /home/alex/lsrc/qemu.git/builds/user
>    ninja: error: rebuilding 'build.ninja': subcommand failed
>    make: Nothing to be done for 'all'.
>
> Will there be linux-user targets that never support vdso?



Something must have gone wrong with the email or with applying the patch, because that
file should definitely exist.  I have just sent a v4.  If that breaks too, let me know,
but then please just pull from my branch.



r~



[PATCH v4 04/18] linux-user: Use ImageSource in load_elf_image

2023-08-16 Thread Richard Henderson
Change parse_elf_properties as well, as the bprm_buf argument
ties the two functions closely.
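
A minimal sketch of the read pattern that the removed code open-coded at each call site: serve the request from a cached prefix of the file when possible, otherwise fall back to pread(). SkImageSource and sk_imgsrc_read are hypothetical names for this illustration and are not the ImageSource API added earlier in the series:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for an image source: a cached prefix plus an fd. */
typedef struct {
    const void *cache;   /* first cache_size bytes of the file */
    size_t cache_size;
    int fd;              /* open file descriptor, or -1 if none */
} SkImageSource;

/*
 * Copy len bytes at offset into dst.  Serve the request from the cached
 * prefix when it lies entirely inside it; otherwise fall back to pread().
 * Returns true on success, false on short read or error.
 */
static bool sk_imgsrc_read(void *dst, off_t offset, size_t len,
                           const SkImageSource *src)
{
    assert(dst != NULL && src != NULL);
    if (offset >= 0 && (size_t)offset <= src->cache_size
        && len <= src->cache_size - (size_t)offset) {
        memcpy(dst, (const char *)src->cache + offset, len);
        return true;
    }
    if (src->fd < 0) {
        return false;
    }
    ssize_t got = pread(src->fd, dst, len, offset);
    return got >= 0 && (size_t)got == len;
}
```

Centralizing the bounds check in one helper is what lets the callers in the diff shrink to a single call with a single error path.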

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 128 +--
 1 file changed, 49 insertions(+), 79 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 11bbf4e99b..f3511ae766 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2922,10 +2922,9 @@ static bool parse_elf_property(const uint32_t *data, int *off, int datasz,
 }
 
 /* Process NT_GNU_PROPERTY_TYPE_0. */
-static bool parse_elf_properties(int image_fd,
+static bool parse_elf_properties(const ImageSource *src,
  struct image_info *info,
  const struct elf_phdr *phdr,
- char bprm_buf[BPRM_BUF_SIZE],
  Error **errp)
 {
 union {
@@ -2953,14 +2952,8 @@ static bool parse_elf_properties(int image_fd,
 return false;
 }
 
-if (phdr->p_offset + n <= BPRM_BUF_SIZE) {
-memcpy(&note, bprm_buf + phdr->p_offset, n);
-} else {
-ssize_t len = pread(image_fd, &note, n, phdr->p_offset);
-if (len != n) {
-error_setg_errno(errp, errno, "Error reading file header");
-return false;
-}
+if (!imgsrc_read(&note, phdr->p_offset, n, src, errp)) {
+return false;
 }
 
 /*
@@ -3006,30 +2999,34 @@ static bool parse_elf_properties(int image_fd,
 }
 }
 
-/* Load an ELF image into the address space.
+/**
+ * load_elf_image: Load an ELF image into the address space.
+ * @image_name: the filename of the image, to use in error messages.
+ * @src: the ImageSource from which to read.
+ * @info: info collected from the loaded image.
+ * @ehdr: the ELF header, not yet bswapped.
+ * @pinterp_name: record any PT_INTERP string found.
+ *
+ * On return: @info values will be filled in, as necessary or available.
+ */
 
-   IMAGE_NAME is the filename of the image, to use in error messages.
-   IMAGE_FD is the open file descriptor for the image.
-
-   BPRM_BUF is a copy of the beginning of the file; this of course
-   contains the elf file header at offset 0.  It is assumed that this
-   buffer is sufficiently aligned to present no problems to the host
-   in accessing data at aligned offsets within the buffer.
-
-   On return: INFO values will be filled in, as necessary or available.  */
-
-static void load_elf_image(const char *image_name, int image_fd,
+static void load_elf_image(const char *image_name, const ImageSource *src,
struct image_info *info, struct elfhdr *ehdr,
-   char **pinterp_name,
-   char bprm_buf[BPRM_BUF_SIZE])
+   char **pinterp_name)
 {
-struct elf_phdr *phdr;
+g_autofree struct elf_phdr *phdr = NULL;
 abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
-int i, retval, prot_exec;
+int i, prot_exec;
 Error *err = NULL;
 
-/* First of all, some simple consistency checks */
-memcpy(ehdr, bprm_buf, sizeof(*ehdr));
+/*
+ * First of all, some simple consistency checks.
+ * Note that we rely on the bswapped ehdr staying in bprm_buf,
+ * for later use by load_elf_binary and create_elf_tables.
+ */
+if (!imgsrc_read(ehdr, 0, sizeof(*ehdr), src, &err)) {
+goto exit_errmsg;
+}
 if (!elf_check_ident(ehdr)) {
 error_setg(&err, "Invalid ELF image for this architecture");
 goto exit_errmsg;
@@ -3040,15 +3037,11 @@ static void load_elf_image(const char *image_name, int image_fd,
 goto exit_errmsg;
 }
 
-i = ehdr->e_phnum * sizeof(struct elf_phdr);
-if (ehdr->e_phoff + i <= BPRM_BUF_SIZE) {
-phdr = (struct elf_phdr *)(bprm_buf + ehdr->e_phoff);
-} else {
-phdr = (struct elf_phdr *) alloca(i);
-retval = pread(image_fd, phdr, i, ehdr->e_phoff);
-if (retval != i) {
-goto exit_read;
-}
+phdr = imgsrc_read_alloc(ehdr->e_phoff,
+ ehdr->e_phnum * sizeof(struct elf_phdr),
+ src, &err);
+if (phdr == NULL) {
+goto exit_errmsg;
 }
 bswap_phdr(phdr, ehdr->e_phnum);
 
@@ -3085,17 +3078,10 @@ static void load_elf_image(const char *image_name, int image_fd,
 goto exit_errmsg;
 }
 
-interp_name = g_malloc(eppnt->p_filesz);
-
-if (eppnt->p_offset + eppnt->p_filesz <= BPRM_BUF_SIZE) {
-memcpy(interp_name, bprm_buf + eppnt->p_offset,
-   eppnt->p_filesz);
-} else {
-retval = pread(image_fd, interp_name, eppnt->p_filesz,
-   eppnt->p_offset);
-if (retval != eppnt->p_filesz) {
-goto exit_read;
-}
+interp_name = imgsrc_read_alloc(eppnt->p_offset, eppnt->p_filesz,
+

[PATCH v4 03/18] linux-user: Do not clobber bprm_buf swapping ehdr

2023-08-16 Thread Richard Henderson
Rearrange the allocation of storage for ehdr between load_elf_image
and load_elf_binary.  The same set of copies are done, but we don't
modify bprm_buf, which will be important later.
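
A minimal sketch of the idea, assuming a toy two-field header: copy the header out of the shared buffer into caller-owned storage and byte-swap only the copy. SkHeader and the sk_* functions are hypothetical and much simpler than struct elfhdr and bswap_ehdr:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-field header; a real ELF header has many more fields. */
typedef struct {
    uint16_t type;
    uint32_t entry;
} SkHeader;

static uint16_t sk_bswap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t sk_bswap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8)
         | ((v & 0x00ff0000u) >> 8) | ((v & 0xff000000u) >> 24);
}

/*
 * Copy the header out of the shared buffer into caller-owned storage and
 * byte-swap only the copy.  The buffer is const, so it cannot be clobbered.
 */
static void sk_load_header(SkHeader *out, const unsigned char *buf,
                           size_t buf_len)
{
    assert(buf_len >= sizeof(*out));
    memcpy(out, buf, sizeof(*out));
    out->type = sk_bswap16(out->type);
    out->entry = sk_bswap32(out->entry);
}
```

Because the caller owns the storage, the swapped header also survives any later reuse of the buffer, which is the property load_elf_binary needs when it goes on to load the interpreter.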

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ac03beb01b..11bbf4e99b 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3019,16 +3019,17 @@ static bool parse_elf_properties(int image_fd,
On return: INFO values will be filled in, as necessary or available.  */
 
 static void load_elf_image(const char *image_name, int image_fd,
-   struct image_info *info, char **pinterp_name,
+   struct image_info *info, struct elfhdr *ehdr,
+   char **pinterp_name,
char bprm_buf[BPRM_BUF_SIZE])
 {
-struct elfhdr *ehdr = (struct elfhdr *)bprm_buf;
 struct elf_phdr *phdr;
 abi_ulong load_addr, load_bias, loaddr, hiaddr, error;
 int i, retval, prot_exec;
 Error *err = NULL;
 
 /* First of all, some simple consistency checks */
+memcpy(ehdr, bprm_buf, sizeof(*ehdr));
 if (!elf_check_ident(ehdr)) {
 error_setg(&err, "Invalid ELF image for this architecture");
 goto exit_errmsg;
@@ -3343,6 +3344,7 @@ static void load_elf_image(const char *image_name, int image_fd,
 static void load_elf_interp(const char *filename, struct image_info *info,
 char bprm_buf[BPRM_BUF_SIZE])
 {
+struct elfhdr ehdr;
 int fd, retval;
 Error *err = NULL;
 
@@ -3364,7 +3366,7 @@ static void load_elf_interp(const char *filename, struct image_info *info,
 memset(bprm_buf + retval, 0, BPRM_BUF_SIZE - retval);
 }
 
-load_elf_image(filename, fd, info, NULL, bprm_buf);
+load_elf_image(filename, fd, info, &ehdr, NULL, bprm_buf);
 }
 
 static int symfind(const void *s0, const void *s1)
@@ -3557,8 +3559,14 @@ uint32_t get_elf_eflags(int fd)
 
 int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 {
+/*
+ * We need a copy of the elf header for passing to create_elf_tables.
+ * We will have overwritten the original when we re-use bprm->buf
+ * while loading the interpreter.  Allocate the storage for this now
+ * and let elf_load_image do any swapping that may be required.
+ */
+struct elfhdr ehdr;
 struct image_info interp_info;
-struct elfhdr elf_ex;
 char *elf_interpreter = NULL;
 char *scratch;
 
@@ -3570,12 +3578,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 info->start_mmap = (abi_ulong)ELF_START_MMAP;
 
 load_elf_image(bprm->filename, bprm->fd, info,
-   &elf_interpreter, bprm->buf);
-
-/* ??? We need a copy of the elf header for passing to create_elf_tables.
-   If we do nothing, we'll have overwritten this when we re-use bprm->buf
-   when we load the interpreter.  */
-elf_ex = *(struct elfhdr *)bprm->buf;
+   &ehdr, &elf_interpreter, bprm->buf);
 
 /* Do this so that we can load the interpreter, if need be.  We will
change some of these later */
@@ -3662,7 +3665,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 target_mprotect(tramp_page, TARGET_PAGE_SIZE, PROT_READ | PROT_EXEC);
 }
 
-bprm->p = create_elf_tables(bprm->p, bprm->argc, bprm->envc, &elf_ex,
+bprm->p = create_elf_tables(bprm->p, bprm->argc, bprm->envc, &ehdr,
 info, (elf_interpreter ? &interp_info : NULL));
 info->start_stack = bprm->p;
 
-- 
2.34.1




[PATCH v4 09/18] linux-user/i386: Add vdso

2023-08-16 Thread Richard Henderson
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1267
Signed-off-by: Richard Henderson 
---
 linux-user/i386/vdso-asmoffset.h |   6 ++
 linux-user/elfload.c |  16 +++-
 linux-user/i386/signal.c |  11 +++
 linux-user/i386/Makefile.vdso|   5 ++
 linux-user/i386/meson.build  |   7 ++
 linux-user/i386/vdso.S   | 143 +++
 linux-user/i386/vdso.ld  |  76 
 linux-user/i386/vdso.so  | Bin 0 -> 2672 bytes
 8 files changed, 262 insertions(+), 2 deletions(-)
 create mode 100644 linux-user/i386/vdso-asmoffset.h
 create mode 100644 linux-user/i386/Makefile.vdso
 create mode 100644 linux-user/i386/vdso.S
 create mode 100644 linux-user/i386/vdso.ld
 create mode 100755 linux-user/i386/vdso.so

diff --git a/linux-user/i386/vdso-asmoffset.h b/linux-user/i386/vdso-asmoffset.h
new file mode 100644
index 00..4e5ee0dd49
--- /dev/null
+++ b/linux-user/i386/vdso-asmoffset.h
@@ -0,0 +1,6 @@
+/*
+ * offsetof(struct sigframe, sc.eip)
+ * offsetof(struct rt_sigframe, uc.tuc_mcontext.eip)
+ */
+#define SIGFRAME_SIGCONTEXT_eip  64
+#define RT_SIGFRAME_SIGCONTEXT_eip  220
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f94963638a..7e02765954 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -309,12 +309,24 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUX86State *en
 (*regs)[15] = tswapreg(env->regs[R_ESP]);
 (*regs)[16] = tswapreg(env->segs[R_SS].selector & 0x);
 }
-#endif
+
+/*
+ * i386 is the only target which supplies AT_SYSINFO for the vdso.
+ * All others only supply AT_SYSINFO_EHDR.
+ */
+#define DLINFO_ARCH_ITEMS 1
+#define ARCH_DLINFO   NEW_AUX_ENT(AT_SYSINFO, vdso_info->entry);
+
+#include "vdso.c.inc"
+
+#define vdso_image_info()&vdso_image_info
+
+#endif /* TARGET_X86_64 */
 
 #define USE_ELF_CORE_DUMP
 #define ELF_EXEC_PAGESIZE   4096
 
-#endif
+#endif /* TARGET_I386 */
 
 #ifdef TARGET_ARM
 
diff --git a/linux-user/i386/signal.c b/linux-user/i386/signal.c
index 60fa07d6f9..bc5d45302e 100644
--- a/linux-user/i386/signal.c
+++ b/linux-user/i386/signal.c
@@ -214,6 +214,17 @@ struct rt_sigframe {
 };
 #define TARGET_RT_SIGFRAME_FXSAVE_OFFSET ( \
 offsetof(struct rt_sigframe, fpstate) + TARGET_FPSTATE_FXSAVE_OFFSET)
+
+/*
+ * Verify that vdso-asmoffset.h constants match.
+ */
+#include "i386/vdso-asmoffset.h"
+
+QEMU_BUILD_BUG_ON(offsetof(struct sigframe, sc.eip)
+  != SIGFRAME_SIGCONTEXT_eip);
+QEMU_BUILD_BUG_ON(offsetof(struct rt_sigframe, uc.tuc_mcontext.eip)
+  != RT_SIGFRAME_SIGCONTEXT_eip);
+
 #else
 
 struct rt_sigframe {
diff --git a/linux-user/i386/Makefile.vdso b/linux-user/i386/Makefile.vdso
new file mode 100644
index 00..26bc993cda
--- /dev/null
+++ b/linux-user/i386/Makefile.vdso
@@ -0,0 +1,5 @@
+CROSS_CC ?= i686-linux-gnu-gcc
+
+vdso.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) -m32 -nostdlib -shared -Wl,-T,vdso.ld -Wl,--build-id=sha1 \
+ -Wl,-h,linux-gate.so.1 -Wl,--hash-style=both vdso.S -o $@
diff --git a/linux-user/i386/meson.build b/linux-user/i386/meson.build
index ee523019a5..b729d73686 100644
--- a/linux-user/i386/meson.build
+++ b/linux-user/i386/meson.build
@@ -3,3 +3,10 @@ syscall_nr_generators += {
 arguments: [ meson.current_source_dir() / 'syscallhdr.sh', '@INPUT@', '@OUTPUT@', '@EXTRA_ARGS@' ],
 output: '@BASENAME@_nr.h')
 }
+
+gen = [
+  gen_vdso.process('vdso.so', extra_args: ['-s', '__kernel_sigreturn',
+   '-r', '__kernel_rt_sigreturn'])
+]
+
+linux_user_ss.add(when: 'TARGET_I386', if_true: gen)
diff --git a/linux-user/i386/vdso.S b/linux-user/i386/vdso.S
new file mode 100644
index 00..e7a1f333a1
--- /dev/null
+++ b/linux-user/i386/vdso.S
@@ -0,0 +1,143 @@
+/*
+ * i386 linux replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include "vdso-asmoffset.h"
+
+.macro endf name
+   .globl  \name
+   .type   \name, @function
+   .size   \name, . - \name
+.endm
+
+.macro vdso_syscall1 name, nr
+\name:
+   .cfi_startproc
+   mov %ebx, %edx
+   .cfi_register %ebx, %edx
+   mov 4(%esp), %ebx
+   mov $\nr, %eax
+   int $0x80
+   mov %edx, %ebx
+   ret
+   .cfi_endproc
+endf   \name
+.endm
+
+.macro vdso_syscall2 name, nr
+\name:
+   .cfi_startproc
+   mov %ebx, %edx
+   .cfi_register %ebx, %edx
+   mov 4(%esp), %ebx
+   mov 8(%esp), %ecx
+   mov $\nr, %eax
+   int $0x80
+   mov %edx, %ebx
+   ret
+   .cfi_endproc
+endf   \name
+.endm
+
+.macro vdso_syscall3 name, nr
+\name:
+   .cfi_startproc
+   push %ebx
+   .cfi_adjust_cfa_offset 4
+   .cfi_rel_offset %ebx, 0
+   mov 8(%esp), %ebx
+   mov 12(%esp), %ecx
+

[PATCH v4 13/18] linux-user/hppa: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/hppa/vdso-asmoffset.h |  12 +++
 linux-user/elfload.c |   4 +
 linux-user/hppa/signal.c |  24 +++--
 linux-user/hppa/Makefile.vdso|   6 ++
 linux-user/hppa/meson.build  |   6 ++
 linux-user/hppa/vdso.S   | 165 +++
 linux-user/hppa/vdso.ld  |  77 +++
 linux-user/hppa/vdso.so  | Bin 0 -> 2104 bytes
 8 files changed, 284 insertions(+), 10 deletions(-)
 create mode 100644 linux-user/hppa/vdso-asmoffset.h
 create mode 100644 linux-user/hppa/Makefile.vdso
 create mode 100644 linux-user/hppa/vdso.S
 create mode 100644 linux-user/hppa/vdso.ld
 create mode 100755 linux-user/hppa/vdso.so

diff --git a/linux-user/hppa/vdso-asmoffset.h b/linux-user/hppa/vdso-asmoffset.h
new file mode 100644
index 00..c8b40c0332
--- /dev/null
+++ b/linux-user/hppa/vdso-asmoffset.h
@@ -0,0 +1,12 @@
+#define sizeof_rt_sigframe  584
+#define offsetof_sigcontext 160
+#define offsetof_sigcontext_gr  0x4
+#define offsetof_sigcontext_fr  0x88
+#define offsetof_sigcontext_iaoq 0x190
+#define offsetof_sigcontext_sar 0x198
+
+/* arch/parisc/include/asm/rt_sigframe.h */
+#define SIGFRAME            64
+#define FUNCTIONCALLFRAME   48
+#define PARISC_RT_SIGFRAME_SIZE32 \
+(((sizeof_rt_sigframe) + FUNCTIONCALLFRAME + SIGFRAME) & -SIGFRAME)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 8c2ca3520f..b15d247746 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1790,6 +1790,10 @@ static inline void init_thread(struct target_pt_regs *regs,
 #define STACK_GROWS_DOWN 0
 #define STACK_ALIGNMENT  64
 
+#include "vdso.c.inc"
+
+#define vdso_image_info()&vdso_image_info
+
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
 {
diff --git a/linux-user/hppa/signal.c b/linux-user/hppa/signal.c
index f253a15864..ada22556c1 100644
--- a/linux-user/hppa/signal.c
+++ b/linux-user/hppa/signal.c
@@ -21,6 +21,7 @@
 #include "user-internals.h"
 #include "signal-common.h"
 #include "linux-user/trace.h"
+#include "vdso-asmoffset.h"
 
 struct target_sigcontext {
 abi_ulong sc_flags;
@@ -47,6 +48,19 @@ struct target_rt_sigframe {
 /* hidden location of upper halves of pa2.0 64-bit gregs */
 };
 
+QEMU_BUILD_BUG_ON(sizeof(struct target_rt_sigframe) != sizeof_rt_sigframe);
+QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe, uc.tuc_mcontext)
+  != offsetof_sigcontext);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_gr)
+  != offsetof_sigcontext_gr);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_fr)
+  != offsetof_sigcontext_fr);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_iaoq)
+  != offsetof_sigcontext_iaoq);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_sar)
+  != offsetof_sigcontext_sar);
+
+
 static void setup_sigcontext(struct target_sigcontext *sc, CPUArchState *env)
 {
 int i;
@@ -91,16 +105,6 @@ static void restore_sigcontext(CPUArchState *env, struct target_sigcontext *sc)
 __get_user(env->cr[CR_SAR], &sc->sc_sar);
 }
 
-#if TARGET_ABI_BITS == 32
-#define SIGFRAME            64
-#define FUNCTIONCALLFRAME   48
-#else
-#define SIGFRAME            128
-#define FUNCTIONCALLFRAME   96
-#endif
-#define PARISC_RT_SIGFRAME_SIZE32 \
-((sizeof(struct target_rt_sigframe) + FUNCTIONCALLFRAME + SIGFRAME) & -SIGFRAME)
-
 void setup_rt_frame(int sig, struct target_sigaction *ka,
 target_siginfo_t *info,
 target_sigset_t *set, CPUArchState *env)
diff --git a/linux-user/hppa/Makefile.vdso b/linux-user/hppa/Makefile.vdso
new file mode 100644
index 00..cf96f6430c
--- /dev/null
+++ b/linux-user/hppa/Makefile.vdso
@@ -0,0 +1,6 @@
+CROSS_CC ?= hppa-linux-gnu-gcc
+
+vdso.so: vdso.S vdso.ld vdso-asmoffset.h Makefile.vdso
+   $(CROSS_CC) -nostdlib -shared -Wl,-T,vdso.ld \
+ -Wl,-h,linux-vdso32.so.1 -Wl,--build-id=sha1 \
+ -Wl,--hash-style=both vdso.S -o $@
diff --git a/linux-user/hppa/meson.build b/linux-user/hppa/meson.build
index 4709508a09..da447da745 100644
--- a/linux-user/hppa/meson.build
+++ b/linux-user/hppa/meson.build
@@ -3,3 +3,9 @@ syscall_nr_generators += {
 arguments: [ meson.current_source_dir() / 'syscallhdr.sh', '@INPUT@', '@OUTPUT@', '@EXTRA_ARGS@' ],
 output: '@BASENAME@_nr.h')
 }
+
+gen = [
+  gen_vdso.process('vdso.so', extra_args: ['-r', '__kernel_sigtramp_rt'])
+]
+
+linux_user_ss.add(when: 'TARGET_HPPA', if_true: gen)
diff --git a/linux-user/hppa/vdso.S b/linux-user/hppa/vdso.S
new file mode 100644
index 00..5be14d2f70
--- /dev/null
+++ b/linux-user/hppa/vdso.S
@@ -0,0 +1,165 @@
+/*
+ * hppa linux kernel vdso replacement.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ 

[PATCH v4 14/18] linux-user/riscv: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/riscv/vdso-asmoffset.h |   9 ++
 linux-user/elfload.c  |   4 +
 linux-user/riscv/signal.c |   8 ++
 linux-user/meson.build|   1 +
 linux-user/riscv/Makefile.vdso|  11 ++
 linux-user/riscv/meson.build  |   9 ++
 linux-user/riscv/vdso-32.so   | Bin 0 -> 2652 bytes
 linux-user/riscv/vdso-64.so   | Bin 0 -> 3528 bytes
 linux-user/riscv/vdso.S   | 186 ++
 linux-user/riscv/vdso.ld  |  74 
 10 files changed, 302 insertions(+)
 create mode 100644 linux-user/riscv/vdso-asmoffset.h
 create mode 100644 linux-user/riscv/Makefile.vdso
 create mode 100644 linux-user/riscv/meson.build
 create mode 100755 linux-user/riscv/vdso-32.so
 create mode 100755 linux-user/riscv/vdso-64.so
 create mode 100644 linux-user/riscv/vdso.S
 create mode 100644 linux-user/riscv/vdso.ld

diff --git a/linux-user/riscv/vdso-asmoffset.h b/linux-user/riscv/vdso-asmoffset.h
new file mode 100644
index 00..123902ef61
--- /dev/null
+++ b/linux-user/riscv/vdso-asmoffset.h
@@ -0,0 +1,9 @@
+#ifdef TARGET_ABI32
+# define sizeof_rt_sigframe 0x2b0
+# define offsetof_uc_mcontext   0x120
+# define offsetof_freg0 0x80
+#else
+# define sizeof_rt_sigframe 0x340
+# define offsetof_uc_mcontext   0x130
+# define offsetof_freg0 0x100
+#endif
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index b15d247746..c9cba730de 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1752,8 +1752,10 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_RISCV32
 #define ELF_CLASS ELFCLASS32
+#include "vdso-32.c.inc"
 #else
 #define ELF_CLASS ELFCLASS64
+#include "vdso-64.c.inc"
 #endif
 
 #define ELF_HWCAP get_elf_hwcap()
@@ -1770,6 +1772,8 @@ static uint32_t get_elf_hwcap(void)
 #undef MISA_BIT
 }
 
+#define vdso_image_info()&vdso_image_info
+
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
 {
diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index eaa168199a..5449c7618a 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -21,6 +21,7 @@
 #include "user-internals.h"
 #include "signal-common.h"
 #include "linux-user/trace.h"
+#include "vdso-asmoffset.h"
 
 /* Signal handler invocation must be transparent for the code being
interrupted. Complete CPU (hart) state is saved on entry and restored
@@ -37,6 +38,8 @@ struct target_sigcontext {
 uint32_t fcsr;
 }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, fpr) != offsetof_freg0);
+
 struct target_ucontext {
 unsigned long uc_flags;
 struct target_ucontext *uc_link;
@@ -51,6 +54,11 @@ struct target_rt_sigframe {
 struct target_ucontext uc;
 };
 
+QEMU_BUILD_BUG_ON(sizeof(struct target_rt_sigframe)
+  != sizeof_rt_sigframe);
+QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe, uc.uc_mcontext)
+  != offsetof_uc_mcontext);
+
 static abi_ulong get_sigframe(struct target_sigaction *ka,
   CPURISCVState *regs, size_t framesize)
 {
diff --git a/linux-user/meson.build b/linux-user/meson.build
index dd24389052..3ff3bc5bbc 100644
--- a/linux-user/meson.build
+++ b/linux-user/meson.build
@@ -45,6 +45,7 @@ subdir('microblaze')
 subdir('mips64')
 subdir('mips')
 subdir('ppc')
+subdir('riscv')
 subdir('s390x')
 subdir('sh4')
 subdir('sparc')
diff --git a/linux-user/riscv/Makefile.vdso b/linux-user/riscv/Makefile.vdso
new file mode 100644
index 00..5ea6166191
--- /dev/null
+++ b/linux-user/riscv/Makefile.vdso
@@ -0,0 +1,11 @@
+CROSS_CC ?= riscv64-linux-gnu-gcc
+LDFLAGS := -nostdlib -shared -Wl,-T,vdso.ld \
+  -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both -Wl,--build-id=sha1
+
+all: vdso-64.so vdso-32.so
+
+vdso-64.so: vdso.S vdso.ld vdso-asmoffset.h Makefile.vdso
+   $(CROSS_CC) $(LDFLAGS) -mabi=lp64d -march=rv64g -fpic -o $@ vdso.S
+
+vdso-32.so: vdso.S vdso.ld vdso-asmoffset.h Makefile.vdso
+   $(CROSS_CC) $(LDFLAGS) -mabi=ilp32d -march=rv32g -fpic -o $@ vdso.S
diff --git a/linux-user/riscv/meson.build b/linux-user/riscv/meson.build
new file mode 100644
index 00..475b816da1
--- /dev/null
+++ b/linux-user/riscv/meson.build
@@ -0,0 +1,9 @@
+gen32 = [
+  gen_vdso.process('vdso-32.so', extra_args: ['-r', '__vdso_rt_sigreturn']),
+]
+gen64 = [
+  gen_vdso.process('vdso-64.so', extra_args: ['-r', '__vdso_rt_sigreturn'])
+]
+
+linux_user_ss.add(when: 'TARGET_RISCV32', if_true: gen32)
+linux_user_ss.add(when: 'TARGET_RISCV64', if_true: gen64)
diff --git a/linux-user/riscv/vdso-32.so b/linux-user/riscv/vdso-32.so
new file mode 100755
index 
..d6067c0dc8a1d5ccedb113bbe5ea5c3f6839bc64
GIT binary patch
literal 2652
zcmb_eU2GIp6h5=V;!;apS}
z#TY5JA}9h{M66XyQN;gB6~#Xud@#|(pU9&Rz8HNl24gfNBK7-b=GLk9!3WRox96

[PATCH v4 05/18] linux-user: Use ImageSource in load_symbols

2023-08-16 Thread Richard Henderson
Aside from the section headers, we're unlikely to hit the
ImageSource cache on guest executables.  But the interface
for imgsrc_read_* is better.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 87 
 1 file changed, 48 insertions(+), 39 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f3511ae766..19d3cac039 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2048,7 +2048,8 @@ static inline void bswap_mips_abiflags(Mips_elf_abiflags_v0 *abiflags) { }
 #ifdef USE_ELF_CORE_DUMP
 static int elf_core_dump(int, const CPUArchState *);
 #endif /* USE_ELF_CORE_DUMP */
-static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias);
+static void load_symbols(struct elfhdr *hdr, const ImageSource *src,
+ abi_ulong load_bias);
 
 /* Verify the portions of EHDR within E_IDENT for the target.
This can be performed before bswapping the entire header.  */
@@ -3293,7 +3294,7 @@ static void load_elf_image(const char *image_name, const ImageSource *src,
 }
 
 if (qemu_log_enabled()) {
-load_symbols(ehdr, src->fd, load_bias);
+load_symbols(ehdr, src, load_bias);
 }
 
 debuginfo_report_elf(image_name, src->fd, load_bias);
@@ -3384,19 +3385,20 @@ static int symcmp(const void *s0, const void *s1)
 }
 
 /* Best attempt to load symbols from this ELF object. */
-static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias)
+static void load_symbols(struct elfhdr *hdr, const ImageSource *src,
+ abi_ulong load_bias)
 {
 int i, shnum, nsyms, sym_idx = 0, str_idx = 0;
-uint64_t segsz;
-struct elf_shdr *shdr;
+g_autofree struct elf_shdr *shdr = NULL;
 char *strings = NULL;
-struct syminfo *s = NULL;
-struct elf_sym *new_syms, *syms = NULL;
+struct elf_sym *syms = NULL;
+struct elf_sym *new_syms;
+uint64_t segsz;
 
 shnum = hdr->e_shnum;
-i = shnum * sizeof(struct elf_shdr);
-shdr = (struct elf_shdr *)alloca(i);
-if (pread(fd, shdr, i, hdr->e_shoff) != i) {
+shdr = imgsrc_read_alloc(hdr->e_shoff, shnum * sizeof(struct elf_shdr),
+ src, NULL);
+if (shdr == NULL) {
 return;
 }
 
@@ -3414,31 +3416,33 @@ static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias)
 
  found:
 /* Now know where the strtab and symtab are.  Snarf them.  */
-s = g_try_new(struct syminfo, 1);
-if (!s) {
-goto give_up;
-}
 
 segsz = shdr[str_idx].sh_size;
-s->disas_strtab = strings = g_try_malloc(segsz);
-if (!strings ||
-pread(fd, strings, segsz, shdr[str_idx].sh_offset) != segsz) {
+strings = g_try_malloc(segsz);
+if (!strings) {
+goto give_up;
+}
+if (!imgsrc_read(strings, shdr[str_idx].sh_offset, segsz, src, NULL)) {
 goto give_up;
 }
 
 segsz = shdr[sym_idx].sh_size;
-syms = g_try_malloc(segsz);
-if (!syms || pread(fd, syms, segsz, shdr[sym_idx].sh_offset) != segsz) {
-goto give_up;
-}
-
 if (segsz / sizeof(struct elf_sym) > INT_MAX) {
-/* Implausibly large symbol table: give up rather than ploughing
- * on with the number of symbols calculation overflowing
+/*
+ * Implausibly large symbol table: give up rather than ploughing
+ * on with the number of symbols calculation overflowing.
  */
 goto give_up;
 }
 nsyms = segsz / sizeof(struct elf_sym);
+syms = g_try_malloc(segsz);
+if (!syms) {
+goto give_up;
+}
+if (!imgsrc_read(syms, shdr[sym_idx].sh_offset, segsz, src, NULL)) {
+goto give_up;
+}
+
 for (i = 0; i < nsyms; ) {
 bswap_sym(syms + i);
 /* Throw away entries which we do not need.  */
@@ -3463,10 +3467,12 @@ static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias)
 goto give_up;
 }
 
-/* Attempt to free the storage associated with the local symbols
-   that we threw away.  Whether or not this has any effect on the
-   memory allocation depends on the malloc implementation and how
-   many symbols we managed to discard.  */
+/*
+ * Attempt to free the storage associated with the local symbols
+ * that we threw away.  Whether or not this has any effect on the
+ * memory allocation depends on the malloc implementation and how
+ * many symbols we managed to discard.
+ */
 new_syms = g_try_renew(struct elf_sym, syms, nsyms);
 if (new_syms == NULL) {
 goto give_up;
@@ -3475,20 +3481,23 @@ static void load_symbols(struct elfhdr *hdr, int fd, abi_ulong load_bias)
 
 qsort(syms, nsyms, sizeof(*syms), symcmp);
 
-s->disas_num_syms = nsyms;
-#if ELF_CLASS == ELFCLASS32
-s->disas_symtab.elf32 = syms;
-#else
-s->disas_symtab.elf64 = syms;
-#endif
-s->lookup_symbol = lookup_symbolxx;
-s->next = syminfos;
-symi

[PATCH v4 16/18] linux-user/ppc: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/ppc/vdso-asmoffset.h |  20 +++
 linux-user/elfload.c|   9 ++
 linux-user/ppc/signal.c |  31 +++--
 linux-user/gen-vdso-elfn.c.inc  |   7 +
 linux-user/ppc/Makefile.vdso|  18 +++
 linux-user/ppc/meson.build  |  12 ++
 linux-user/ppc/vdso-32.ld   |  70 ++
 linux-user/ppc/vdso-32.so   | Bin 0 -> 3020 bytes
 linux-user/ppc/vdso-64.ld   |  68 +
 linux-user/ppc/vdso-64.so   | Bin 0 -> 3896 bytes
 linux-user/ppc/vdso-64le.so | Bin 0 -> 3896 bytes
 linux-user/ppc/vdso.S   | 239 
 12 files changed, 466 insertions(+), 8 deletions(-)
 create mode 100644 linux-user/ppc/vdso-asmoffset.h
 create mode 100644 linux-user/ppc/Makefile.vdso
 create mode 100644 linux-user/ppc/vdso-32.ld
 create mode 100755 linux-user/ppc/vdso-32.so
 create mode 100644 linux-user/ppc/vdso-64.ld
 create mode 100755 linux-user/ppc/vdso-64.so
 create mode 100755 linux-user/ppc/vdso-64le.so
 create mode 100644 linux-user/ppc/vdso.S

diff --git a/linux-user/ppc/vdso-asmoffset.h b/linux-user/ppc/vdso-asmoffset.h
new file mode 100644
index 00..6844c8c81c
--- /dev/null
+++ b/linux-user/ppc/vdso-asmoffset.h
@@ -0,0 +1,20 @@
+/*
+ * Size of dummy stack frame allocated when calling signal handler.
+ * See arch/powerpc/include/asm/ptrace.h.
+ */
+#ifdef TARGET_ABI32
+# define SIGNAL_FRAMESIZE   64
+#else
+# define SIGNAL_FRAMESIZE   128
+#endif
+
+#ifdef TARGET_ABI32
+# define offsetof_sigframe_mcontext 0x20
+# define offsetof_rt_sigframe_mcontext  0x140
+# define offsetof_mcontext_fregs0xc0
+# define offsetof_mcontext_vregs0x1d0
+#else
+# define offsetof_rt_sigframe_mcontext  0xe8
+# define offsetof_mcontext_fregs0x180
+# define offsetof_mcontext_vregs_ptr0x288
+#endif
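
The constants in vdso-asmoffset.h are shared between the assembly and the C signal code, and the QEMU_BUILD_BUG_ON lines in signal.c keep the two in step. As a minimal standalone sketch of that pattern, using plain C11 static_assert instead of the QEMU macro (struct demo_frame, demo_offsetof_payload and demo_payload_offset are hypothetical names, not anything from the tree):

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical constant, as an assembly-visible header might define it. */
#define demo_offsetof_payload 8

/* Hypothetical frame layout; this is not a QEMU structure. */
struct demo_frame {
    uint64_t header;
    uint32_t payload[4];
};

/* The build fails here if the C layout and the constant drift apart. */
static_assert(offsetof(struct demo_frame, payload) == demo_offsetof_payload,
              "demo_offsetof_payload is out of sync with struct demo_frame");

/* Runtime view of the same offset, handy for a quick sanity check. */
static size_t demo_payload_offset(void)
{
    return offsetof(struct demo_frame, payload);
}
```

The point of checking at build time is that a stale hand-written offset in the assembly would otherwise only show up as a corrupted signal frame at run time.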
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 498f5ed07e..48d30caafe 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1035,6 +1035,15 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
 #define USE_ELF_CORE_DUMP
 #define ELF_EXEC_PAGESIZE   4096
 
+#ifndef TARGET_PPC64
+# include "vdso-32.c.inc"
+#elif TARGET_BIG_ENDIAN
+# include "vdso-64.c.inc"
+#else
+# include "vdso-64le.c.inc"
+#endif
+#define vdso_image_info()&vdso_image_info
+
 #endif
 
 #ifdef TARGET_LOONGARCH64
diff --git a/linux-user/ppc/signal.c b/linux-user/ppc/signal.c
index a616f20efb..7e7302823b 100644
--- a/linux-user/ppc/signal.c
+++ b/linux-user/ppc/signal.c
@@ -21,14 +21,7 @@
 #include "user-internals.h"
 #include "signal-common.h"
 #include "linux-user/trace.h"
-
-/* Size of dummy stack frame allocated when calling signal handler.
-   See arch/powerpc/include/asm/ptrace.h.  */
-#if defined(TARGET_PPC64)
-#define SIGNAL_FRAMESIZE 128
-#else
-#define SIGNAL_FRAMESIZE 64
-#endif
+#include "vdso-asmoffset.h"
 
 /* See arch/powerpc/include/asm/ucontext.h.  Only used for 32-bit PPC;
on 64-bit PPC, sigcontext and mcontext are one and the same.  */
@@ -73,6 +66,16 @@ struct target_mcontext {
 #endif
 };
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_mcontext, mc_fregs)
+  != offsetof_mcontext_fregs);
+#if defined(TARGET_PPC64)
+QEMU_BUILD_BUG_ON(offsetof(struct target_mcontext, v_regs)
+  != offsetof_mcontext_vregs_ptr);
+#else
+QEMU_BUILD_BUG_ON(offsetof(struct target_mcontext, mc_vregs)
+  != offsetof_mcontext_vregs);
+#endif
+
 /* See arch/powerpc/include/asm/sigcontext.h.  */
 struct target_sigcontext {
 target_ulong _unused[4];
@@ -161,6 +164,7 @@ struct target_ucontext {
 #endif
 };
 
+#if !defined(TARGET_PPC64)
 /* See arch/powerpc/kernel/signal_32.c.  */
 struct target_sigframe {
 struct target_sigcontext sctx;
@@ -168,6 +172,10 @@ struct target_sigframe {
 int32_t abigap[56];
 };
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigframe, mctx)
+  != offsetof_sigframe_mcontext);
+#endif
+
 #if defined(TARGET_PPC64)
 
 #define TARGET_TRAMP_SIZE 6
@@ -184,6 +192,10 @@ struct target_rt_sigframe {
 char abigap[288];
 } __attribute__((aligned(16)));
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe,
+   uc.tuc_sigcontext.mcontext)
+  != offsetof_rt_sigframe_mcontext);
+
 #else
 
 struct target_rt_sigframe {
@@ -192,6 +204,9 @@ struct target_rt_sigframe {
 int32_t abigap[56];
 };
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe, uc.tuc_mcontext)
+  != offsetof_rt_sigframe_mcontext);
+
 #endif
 
 #if defined(TARGET_PPC64)
diff --git a/linux-user/gen-vdso-elfn.c.inc b/linux-user/gen-vdso-elfn.c.inc
index 7034c36d5e..95856eb839 100644
--- a/linux-user/gen-vdso-elfn.c.inc
+++ b/linux-user/gen-vdso-elfn.c.inc
@@ -273,7 +273,14 @@ static void elfN(process)(FILE *outf, void *buf, bool need_bswap)
 errors++;
 break;
 
+c

[PATCH v4 00/18] linux-user: Implement VDSOs

2023-08-16 Thread Richard Henderson
Still no integrated cross-compile, however:

Changes for v4:
  * Force all vdso to have a single load segment.
This will prevent problems with varying host/guest page size.
  * Tidy some of the assembly with macros.
  * Implement loongarch, ppc, s390x.
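
One way to picture the page-size concern in the first item (this is my reading, the cover letter does not spell it out): a boundary between two load segments can only carry its own protection if it falls on a page boundary for both guest and host. A minimal sketch, with hypothetical helper names and example page sizes chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'addr' sits on a page boundary for a power-of-two page size. */
static bool demo_is_page_aligned(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}

/*
 * A boundary between two segments is only usable as a protection
 * boundary if it is aligned for both the guest and the host page size.
 */
static bool demo_boundary_usable(uint64_t boundary,
                                 uint64_t guest_page, uint64_t host_page)
{
    return demo_is_page_aligned(boundary, guest_page)
        && demo_is_page_aligned(boundary, host_page);
}
```

With a 4 KiB guest page and a 64 KiB host page, a boundary at 0x1000 is fine for the guest but not for the host. With a single load segment there is no internal boundary to get wrong.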

Just in case the list eats a binary:
  https://gitlab.com/rth7680/qemu/-/tree/lu-vdso


r~


Richard Henderson (18):
  linux-user: Introduce imgsrc_read, imgsrc_read_alloc
  linux-user: Tidy loader_exec
  linux-user: Do not clobber bprm_buf swapping ehdr
  linux-user: Use ImageSource in load_elf_image
  linux-user: Use ImageSource in load_symbols
  linux-user: Replace bprm->fd with bprm->src.fd
  linux-user: Load vdso image if available
  linux-user: Add gen-vdso tool
  linux-user/i386: Add vdso
  linux-user/x86_64: Add vdso
  linux-user/aarch64: Add vdso
  linux-user/arm: Add vdso
  linux-user/hppa: Add vdso
  linux-user/riscv: Add vdso
  linux-user/loongarch64: Add vdso
  linux-user/ppc: Add vdso
  linux-user/s390x: Rename __SIGNAL_FRAMESIZE to STACK_FRAME_OVERHEAD
  linux-user/s390x: Add vdso

 linux-user/hppa/vdso-asmoffset.h|  12 +
 linux-user/i386/vdso-asmoffset.h|   6 +
 linux-user/loader.h |  60 +++-
 linux-user/loongarch64/vdso-asmoffset.h |   8 +
 linux-user/ppc/vdso-asmoffset.h |  20 ++
 linux-user/riscv/vdso-asmoffset.h   |   9 +
 linux-user/s390x/vdso-asmoffset.h   |   2 +
 linux-user/arm/signal.c |  30 +-
 linux-user/elfload.c| 381 +++-
 linux-user/flatload.c   |   8 +-
 linux-user/gen-vdso.c   | 223 ++
 linux-user/hppa/signal.c|  24 +-
 linux-user/i386/signal.c|  11 +
 linux-user/linuxload.c  | 137 +++--
 linux-user/loongarch64/signal.c |  17 +-
 linux-user/ppc/signal.c |  31 +-
 linux-user/riscv/signal.c   |   8 +
 linux-user/s390x/signal.c   |   7 +-
 linux-user/gen-vdso-elfn.c.inc  | 314 +++
 linux-user/aarch64/Makefile.vdso|  12 +
 linux-user/aarch64/meson.build  |  12 +
 linux-user/aarch64/vdso-be.so   | Bin 0 -> 3216 bytes
 linux-user/aarch64/vdso-le.so   | Bin 0 -> 3216 bytes
 linux-user/aarch64/vdso.S   |  73 +
 linux-user/aarch64/vdso.ld  |  72 +
 linux-user/arm/Makefile.vdso|  17 ++
 linux-user/arm/meson.build  |  18 ++
 linux-user/arm/vdso-arm-be.so   | Bin 0 -> 2712 bytes
 linux-user/arm/vdso-arm-le.so   | Bin 0 -> 2712 bytes
 linux-user/arm/vdso-thm-be.so   | Bin 0 -> 2684 bytes
 linux-user/arm/vdso-thm-le.so   | Bin 0 -> 2684 bytes
 linux-user/arm/vdso.S   | 193 
 linux-user/arm/vdso.ld  |  67 +
 linux-user/hppa/Makefile.vdso   |   6 +
 linux-user/hppa/meson.build |   6 +
 linux-user/hppa/vdso.S  | 165 ++
 linux-user/hppa/vdso.ld |  77 +
 linux-user/hppa/vdso.so | Bin 0 -> 2104 bytes
 linux-user/i386/Makefile.vdso   |   5 +
 linux-user/i386/meson.build |   7 +
 linux-user/i386/vdso.S  | 143 +
 linux-user/i386/vdso.ld |  76 +
 linux-user/i386/vdso.so | Bin 0 -> 2672 bytes
 linux-user/loongarch64/Makefile.vdso|   7 +
 linux-user/loongarch64/meson.build  |   4 +
 linux-user/loongarch64/vdso.S   | 130 
 linux-user/loongarch64/vdso.ld  |  73 +
 linux-user/loongarch64/vdso.so  | Bin 0 -> 3560 bytes
 linux-user/meson.build  |   9 +-
 linux-user/ppc/Makefile.vdso|  18 ++
 linux-user/ppc/meson.build  |  12 +
 linux-user/ppc/vdso-32.ld   |  70 +
 linux-user/ppc/vdso-32.so   | Bin 0 -> 3020 bytes
 linux-user/ppc/vdso-64.ld   |  68 +
 linux-user/ppc/vdso-64.so   | Bin 0 -> 3896 bytes
 linux-user/ppc/vdso-64le.so | Bin 0 -> 3896 bytes
 linux-user/ppc/vdso.S   | 239 +++
 linux-user/riscv/Makefile.vdso  |  11 +
 linux-user/riscv/meson.build|   9 +
 linux-user/riscv/vdso-32.so | Bin 0 -> 2652 bytes
 linux-user/riscv/vdso-64.so | Bin 0 -> 3528 bytes
 linux-user/riscv/vdso.S | 186 
 linux-user/riscv/vdso.ld|  74 +
 linux-user/s390x/Makefile.vdso  |   5 +
 linux-user/s390x/meson.build|   6 +
 linux-user/s390x/vdso.S |  61 
 linux-user/s390x/vdso.ld|  69 +
 linux-user/s390x/vdso.so| Bin 0 -> 3464 bytes
 linux-user/x86_64/Makefile.vdso |   5 +
 linux-user/x86_64/meson.build   |   6 +
 linux-user/x86_64/vdso.S|  78 +
 linux-user/x86_64/vdso.ld   |  73 +
 linux-u

[PATCH v4 12/18] linux-user/arm: Add vdso

2023-08-16 Thread Richard Henderson
The thumb vdso will only be used for m-profile,
as all of our a-profile cpus support arm mode.

Signed-off-by: Richard Henderson 
---
 linux-user/arm/signal.c   |  30 +++---
 linux-user/elfload.c  |  24 +
 linux-user/arm/Makefile.vdso  |  17 +++
 linux-user/arm/meson.build|  18 
 linux-user/arm/vdso-arm-be.so | Bin 0 -> 2712 bytes
 linux-user/arm/vdso-arm-le.so | Bin 0 -> 2712 bytes
 linux-user/arm/vdso-thm-be.so | Bin 0 -> 2684 bytes
 linux-user/arm/vdso-thm-le.so | Bin 0 -> 2684 bytes
 linux-user/arm/vdso.S | 193 ++
 linux-user/arm/vdso.ld|  67 
 10 files changed, 332 insertions(+), 17 deletions(-)
 create mode 100644 linux-user/arm/Makefile.vdso
 create mode 100755 linux-user/arm/vdso-arm-be.so
 create mode 100755 linux-user/arm/vdso-arm-le.so
 create mode 100755 linux-user/arm/vdso-thm-be.so
 create mode 100755 linux-user/arm/vdso-thm-le.so
 create mode 100644 linux-user/arm/vdso.S
 create mode 100644 linux-user/arm/vdso.ld

diff --git a/linux-user/arm/signal.c b/linux-user/arm/signal.c
index cf99fd7b8a..bd160b113b 100644
--- a/linux-user/arm/signal.c
+++ b/linux-user/arm/signal.c
@@ -167,9 +167,8 @@ setup_return(CPUARMState *env, struct target_sigaction *ka, int usig,
 abi_ulong handler = 0;
 abi_ulong handler_fdpic_GOT = 0;
 abi_ulong retcode;
-int thumb, retcode_idx;
-int is_fdpic = info_is_fdpic(((TaskState *)thread_cpu->opaque)->info);
-bool copy_retcode;
+bool thumb;
+bool is_fdpic = info_is_fdpic(((TaskState *)thread_cpu->opaque)->info);
 
 if (is_fdpic) {
 /* In FDPIC mode, ka->_sa_handler points to a function
@@ -184,9 +183,7 @@ setup_return(CPUARMState *env, struct target_sigaction *ka, int usig,
 } else {
 handler = ka->_sa_handler;
 }
-
 thumb = handler & 1;
-retcode_idx = thumb + (ka->sa_flags & TARGET_SA_SIGINFO ? 2 : 0);
 
 uint32_t cpsr = cpsr_read(env);
 
@@ -202,24 +199,23 @@ setup_return(CPUARMState *env, struct target_sigaction *ka, int usig,
 cpsr &= ~CPSR_E;
 }
 
+/* Our vdso default_sigreturn label is a table of entry points. */
+int idx = is_fdpic * 2 + ((ka->sa_flags & TARGET_SA_SIGINFO) != 0);
+retcode = default_sigreturn + idx * 16;
+
+/*
+ * Put the sigreturn code on the stack no matter which return
+ * mechanism we use in order to remain ABI compliant.
+ */
+memcpy(frame->retcode, g2h_untagged(retcode & ~1), 16);
+
 if (ka->sa_flags & TARGET_SA_RESTORER) {
 if (is_fdpic) {
+/* Place the function descriptor in slot 3. */
 __put_user((abi_ulong)ka->sa_restorer, &frame->retcode[3]);
-retcode = (sigreturn_fdpic_tramp +
-   retcode_idx * RETCODE_BYTES + thumb);
-copy_retcode = true;
 } else {
 retcode = ka->sa_restorer;
-copy_retcode = false;
 }
-} else {
-retcode = default_sigreturn + retcode_idx * RETCODE_BYTES + thumb;
-copy_retcode = true;
-}
-
-/* Copy the code to the stack slot for ABI compatibility. */
-if (copy_retcode) {
-memcpy(frame->retcode, g2h_untagged(retcode & ~1), RETCODE_BYTES);
 }
 
 env->regs[0] = usig;
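
The hunk above treats default_sigreturn as the base of a table of fixed-size entry points, indexed by the fdpic and siginfo flags. A minimal standalone sketch of just that index arithmetic follows; demo_sigreturn_entry and DEMO_ENTRY_BYTES are hypothetical names, and only the stride of 16 and the "fdpic * 2 + siginfo" ordering are taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stride between table entries, matching the literal 16 in the hunk. */
#define DEMO_ENTRY_BYTES 16u

/*
 * Compute the address of a sigreturn entry point:
 * index = is_fdpic * 2 + siginfo, then scale by the entry size.
 */
static uint32_t demo_sigreturn_entry(uint32_t table_base,
                                     bool is_fdpic, bool siginfo)
{
    unsigned idx = (is_fdpic ? 2u : 0u) + (siginfo ? 1u : 0u);

    return table_base + idx * DEMO_ENTRY_BYTES;
}
```

For a table at 0x1000 this gives 0x1000, 0x1010, 0x1020 and 0x1030 for the four flag combinations.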
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 98cb1ff053..8c2ca3520f 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -580,6 +580,30 @@ static const char *get_elf_platform(void)
 #undef END
 }
 
+#if TARGET_BIG_ENDIAN
+# include "vdso-arm-be.c.inc"
+# include "vdso-thm-be.c.inc"
+#else
+# include "vdso-arm-le.c.inc"
+# include "vdso-thm-le.c.inc"
+#endif
+
+static const VdsoImageInfo *vdso_image_info(void)
+{
+ARMCPU *cpu = ARM_CPU(thread_cpu);
+
+/*
+ * The only cpus we support that do *not* have arm mode are m-profile.
+ * It's not really possible to run Linux on these, but this config is
+ * useful for testing gcc.  In any case, choose the vdso image that
+ * will work for the target cpu.
+ */
+return (arm_feature(&cpu->env, ARM_FEATURE_M)
+? &vdso_thm_image_info
+: &vdso_arm_image_info);
+}
+#define vdso_image_info vdso_image_info
+
 #else
 /* 64 bit ARM definitions */
 #define ELF_START_MMAP 0x8000
diff --git a/linux-user/arm/Makefile.vdso b/linux-user/arm/Makefile.vdso
new file mode 100644
index 00..e031a3d549
--- /dev/null
+++ b/linux-user/arm/Makefile.vdso
@@ -0,0 +1,17 @@
+CROSS_CC ?= arm-linux-gnueabihf-gcc
+LDFLAGS := -nostdlib -shared -Wl,-T,vdso.ld \
+  -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both -Wl,--build-id=sha1
+
+all: vdso-arm-le.so vdso-arm-be.so vdso-thm-le.so vdso-thm-be.so
+
+vdso-arm-le.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) $(LDFLAGS) -mlittle-endian -marm vdso.S -o $@
+
+vdso-arm-be.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) $(LDFLAGS) -mbig-endian -marm vdso.S -o $@
+
+vdso-thm-le.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) $(LDFL

[PATCH v4 06/18] linux-user: Replace bprm->fd with bprm->src.fd

2023-08-16 Thread Richard Henderson
There are only a couple of uses of bprm->fd remaining.
Migrate to the other field.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/loader.h| 1 -
 linux-user/flatload.c  | 8 
 linux-user/linuxload.c | 5 ++---
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/linux-user/loader.h b/linux-user/loader.h
index 311d20f5d1..5b4c50 100644
--- a/linux-user/loader.h
+++ b/linux-user/loader.h
@@ -74,7 +74,6 @@ struct linux_binprm {
 char buf[BPRM_BUF_SIZE] __attribute__((aligned));
 ImageSource src;
 abi_ulong p;
-int fd;
 int e_uid, e_gid;
 int argc, envc;
 char **argv;
diff --git a/linux-user/flatload.c b/linux-user/flatload.c
index 8f5e9f489b..15e3ec5f6b 100644
--- a/linux-user/flatload.c
+++ b/linux-user/flatload.c
@@ -463,7 +463,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 DBG_FLT("BINFMT_FLAT: ROM mapping of file (we hope)\n");
 
 textpos = target_mmap(0, text_len, PROT_READ|PROT_EXEC,
-  MAP_PRIVATE, bprm->fd, 0);
+  MAP_PRIVATE, bprm->src.fd, 0);
 if (textpos == -1) {
 fprintf(stderr, "Unable to mmap process text\n");
 return -1;
@@ -490,7 +490,7 @@ static int load_flat_file(struct linux_binprm * bprm,
 } else
 #endif
 {
-result = target_pread(bprm->fd, datapos,
+result = target_pread(bprm->src.fd, datapos,
   data_len + (relocs * sizeof(abi_ulong)),
   fpos);
 }
@@ -540,10 +540,10 @@ static int load_flat_file(struct linux_binprm * bprm,
 else
 #endif
 {
-result = target_pread(bprm->fd, textpos,
+result = target_pread(bprm->src.fd, textpos,
   text_len, 0);
 if (result >= 0) {
-result = target_pread(bprm->fd, datapos,
+result = target_pread(bprm->src.fd, datapos,
 data_len + (relocs * sizeof(abi_ulong)),
 ntohl(hdr->data_start));
 }
diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
index 5b7e9ab983..4a794f8cea 100644
--- a/linux-user/linuxload.c
+++ b/linux-user/linuxload.c
@@ -39,7 +39,7 @@ static int prepare_binprm(struct linux_binprm *bprm)
 int mode;
 int retval;
 
-if (fstat(bprm->fd, &st) < 0) {
+if (fstat(bprm->src.fd, &st) < 0) {
 return -errno;
 }
 
@@ -69,7 +69,7 @@ static int prepare_binprm(struct linux_binprm *bprm)
 bprm->e_gid = st.st_gid;
 }
 
-retval = read(bprm->fd, bprm->buf, BPRM_BUF_SIZE);
+retval = read(bprm->src.fd, bprm->buf, BPRM_BUF_SIZE);
 if (retval < 0) {
 perror("prepare_binprm");
 exit(-1);
@@ -144,7 +144,6 @@ int loader_exec(int fdexec, const char *filename, char **argv, char **envp,
 {
 int retval;
 
-bprm->fd = fdexec;
 bprm->src.fd = fdexec;
 bprm->filename = (char *)filename;
 bprm->argc = count(argv);
-- 
2.34.1




[PATCH v4 18/18] linux-user/s390x: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/s390x/vdso-asmoffset.h |   2 +
 linux-user/elfload.c  |   3 ++
 linux-user/s390x/signal.c |   4 +-
 linux-user/s390x/Makefile.vdso|   5 +++
 linux-user/s390x/meson.build  |   6 +++
 linux-user/s390x/vdso.S   |  61 ++
 linux-user/s390x/vdso.ld  |  69 ++
 linux-user/s390x/vdso.so  | Bin 0 -> 3464 bytes
 8 files changed, 147 insertions(+), 3 deletions(-)
 create mode 100644 linux-user/s390x/vdso-asmoffset.h
 create mode 100644 linux-user/s390x/Makefile.vdso
 create mode 100644 linux-user/s390x/vdso.S
 create mode 100644 linux-user/s390x/vdso.ld
 create mode 100755 linux-user/s390x/vdso.so

diff --git a/linux-user/s390x/vdso-asmoffset.h b/linux-user/s390x/vdso-asmoffset.h
new file mode 100644
index 00..27a062d6c1
--- /dev/null
+++ b/linux-user/s390x/vdso-asmoffset.h
@@ -0,0 +1,2 @@
+/* Minimum stack frame size */
+#define STACK_FRAME_OVERHEAD160
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 48d30caafe..ccfbf82836 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1756,6 +1756,9 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 #define USE_ELF_CORE_DUMP
 #define ELF_EXEC_PAGESIZE 4096
 
+#include "vdso.c.inc"
+#define vdso_image_info()&vdso_image_info
+
 #endif /* TARGET_S390X */
 
 #ifdef TARGET_RISCV
diff --git a/linux-user/s390x/signal.c b/linux-user/s390x/signal.c
index 0f8b8e04bf..b40f738a70 100644
--- a/linux-user/s390x/signal.c
+++ b/linux-user/s390x/signal.c
@@ -21,14 +21,12 @@
 #include "user-internals.h"
 #include "signal-common.h"
 #include "linux-user/trace.h"
+#include "vdso-asmoffset.h"
 
 #define __NUM_GPRS 16
 #define __NUM_FPRS 16
 #define __NUM_ACRS 16
 
-/* Minimum stack frame size */
-#define STACK_FRAME_OVERHEAD160
-
 #define _SIGCONTEXT_NSIG64
 #define _SIGCONTEXT_NSIG_BPW64 /* FIXME: 31-bit mode -> 32 */
 #define _SIGCONTEXT_NSIG_WORDS  (_SIGCONTEXT_NSIG / _SIGCONTEXT_NSIG_BPW)
diff --git a/linux-user/s390x/Makefile.vdso b/linux-user/s390x/Makefile.vdso
new file mode 100644
index 00..6b3b7bb426
--- /dev/null
+++ b/linux-user/s390x/Makefile.vdso
@@ -0,0 +1,5 @@
+CROSS_CC ?= s390x-linux-gnu-gcc
+
+vdso.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) -nostdlib -shared -Wl,-T,vdso.ld -Wl,--build-id=sha1 \
+ -Wl,-h,linux-vdso64.so.1 -Wl,--hash-style=both vdso.S -o $@
diff --git a/linux-user/s390x/meson.build b/linux-user/s390x/meson.build
index 0781ccea1d..3ea8c8ea9d 100644
--- a/linux-user/s390x/meson.build
+++ b/linux-user/s390x/meson.build
@@ -3,3 +3,9 @@ syscall_nr_generators += {
  arguments: [ meson.current_source_dir() / 'syscallhdr.sh', '@INPUT@', '@OUTPUT@', '@EXTRA_ARGS@' ],
  output: '@BASENAME@_nr.h')
 }
+
+gen = [
+  gen_vdso.process('vdso.so', extra_args: ['-s', '__kernel_sigreturn',
+   '-r', '__kernel_rt_sigreturn'])
+]
+linux_user_ss.add(when: 'TARGET_S390X', if_true: gen)
diff --git a/linux-user/s390x/vdso.S b/linux-user/s390x/vdso.S
new file mode 100644
index 00..3332492477
--- /dev/null
+++ b/linux-user/s390x/vdso.S
@@ -0,0 +1,61 @@
+/*
+ * s390x linux replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include "vdso-asmoffset.h"
+
+.macro endf name
+   .globl  \name
+   .type   \name, @function
+   .size   \name, . - \name
+.endm
+
+.macro raw_syscall n
+.ifne  \n < 0x100
+   svc \n
+   .else
+   lghi%r1, \n
+   svc 0
+.endif
+.endm
+
+.macro vdso_syscall name, nr
+\name:
+   .cfi_startproc
+   aghi%r15, -(STACK_FRAME_OVERHEAD + 16)
+   .cfi_adjust_cfa_offset STACK_FRAME_OVERHEAD + 16
+   stg %r14, STACK_FRAME_OVERHEAD(%r15)
+   .cfi_rel_offset %r14, STACK_FRAME_OVERHEAD
+   raw_syscall \nr
+   lg  %r14, STACK_FRAME_OVERHEAD(%r15)
+   aghi%r15, STACK_FRAME_OVERHEAD + 16
+   .cfi_restore %r14
+   .cfi_adjust_cfa_offset -(STACK_FRAME_OVERHEAD + 16)
+   br  %r14
+   .cfi_endproc
+endf   \name
+.endm
+
+vdso_syscall __kernel_gettimeofday, __NR_gettimeofday
+vdso_syscall __kernel_clock_gettime, __NR_clock_gettime
+vdso_syscall __kernel_clock_getres, __NR_clock_getres
+vdso_syscall __kernel_getcpu, __NR_getcpu
+
+/*
+ * TODO unwind info, though we're ok without it.
+ * The kernel supplies bogus empty unwind info, and it is likely ignored
+ * by all users.  Without it we get the fallback signal frame handling.
+ */
+
+__kernel_sigreturn:
+   raw_syscall __NR_sigreturn
+endf   __kernel_sigreturn
+
+__kernel_rt_sigreturn:
+   raw_syscall __NR_rt_sigreturn
+endf   __kernel_rt_sigreturn
diff --git a/linux-user/s390x/vdso.ld b/linux-user/s390x/vdso.ld
new file mode 100644
index 00..2a30ff382a
--- /dev/null
+++ b/linux-user/s390x/vdso.ld
@@ -0

[PATCH v4 10/18] linux-user/x86_64: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c|   4 +-
 linux-user/x86_64/Makefile.vdso |   5 ++
 linux-user/x86_64/meson.build   |   6 +++
 linux-user/x86_64/vdso.S|  78 
 linux-user/x86_64/vdso.ld   |  73 ++
 linux-user/x86_64/vdso.so   | Bin 0 -> 2968 bytes
 6 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 linux-user/x86_64/Makefile.vdso
 create mode 100644 linux-user/x86_64/vdso.S
 create mode 100644 linux-user/x86_64/vdso.ld
 create mode 100755 linux-user/x86_64/vdso.so

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 7e02765954..e8a2375ba8 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -317,12 +317,12 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUX86State *en
 #define DLINFO_ARCH_ITEMS 1
 #define ARCH_DLINFO   NEW_AUX_ENT(AT_SYSINFO, vdso_info->entry);
 
+#endif /* TARGET_X86_64 */
+
 #include "vdso.c.inc"
 
 #define vdso_image_info()&vdso_image_info
 
-#endif /* TARGET_X86_64 */
-
 #define USE_ELF_CORE_DUMP
 #define ELF_EXEC_PAGESIZE   4096
 
diff --git a/linux-user/x86_64/Makefile.vdso b/linux-user/x86_64/Makefile.vdso
new file mode 100644
index 00..6de038dcfb
--- /dev/null
+++ b/linux-user/x86_64/Makefile.vdso
@@ -0,0 +1,5 @@
+CROSS_CC ?= x86_64-linux-gnu-gcc
+
+vdso.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC) -nostdlib -shared -Wl,-T,vdso.ld -Wl,--build-id=sha1 \
+ -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both vdso.S -o $@
diff --git a/linux-user/x86_64/meson.build b/linux-user/x86_64/meson.build
index 203af9a60c..f6a0015953 100644
--- a/linux-user/x86_64/meson.build
+++ b/linux-user/x86_64/meson.build
@@ -3,3 +3,9 @@ syscall_nr_generators += {
   arguments: [ meson.current_source_dir() / 'syscallhdr.sh', '@INPUT@', '@OUTPUT@', '@EXTRA_ARGS@' ],
   output: '@BASENAME@_nr.h')
 }
+
+gen = [
+  gen_vdso.process('vdso.so')
+]
+
+linux_user_ss.add(when: 'TARGET_X86_64', if_true: gen)
diff --git a/linux-user/x86_64/vdso.S b/linux-user/x86_64/vdso.S
new file mode 100644
index 00..47d16c00ab
--- /dev/null
+++ b/linux-user/x86_64/vdso.S
@@ -0,0 +1,78 @@
+/*
+ * x86-64 linux replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+
+.macro endf name
+   .globl  \name
+   .type   \name, @function
+   .size   \name, . - \name
+.endm
+
+.macro weakalias name
+\name  = __vdso_\name
+   .weak   \name
+.endm
+
+.macro vdso_syscall name, nr
+__vdso_\name:
+   mov $\nr, %eax
+   syscall
+   ret
+endf   __vdso_\name
+weakalias \name
+.endm
+
+   .cfi_startproc
+
+vdso_syscall clock_gettime, __NR_clock_gettime
+vdso_syscall clock_getres, __NR_clock_getres
+vdso_syscall gettimeofday, __NR_gettimeofday
+vdso_syscall time, __NR_time
+
+__vdso_getcpu:
+   /*
+ * There is no syscall number for this allocated on x64.
+* We can handle this several ways:
+ *
+* (1) Invent a syscall number for use within qemu.
+ * It should be easy enough to pick a number that
+ * is well out of the way of the kernel numbers.
+ *
+ * (2) Force the emulated cpu to support the rdtscp insn,
+* and initialize the TSC_AUX value to the appropriate value.
+ *
+* (3) Pretend that we're always running on cpu 0.
+ *
+* This last is the one that's implemented here, with the
+* tiny bit of extra code to support rdtscp in place.
+ */
+   xor %ecx, %ecx  /* rdtscp w/ tsc_aux = 0 */
+
+   /* if (cpu != NULL) *cpu = (ecx & 0xfff); */
+   test%rdi, %rdi
+   jz  1f
+   mov %ecx, %eax
+   and $0xfff, %eax
+   mov %eax, (%rdi)
+
+   /* if (node != NULL) *node = (ecx >> 12); */
+1: test%rsi, %rsi
+   jz  2f
+   shr $12, %ecx
+   mov %ecx, (%rsi)
+
+2: xor %eax, %eax
+   ret
+endf   __vdso_getcpu
+
+weakalias getcpu
+
+   .cfi_endproc
+
+/* TODO: Add elf note for LINUX_VERSION_CODE */
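
For readers who prefer C to assembly, the __vdso_getcpu body above amounts to decoding a TSC_AUX-style word into a cpu number (low 12 bits) and a node number (remaining bits), with the word fixed at 0. A minimal sketch of that decoding, where demo_getcpu is a hypothetical name and the word is passed in as a parameter so that the split can be exercised:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a TSC_AUX-style word: cpu in the low 12 bits, node above.
 * Either output pointer may be NULL, as in the assembly version.
 * Always returns 0 (success).
 */
static int demo_getcpu(uint32_t tsc_aux, unsigned *cpu, unsigned *node)
{
    if (cpu != NULL) {
        *cpu = tsc_aux & 0xfff;
    }
    if (node != NULL) {
        *node = tsc_aux >> 12;
    }
    return 0;
}
```

Calling it with tsc_aux = 0 reproduces option (3) from the comment: cpu 0, node 0.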
diff --git a/linux-user/x86_64/vdso.ld b/linux-user/x86_64/vdso.ld
new file mode 100644
index 00..ca6001cc3c
--- /dev/null
+++ b/linux-user/x86_64/vdso.ld
@@ -0,0 +1,73 @@
+/*
+ * Linker script for linux x86-64 replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+VERSION {
+LINUX_2.6 {
+global:
+clock_gettime;
+__vdso_clock_gettime;
+gettimeofday;
+__vdso_gettimeofday;
+getcpu;
+__vdso_getcpu;
+time;
+__vdso_time;
+clock_getres;
+__vdso_clock_getres;
+
+local: *;
+};
+}
+
+
+PHDRS {
+phdrPT_PHDR 

[PATCH v4 01/18] linux-user: Introduce imgsrc_read, imgsrc_read_alloc

2023-08-16 Thread Richard Henderson
Introduced and initialized, but not yet really used.
These will tidy the current tests vs BPRM_BUF_SIZE.
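
To illustrate the cache-first read this patch introduces, here is a minimal standalone sketch. DemoSource and demo_read are hypothetical stand-ins, not the QEMU ImageSource API; there is no Error reporting, and the overflow-safe form of the bounds check is my own choice rather than what the patch does:

```c
#define _POSIX_C_SOURCE 200809L  /* for pread() */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for ImageSource. */
typedef struct {
    const void *cache;   /* bytes already read from the start of the file */
    size_t cache_size;   /* number of valid bytes in 'cache' */
    int fd;              /* backing file, or -1 for "cache only" */
} DemoSource;

/*
 * Read 'len' bytes at 'offset' into 'dst'.  Serve the request from the
 * cache when the whole range lies inside it, otherwise fall back to
 * pread() on the file descriptor if there is one.
 */
static bool demo_read(void *dst, off_t offset, size_t len,
                      const DemoSource *src)
{
    ssize_t ret;

    assert(dst != NULL && src != NULL);
    if (offset < 0) {
        return false;
    }

    /* Written as two comparisons so that offset + len cannot overflow. */
    if ((size_t)offset <= src->cache_size &&
        len <= src->cache_size - (size_t)offset) {
        memcpy(dst, (const char *)src->cache + offset, len);
        return true;
    }

    if (src->fd < 0) {
        return false;    /* nothing behind the cache to read from */
    }

    ret = pread(src->fd, dst, len, offset);
    return ret >= 0 && (size_t)ret == len;
}
```

A source with fd = -1 behaves like a pure in-memory image, which is presumably what the later vdso patches rely on.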

Signed-off-by: Richard Henderson 
---
 linux-user/loader.h| 61 +++-
 linux-user/linuxload.c | 90 ++
 2 files changed, 142 insertions(+), 9 deletions(-)

diff --git a/linux-user/loader.h b/linux-user/loader.h
index 59cbeacf24..311d20f5d1 100644
--- a/linux-user/loader.h
+++ b/linux-user/loader.h
@@ -18,6 +18,48 @@
 #ifndef LINUX_USER_LOADER_H
 #define LINUX_USER_LOADER_H
 
+typedef struct {
+const void *cache;
+unsigned int cache_size;
+int fd;
+} ImageSource;
+
+/**
+ * imgsrc_read: Read from ImageSource
+ * @dst: destination for read
+ * @offset: offset within file for read
+ * @len: size of the read
+ * @img: ImageSource to read from
+ * @errp: Error details.
+ *
+ * Read into @dst, using the cache when possible.
+ */
+bool imgsrc_read(void *dst, off_t offset, size_t len,
+ const ImageSource *img, Error **errp);
+
+/**
+ * imgsrc_read_alloc: Read from ImageSource
+ * @offset: offset within file for read
+ * @size: size of the read
+ * @img: ImageSource to read from
+ * @errp: Error details.
+ *
+ * Read into newly allocated memory, using the cache when possible.
+ */
+void *imgsrc_read_alloc(off_t offset, size_t len,
+const ImageSource *img, Error **errp);
+
+/**
+ * imgsrc_mmap: Map from ImageSource
+ *
+ * If @src has a file descriptor, pass on to target_mmap.  Otherwise,
+ * this is "mapping" from a host buffer, which resolves to memcpy.
+ * Therefore, flags must be MAP_PRIVATE | MAP_FIXED; the argument is
+ * retained for clarity.
+ */
+abi_long imgsrc_mmap(abi_ulong start, abi_ulong len, int prot,
+ int flags, const ImageSource *src, abi_ulong offset);
+
 /*
  * Read a good amount of data initially, to hopefully get all the
  * program headers loaded.
@@ -29,15 +71,16 @@
  * used when loading binaries.
  */
 struct linux_binprm {
-char buf[BPRM_BUF_SIZE] __attribute__((aligned));
-abi_ulong p;
-int fd;
-int e_uid, e_gid;
-int argc, envc;
-char **argv;
-char **envp;
-char *filename;/* Name of binary */
-int (*core_dump)(int, const CPUArchState *); /* coredump routine */
+char buf[BPRM_BUF_SIZE] __attribute__((aligned));
+ImageSource src;
+abi_ulong p;
+int fd;
+int e_uid, e_gid;
+int argc, envc;
+char **argv;
+char **envp;
+char *filename;/* Name of binary */
+int (*core_dump)(int, const CPUArchState *); /* coredump routine */
 };
 
 void do_init_thread(struct target_pt_regs *regs, struct image_info *infop);
diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
index 745cce70ab..3536dd8104 100644
--- a/linux-user/linuxload.c
+++ b/linux-user/linuxload.c
@@ -3,7 +3,9 @@
 #include "qemu/osdep.h"
 #include "qemu.h"
 #include "user-internals.h"
+#include "user-mmap.h"
 #include "loader.h"
+#include "qapi/error.h"
 
 #define NGROUPS 32
 
@@ -76,6 +78,10 @@ static int prepare_binprm(struct linux_binprm *bprm)
 /* Make sure the rest of the loader won't read garbage.  */
 memset(bprm->buf + retval, 0, BPRM_BUF_SIZE - retval);
 }
+
+bprm->src.cache = bprm->buf;
+bprm->src.cache_size = retval;
+
 return retval;
 }
 
@@ -139,6 +145,7 @@ int loader_exec(int fdexec, const char *filename, char **argv, char **envp,
 int retval;
 
 bprm->fd = fdexec;
+bprm->src.fd = fdexec;
 bprm->filename = (char *)filename;
 bprm->argc = count(argv);
 bprm->argv = argv;
@@ -173,3 +180,86 @@ int loader_exec(int fdexec, const char *filename, char **argv, char **envp,
 
 return retval;
 }
+
+bool imgsrc_read(void *dst, off_t offset, size_t len,
+ const ImageSource *img, Error **errp)
+{
+ssize_t ret;
+
+if (offset + len <= img->cache_size) {
+memcpy(dst, img->cache + offset, len);
+return true;
+}
+
+if (img->fd < 0) {
+error_setg(errp, "read past end of buffer");
+return false;
+}
+
+ret = pread(img->fd, dst, len, offset);
+if (ret == len) {
+return true;
+}
+if (ret < 0) {
+error_setg_errno(errp, errno, "Error reading file header");
+} else {
+error_setg(errp, "Incomplete read of file header");
+}
+return false;
+}
+
+void *imgsrc_read_alloc(off_t offset, size_t len,
+const ImageSource *img, Error **errp)
+{
+void *alloc = g_malloc(len);
+bool ok = imgsrc_read(alloc, offset, len, img, errp);
+
+if (!ok) {
+g_free(alloc);
+alloc = NULL;
+}
+return alloc;
+}
+
+abi_long imgsrc_mmap(abi_ulong start, abi_ulong len, int prot,
+ int flags, const ImageSource *src, abi_ulong offset)
+{
+const int prot_write = PROT_READ | PROT_WRITE;
+abi_long ret;
+void *haddr;
+
+assert(flags == (MAP_PRIVATE 
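
The cache-first lookup in imgsrc_read can be tried outside QEMU. What follows is a minimal sketch, assuming plain POSIX and no QEMU headers: DemoSource and demo_read are hypothetical names, failure is reported with a bool instead of Error **errp, and, unlike the patch, the bound is checked without computing offset + len and negative offsets are rejected.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for ImageSource; not QEMU's definition. */
typedef struct {
    const void *cache;       /* prefix of the file already in memory */
    unsigned int cache_size; /* number of valid bytes in cache */
    int fd;                  /* backing file, or -1 if there is none */
} DemoSource;

/*
 * Copy len bytes found at offset into dst.  The request is served
 * from the cache when it lies entirely inside it; otherwise it falls
 * back to pread() on the backing descriptor.  Returns true on success.
 */
static bool demo_read(void *dst, off_t offset, size_t len,
                      const DemoSource *src)
{
    if (offset < 0) {
        return false;
    }
    /* Compare against the remaining cache bytes to avoid overflow. */
    if (offset <= (off_t)src->cache_size &&
        len <= src->cache_size - (size_t)offset) {
        memcpy(dst, (const char *)src->cache + offset, len);
        return true;
    }
    if (src->fd < 0) {
        return false;   /* read past the end of an in-memory image */
    }
    return pread(src->fd, dst, len, offset) == (ssize_t)len;
}
```

With fd set to -1 the source behaves like the in-memory vdso image used later in the series: any read that does not fit in the cache simply fails.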

[PATCH v4 11/18] linux-user/aarch64: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c |   4 ++
 linux-user/aarch64/Makefile.vdso |  12 +
 linux-user/aarch64/meson.build   |  12 +
 linux-user/aarch64/vdso-be.so| Bin 0 -> 3216 bytes
 linux-user/aarch64/vdso-le.so| Bin 0 -> 3216 bytes
 linux-user/aarch64/vdso.S|  73 +++
 linux-user/aarch64/vdso.ld   |  72 ++
 linux-user/meson.build   |   1 +
 8 files changed, 174 insertions(+)
 create mode 100644 linux-user/aarch64/Makefile.vdso
 create mode 100644 linux-user/aarch64/meson.build
 create mode 100755 linux-user/aarch64/vdso-be.so
 create mode 100755 linux-user/aarch64/vdso-le.so
 create mode 100644 linux-user/aarch64/vdso.S
 create mode 100644 linux-user/aarch64/vdso.ld

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index e8a2375ba8..98cb1ff053 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -588,10 +588,14 @@ static const char *get_elf_platform(void)
 #define ELF_CLASS   ELFCLASS64
 #if TARGET_BIG_ENDIAN
 # define ELF_PLATFORM    "aarch64_be"
+# include "vdso-be.c.inc"
 #else
 # define ELF_PLATFORM    "aarch64"
+# include "vdso-le.c.inc"
 #endif
 
+#define vdso_image_info()    &vdso_image_info
+
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
 {
diff --git a/linux-user/aarch64/Makefile.vdso b/linux-user/aarch64/Makefile.vdso
new file mode 100644
index 00..53c19e1ce9
--- /dev/null
+++ b/linux-user/aarch64/Makefile.vdso
@@ -0,0 +1,12 @@
+CROSS_CC ?= aarch64-linux-gnu-gcc
+LDFLAGS := -nostdlib -shared -Wl,-T,vdso.ld \
+  -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both -Wl,--build-id=sha1 \
+  -Wl,-z,max-page-size=4096
+
+all: vdso-le.so vdso-be.so
+
+vdso-le.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC)  $(LDFLAGS) -mlittle-endian vdso.S -o $@
+
+vdso-be.so: vdso.S vdso.ld Makefile.vdso
+   $(CROSS_CC)  $(LDFLAGS) -mbig-endian vdso.S -o $@
diff --git a/linux-user/aarch64/meson.build b/linux-user/aarch64/meson.build
new file mode 100644
index 00..35e50c9b2c
--- /dev/null
+++ b/linux-user/aarch64/meson.build
@@ -0,0 +1,12 @@
+# ??? There does not seem to be a way to do
+#   when: ['TARGET_AARCH64', !'TARGET_WORDS_BIGENDIAN']
+# so we'd need to add TARGET_WORDS_LITTLEENDIAN.
+# In the meantime, build both files for aarch64 and aarch64_be,
+# only one of which will be included.
+
+gen = [
+  gen_vdso.process('vdso-be.so', extra_args: ['-r', '__kernel_rt_sigreturn']),
+  gen_vdso.process('vdso-le.so', extra_args: ['-r', '__kernel_rt_sigreturn'])
+]
+
+linux_user_ss.add(when: 'TARGET_AARCH64', if_true: gen)
diff --git a/linux-user/aarch64/vdso-be.so b/linux-user/aarch64/vdso-be.so
new file mode 100755
index 
..311f192d88149f744ee05c2492e8e8de6c5b5ec4
GIT binary patch
literal 3216
zcmc&$L2nyH6rS~N;}lA9Dk>rcL6IUEgoN5dk&tN3_h#N3uPqxPao{Dh-+bS^
zH#0jkZ@h2k3rne#r=0of3)B;WY0295@hi;c)jpL~8TBCe7)oH(wY8;%#
zNLzi>Gq`121T#caC_J9mApu==4D`^_-fN7P<6l=Ha$L~14uhy3;y^+2mx8LdZrcY@
zbb^uJ_Idjv~$3@CLUuRf3-)YZWF1H#X
z{-08XQ!D4z^<4N&`1Es_4dtKw5%cWd1HXU$2^gK{M^NlJr8i0`FRj0Sm)ihplOEob
ze;h#m?p&=(Oh7X-_`q4Rlh`D>b?
zr>&pMW6|k6!}{|N{`*;(y3udLuzpCB
zaX&@f*@*rRd2Zak^`AoEu=fgk{{FJ-9QVSA{C_c#=gE7dlF)B*KeX;|CUjY!4D=5Y
z`Ze~?s8T6cd3k=d8aC8)SxvW<&b_j-aQf8qNIzWsI=`%!))r3Z^>S`a>63*siwgyP
zW@+iA{8@c=VX=_c{Tb-;-evgcRCd1q=8jNdYj-;?z6y++7OPULH{wWz<=uL5yH+w4
z_yUV@QH7QIPFO9rs#cGiR=FHCTeU{LkLa=rqpIF$7Pli^EyGio3gf69^P>GnpmV`_
zzZJW%$Q^@v1Bds$%~hE%`bfY((WH;K6-HF?wRYK{qh=%Xq!Bk79tkf
z$@I87`um?&P1YB5E%Ah|>RRqcs3`}>G!QZRK>48ZQRNR?k-xNTkm-@G)tokTJ)!bQ
z>;Hec9C8N@(QVy_rUzve|3h)y$NP+r(7l$uQ@K|0bJg;*OYxsA+_uLWc__z@GnYB;
zL-aI-2std6=m;)D?0%qhiu$AU7Cn)FIbG?S@OML)vwZWrMlYKcfM4wx%|JM@<(S+
z&YJ`7>+Zofcr(0v{K+b%f0uvy>siU?-8_)1@=iT0zjVX&p|0c}gi>S9VZlTv&Er)b
zLb%7&hjY~1cs?~Y^g@V9eh%@N5G5uv?)i-?allsV#;&vbTSaG{we5v@Reo-u*I#;RBF1s}vFVT2|
zO!mFm%*@fZxcs>Bo7}XSn#;`O%yf2+%#r-D$xPlHo0@tzcfy>=Oy+Z@y8_#F->&e-
zmvUS8zw`)l>uW1P;SBS@k9ZlgYAx`{Ev;1xE0tn6VO{2hpg`_Yb=55w>g5Oz{0KSk
z`SnVz+O;qxa=o%y^b0GVDVJCiD{=#`5%3v4ElFO;cLmB4c?m9g5GVXT0DF-q(ua{g
z$UAUOu8{EU0(9~)TYiR=mXkNZx9KF$qm$>F#?`YZTeC@Qz(f1>vI
zhCrVI$Nu5QjvtQ=9pM|QcurqTzAJa~K>SfrV%o|BcN+rwIIRi)T^!`OeSag2CD{*X
Jpo>qi{}(kpCh`CP

literal 0
HcmV?d1

diff --git a/linux-user/aarch64/vdso.S b/linux-user/aarch64/vdso.S
new file mode 100644
index 00..e436e60fd9
--- /dev/null
+++ b/linux-user/aarch64/vdso.S
@@ -0,0 +1,73 @@
+/*
+ * aarch64 linux replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+
+/* ??? These are in include/elf.h, which is not ready for inclusion in asm. */
+#define NT_GNU_PROPERTY_TYPE_0  5
+#define GNU_PROPERTY_AARCH64_FEATURE_1_AND  0xc000
+#define GNU_PROPERTY_AARCH64_FEATURE_1_BTI  (1U << 0)
+#define GNU_PROPERTY_AARCH64_FEATURE_1_PAC  (1U << 1)
+
+#define GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT \
+(GNU_PROPERTY_AARC

[PATCH v4 02/18] linux-user: Tidy loader_exec

2023-08-16 Thread Richard Henderson
Reorg the if cases to reduce indentation.
Test for 4 bytes in the file before checking the signatures.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/linuxload.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
index 3536dd8104..5b7e9ab983 100644
--- a/linux-user/linuxload.c
+++ b/linux-user/linuxload.c
@@ -154,31 +154,31 @@ int loader_exec(int fdexec, const char *filename, char **argv, char **envp,
 
 retval = prepare_binprm(bprm);
 
-if (retval >= 0) {
-if (bprm->buf[0] == 0x7f
-&& bprm->buf[1] == 'E'
-&& bprm->buf[2] == 'L'
-&& bprm->buf[3] == 'F') {
-retval = load_elf_binary(bprm, infop);
-#if defined(TARGET_HAS_BFLT)
-} else if (bprm->buf[0] == 'b'
-&& bprm->buf[1] == 'F'
-&& bprm->buf[2] == 'L'
-&& bprm->buf[3] == 'T') {
-retval = load_flt_binary(bprm, infop);
-#endif
-} else {
-return -ENOEXEC;
-}
+if (retval < 4) {
+return -ENOEXEC;
 }
-
-if (retval >= 0) {
-/* success.  Initialize important registers */
-do_init_thread(regs, infop);
+if (bprm->buf[0] == 0x7f
+&& bprm->buf[1] == 'E'
+&& bprm->buf[2] == 'L'
+&& bprm->buf[3] == 'F') {
+retval = load_elf_binary(bprm, infop);
+#if defined(TARGET_HAS_BFLT)
+} else if (bprm->buf[0] == 'b'
+   && bprm->buf[1] == 'F'
+   && bprm->buf[2] == 'L'
+   && bprm->buf[3] == 'T') {
+retval = load_flt_binary(bprm, infop);
+#endif
+} else {
+return -ENOEXEC;
+}
+if (retval < 0) {
 return retval;
 }
 
-return retval;
+/* Success.  Initialize important registers. */
+do_init_thread(regs, infop);
+return 0;
 }
 
 bool imgsrc_read(void *dst, off_t offset, size_t len,
-- 
2.34.1
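
The length check added by this patch can be illustrated in isolation. A minimal sketch, assuming a hypothetical helper demo_detect_format that is not part of QEMU: it refuses to look at the signature unless at least four bytes were read, mirroring the "retval < 4" test above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes used only by this sketch. */
enum demo_format { DEMO_UNKNOWN, DEMO_ELF, DEMO_BFLT };

/*
 * Classify a binary from the first bytes of the file.  nread is the
 * number of valid bytes in buf; with fewer than four the signature
 * cannot be checked, so the file is rejected up front.
 */
static enum demo_format demo_detect_format(const unsigned char *buf,
                                           size_t nread)
{
    if (nread < 4) {
        return DEMO_UNKNOWN;
    }
    if (memcmp(buf, "\177ELF", 4) == 0) {
        return DEMO_ELF;
    }
    if (memcmp(buf, "bFLT", 4) == 0) {
        return DEMO_BFLT;
    }
    return DEMO_UNKNOWN;
}
```

Doing the size test first means a three-byte file can never match a signature by way of stale or zero-filled buffer contents.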




[PATCH v4 17/18] linux-user/s390x: Rename __SIGNAL_FRAMESIZE to STACK_FRAME_OVERHEAD

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/s390x/signal.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/linux-user/s390x/signal.c b/linux-user/s390x/signal.c
index f72165576f..0f8b8e04bf 100644
--- a/linux-user/s390x/signal.c
+++ b/linux-user/s390x/signal.c
@@ -26,7 +26,8 @@
 #define __NUM_FPRS 16
 #define __NUM_ACRS 16
 
-#define __SIGNAL_FRAMESIZE  160 /* FIXME: 31-bit mode -> 96 */
+/* Minimum stack frame size */
+#define STACK_FRAME_OVERHEAD    160
 
 #define _SIGCONTEXT_NSIG        64
 #define _SIGCONTEXT_NSIG_BPW    64 /* FIXME: 31-bit mode -> 32 */
@@ -63,7 +64,7 @@ typedef struct {
 } target_sigcontext;
 
 typedef struct {
-uint8_t callee_used_stack[__SIGNAL_FRAMESIZE];
+uint8_t callee_used_stack[STACK_FRAME_OVERHEAD];
 target_sigcontext sc;
 target_sigregs sregs;
 int signo;
@@ -83,7 +84,7 @@ struct target_ucontext {
 };
 
 typedef struct {
-uint8_t callee_used_stack[__SIGNAL_FRAMESIZE];
+uint8_t callee_used_stack[STACK_FRAME_OVERHEAD];
 /*
  * This field is no longer initialized by the kernel, but it's still a part
  * of the ABI.
-- 
2.34.1




[PATCH v4 07/18] linux-user: Load vdso image if available

2023-08-16 Thread Richard Henderson
The vdso image will be pre-processed into a C data array, with
a simple list of relocations to perform, and identifying the
location of signal trampolines.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 87 +++-
 1 file changed, 78 insertions(+), 9 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 19d3cac039..f94963638a 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -33,6 +33,19 @@
 #undef ELF_ARCH
 #endif
 
+#ifndef TARGET_ARCH_HAS_SIGTRAMP_PAGE
+#define TARGET_ARCH_HAS_SIGTRAMP_PAGE 0
+#endif
+
+typedef struct {
+const uint8_t *image;
+const uint32_t *relocs;
+unsigned image_size;
+unsigned reloc_count;
+unsigned sigreturn_ofs;
+unsigned rt_sigreturn_ofs;
+} VdsoImageInfo;
+
 #define ELF_OSABI   ELFOSABI_SYSV
 
 /* from personality.h */
@@ -2291,7 +2304,8 @@ static abi_ulong loader_build_fdpic_loadmap(struct image_info *info, abi_ulong s
 static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
struct elfhdr *exec,
struct image_info *info,
-   struct image_info *interp_info)
+   struct image_info *interp_info,
+   struct image_info *vdso_info)
 {
 abi_ulong sp;
 abi_ulong u_argc, u_argv, u_envp, u_auxv;
@@ -2379,10 +2393,15 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
 }
 
 size = (DLINFO_ITEMS + 1) * 2;
-if (k_base_platform)
+if (k_base_platform) {
 size += 2;
-if (k_platform)
+}
+if (k_platform) {
 size += 2;
+}
+if (vdso_info) {
+size += 2;
+}
 #ifdef DLINFO_ARCH_ITEMS
 size += DLINFO_ARCH_ITEMS * 2;
 #endif
@@ -2464,6 +2483,9 @@ static abi_ulong create_elf_tables(abi_ulong p, int argc, int envc,
 if (u_platform) {
 NEW_AUX_ENT(AT_PLATFORM, u_platform);
 }
+if (vdso_info) {
+NEW_AUX_ENT(AT_SYSINFO_EHDR, vdso_info->load_addr);
+}
 NEW_AUX_ENT (AT_NULL, 0);
 #undef NEW_AUX_ENT
 
@@ -3341,6 +3363,49 @@ static void load_elf_interp(const char *filename, struct image_info *info,
 load_elf_image(filename, &src, info, &ehdr, NULL);
 }
 
+#ifndef vdso_image_info
+#define vdso_image_info()    NULL
+#endif
+
+static void load_elf_vdso(struct image_info *info, const VdsoImageInfo *vdso)
+{
+ImageSource src;
+struct elfhdr ehdr;
+abi_ulong load_bias, load_addr;
+
+src.fd = -1;
+src.cache = vdso->image;
+src.cache_size = vdso->image_size;
+
+load_elf_image("", &src, info, &ehdr, NULL);
+load_addr = info->load_addr;
+load_bias = info->load_bias;
+
+/*
+ * We need to relocate the VDSO image.  The one built into the kernel
+ * is built for a fixed address.  The one built for QEMU is not, since
+ * that requires close control of the guest address space.
+ * We pre-processed the image to locate all of the addresses that need
+ * to be updated.
+ */
+for (unsigned i = 0, n = vdso->reloc_count; i < n; i++) {
+abi_ulong *addr = g2h_untagged(load_addr + vdso->relocs[i]);
+*addr = tswapal(tswapal(*addr) + load_bias);
+}
+
+/* Install signal trampolines, if present. */
+if (vdso->sigreturn_ofs) {
+default_sigreturn = load_addr + vdso->sigreturn_ofs;
+}
+if (vdso->rt_sigreturn_ofs) {
+default_rt_sigreturn = load_addr + vdso->rt_sigreturn_ofs;
+}
+
+/* Remove write from VDSO segment. */
+target_mprotect(info->start_data, info->end_data - info->start_data,
+PROT_READ | PROT_EXEC);
+}
+
 static int symfind(const void *s0, const void *s1)
 {
 struct elf_sym *sym = (struct elf_sym *)s1;
@@ -3546,7 +3611,7 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
  * and let elf_load_image do any swapping that may be required.
  */
 struct elfhdr ehdr;
-struct image_info interp_info;
+struct image_info interp_info, vdso_info;
 char *elf_interpreter = NULL;
 char *scratch;
 
@@ -3629,10 +3694,13 @@ int load_elf_binary(struct linux_binprm *bprm, struct image_info *info)
 }
 
 /*
- * TODO: load a vdso, which would also contain the signal trampolines.
- * Otherwise, allocate a private page to hold them.
+ * Load a vdso if available, which will amongst other things contain the
+ * signal trampolines.  Otherwise, allocate a separate page for them.
  */
-if (TARGET_ARCH_HAS_SIGTRAMP_PAGE) {
+const VdsoImageInfo *vdso = vdso_image_info();
+if (vdso) {
+load_elf_vdso(&vdso_info, vdso);
+} else if (TARGET_ARCH_HAS_SIGTRAMP_PAGE) {
 abi_long tramp_page = target_mmap(0, TARGET_PAGE_SIZE,
   PROT_READ | PROT_WRITE,
   MAP_PRIVATE | MAP_ANON, -1, 0);
@@ -3644,8 +3712,9 @
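
The relocation pass described in the load_elf_vdso comment can be sketched on plain host memory. This is only an illustration under stated assumptions: demo_relocate is a hypothetical name, words are 64-bit and host-endian, and the guest-specific pieces of the patch (g2h_untagged, tswapal, abi_ulong) are left out.

```c
#include <stdint.h>
#include <string.h>

/*
 * Add load_bias to every address-sized word listed in relocs.
 * Each entry of relocs is a byte offset into image.  memcpy is used
 * so the sketch does not depend on the offsets being aligned.
 */
static void demo_relocate(uint8_t *image, const uint32_t *relocs,
                          unsigned reloc_count, uint64_t load_bias)
{
    for (unsigned i = 0; i < reloc_count; i++) {
        uint64_t word;

        memcpy(&word, image + relocs[i], sizeof(word));
        word += load_bias;
        memcpy(image + relocs[i], &word, sizeof(word));
    }
}
```

Because the list of offsets is computed ahead of time, the loader never has to parse dynamic relocation sections at run time; it only walks a flat array.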

[PATCH v4 15/18] linux-user/loongarch64: Add vdso

2023-08-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/loongarch64/vdso-asmoffset.h |   8 ++
 linux-user/elfload.c|   4 +
 linux-user/loongarch64/signal.c |  17 +++-
 linux-user/loongarch64/Makefile.vdso|   7 ++
 linux-user/loongarch64/meson.build  |   4 +
 linux-user/loongarch64/vdso.S   | 130 
 linux-user/loongarch64/vdso.ld  |  73 +
 linux-user/loongarch64/vdso.so  | Bin 0 -> 3560 bytes
 linux-user/meson.build  |   1 +
 9 files changed, 243 insertions(+), 1 deletion(-)
 create mode 100644 linux-user/loongarch64/vdso-asmoffset.h
 create mode 100644 linux-user/loongarch64/Makefile.vdso
 create mode 100644 linux-user/loongarch64/meson.build
 create mode 100644 linux-user/loongarch64/vdso.S
 create mode 100644 linux-user/loongarch64/vdso.ld
 create mode 100755 linux-user/loongarch64/vdso.so

diff --git a/linux-user/loongarch64/vdso-asmoffset.h b/linux-user/loongarch64/vdso-asmoffset.h
new file mode 100644
index 00..60d113822f
--- /dev/null
+++ b/linux-user/loongarch64/vdso-asmoffset.h
@@ -0,0 +1,8 @@
+#define sizeof_rt_sigframe 0x240
+#define sizeof_sigcontext  0x110
+#define sizeof_sctx_info   0x10
+
+#define offsetof_sigcontext    0x130
+#define offsetof_sigcontext_pc 0
+#define offsetof_sigcontext_gr 8
+#define offsetof_fpucontext_fr 0
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index c9cba730de..498f5ed07e 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1047,6 +1047,10 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
 
 #define elf_check_arch(x) ((x) == EM_LOONGARCH)
 
+#include "vdso.c.inc"
+
+#define vdso_image_info()    &vdso_image_info
+
 static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
 {
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index bb8efb1172..b9d0a4cad7 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -10,8 +10,8 @@
 #include "user-internals.h"
 #include "signal-common.h"
 #include "linux-user/trace.h"
-
 #include "target/loongarch/internals.h"
+#include "vdso-asmoffset.h"
 
 /* FP context was used */
 #define SC_USED_FP  (1 << 0)
@@ -23,6 +23,11 @@ struct target_sigcontext {
 uint64_t sc_extcontext[0]   QEMU_ALIGNED(16);
 };
 
+QEMU_BUILD_BUG_ON(sizeof(struct target_sigcontext) != sizeof_sigcontext);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_pc)
+  != offsetof_sigcontext_pc);
+QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, sc_regs)
+  != offsetof_sigcontext_gr);
 
 #define FPU_CTX_MAGIC   0x46505501
 #define FPU_CTX_ALIGN   8
@@ -32,6 +37,9 @@ struct target_fpu_context {
 uint32_t fcsr;
 } QEMU_ALIGNED(FPU_CTX_ALIGN);
 
+QEMU_BUILD_BUG_ON(offsetof(struct target_fpu_context, regs)
+  != offsetof_fpucontext_fr);
+
 #define CONTEXT_INFO_ALIGN  16
 struct target_sctx_info {
 uint32_t magic;
@@ -39,6 +47,8 @@ struct target_sctx_info {
 uint64_t padding;
 } QEMU_ALIGNED(CONTEXT_INFO_ALIGN);
 
+QEMU_BUILD_BUG_ON(sizeof(struct target_sctx_info) != sizeof_sctx_info);
+
 struct target_ucontext {
 abi_ulong tuc_flags;
 abi_ptr tuc_link;
@@ -53,6 +63,11 @@ struct target_rt_sigframe {
 struct target_ucontext   rs_uc;
 };
 
+QEMU_BUILD_BUG_ON(sizeof(struct target_rt_sigframe)
+  != sizeof_rt_sigframe);
+QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe, rs_uc.tuc_mcontext)
+  != offsetof_sigcontext);
+
 /*
  * These two structures are not present in guest memory, are private
  * to the signal implementation, but are largely copied from the
diff --git a/linux-user/loongarch64/Makefile.vdso b/linux-user/loongarch64/Makefile.vdso
new file mode 100644
index 00..dc266a65cf
--- /dev/null
+++ b/linux-user/loongarch64/Makefile.vdso
@@ -0,0 +1,7 @@
+CROSS_CC ?= loongarch64-linux-gnu-gcc
+
+all: vdso.so
+
+vdso.so: vdso.S vdso.ld vdso-asmoffset.h Makefile.vdso
+   $(CROSS_CC) -nostdlib -fpic -shared -Wl,-T,vdso.ld -Wl,--build-id=sha1 \
+  -Wl,-h,linux-vdso.so.1 -Wl,--hash-style=both vdso.S -o $@
diff --git a/linux-user/loongarch64/meson.build b/linux-user/loongarch64/meson.build
new file mode 100644
index 00..7ae2ea13c0
--- /dev/null
+++ b/linux-user/loongarch64/meson.build
@@ -0,0 +1,4 @@
+gen = [
+  gen_vdso.process('vdso.so', extra_args: ['-r', '__vdso_rt_sigreturn'])
+]
+linux_user_ss.add(when: 'TARGET_LOONGARCH64', if_true: gen)
diff --git a/linux-user/loongarch64/vdso.S b/linux-user/loongarch64/vdso.S
new file mode 100644
index 00..780a5fda12
--- /dev/null
+++ b/linux-user/loongarch64/vdso.S
@@ -0,0 +1,130 @@
+/*
+ * Loongarch64 linux replacement vdso.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#incl

Re: [PATCH v2 0/4] Add full zoned storage emulation to qcow2 driver

2023-08-16 Thread Stefan Hajnoczi
On Wed, Aug 16, 2023 at 04:14:08PM +0800, Sam Li wrote:
> On Wed, Aug 16, 2023 at 15:37, Klaus Jensen wrote:
> >
> > On Aug 14 16:57, Sam Li wrote:
> > > This patch series adds a new extension - zoned format - to the
> > > qcow2 driver thereby allowing full zoned storage emulation on
> > > the qcow2 img file. Users can attach such a qcow2 file to the
> > > guest as a zoned device.
> > >
> > > To create a qcow2 file with zoned format, use command like this:
> > > $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> > > zone_size=64M -o zone_capacity=64M -o zone_nr_conv=0 -o
> > > max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> > > -o zoned_profile=zbc
> > >
> > > Then add it to the QEMU command line:
> > > -blockdev 
> > > node-name=drive1,driver=qcow2,file.driver=file,file.filename=../qemu/test.qcow2
> > >  \
> > > -device virtio-blk-pci,drive=drive1 \
> > >
> > > v1->v2:
> > > - add more tests to qemu-io zoned commands
> > > - make zone append change state to full when wp reaches end
> > > - add documentation to qcow2 zoned extension header
> > > - address review comments (Stefan):
> > >   * fix zoned_meta allocation size
> > >   * use bitwise OR rather than addition
> > >   * fix wp index overflow and locking
> > >   * cleanups: comments, naming
> > >
> > > Sam Li (4):
> > >   docs/qcow2: add the zoned format feature
> > >   qcow2: add configurations for zoned format extension
> > >   qcow2: add zoned emulation capability
> > >   iotests: test the zoned format feature for qcow2 file
> > >
> > >  block/qcow2.c| 799 ++-
> > >  block/qcow2.h|  23 +
> > >  docs/interop/qcow2.txt   |  26 +
> > >  docs/system/qemu-block-drivers.rst.inc   |  39 ++
> > >  include/block/block-common.h |   5 +
> > >  include/block/block_int-common.h |  16 +
> > >  qapi/block-core.json |  46 +-
> > >  tests/qemu-iotests/tests/zoned-qcow2 | 135 
> > >  tests/qemu-iotests/tests/zoned-qcow2.out | 140 
> > >  9 files changed, 1214 insertions(+), 15 deletions(-)
> > >  create mode 100755 tests/qemu-iotests/tests/zoned-qcow2
> > >  create mode 100644 tests/qemu-iotests/tests/zoned-qcow2.out
> > >
> >
> > Hi Sam,
> >
> > Thanks for this and for the RFC for hw/nvme - this is an awesome
> > improvement.
> >
> > Can you explain the need for the zoned_profile? I understand that only
> > ZNS requires potentially setting zone_capacity and configuring extended
> > descriptors. When an image is hooked up to a block emulation device that
> > doesn't understand cap < size or extended descriptors, it could just
> > fail on the cap < size and ignore the extended descriptor
> > space. Do we really need to add the complexity of the user explicitly
> > having to set the profile? I also think it is fair for the QEMU zoned
> > block API to accommodate both variations - whether a particular
> > configuration is supported or not is up to the emulating device.
> >
> > Checking the profile from hw/nvme or hw/block/virtio is the same as
> > checking if cap < size or possibly the presence of extended descriptors.
> 
> Hi Klaus,
> 
> Thanks for your feedback.
> 
> The zoned_profile is for users to choose the emulating device type,
> either zbc or zns. It implies using virtio-blk or nvme pass through.
> The zoned block api does accommodate both variations. Since the cap <
> size and extended descriptor config can also infer zoned_profile, this
> option can be dropped. Then the device type is determined by the
> configurations. When cap = size and no extended descriptor, the img
> can be used both in virtio-blk and nvme zns depending on the QEMU
> command line.

Dropping zoned_profile would be a nice simplification.

Stefan


signature.asc
Description: PGP signature
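
The inference discussed in this thread, deriving the device type from the image configuration instead of an explicit zoned_profile option, could look roughly like the sketch below. Everything in it is hypothetical: DemoZonedConfig, its field names, and demo_requires_zns are illustrative only and are not taken from the qcow2 patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical image parameters; the field names are illustrative. */
typedef struct {
    uint64_t zone_size;      /* bytes per zone */
    uint64_t zone_capacity;  /* usable bytes per zone, <= zone_size */
    uint32_t ext_desc_size;  /* extended descriptor bytes, 0 if unused */
} DemoZonedConfig;

/*
 * Following the description in the thread: an image with capacity
 * smaller than the zone size, or with extended descriptors, needs a
 * ZNS-style device.  With cap == size and no extended descriptors,
 * either device type could use the image.
 */
static bool demo_requires_zns(const DemoZonedConfig *cfg)
{
    return cfg->zone_capacity < cfg->zone_size || cfg->ext_desc_size != 0;
}
```

An emulated device would make this check at attach time and reject or accept the image, so the user never has to state the profile separately.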


[PATCH v4 08/18] linux-user: Add gen-vdso tool

2023-08-16 Thread Richard Henderson
This tool will be used for post-processing the linked vdso image,
turning it into something that is easy to include into elfload.c.

Signed-off-by: Richard Henderson 
---
 linux-user/gen-vdso.c  | 223 
 linux-user/gen-vdso-elfn.c.inc | 307 +
 linux-user/meson.build |   6 +-
 3 files changed, 535 insertions(+), 1 deletion(-)
 create mode 100644 linux-user/gen-vdso.c
 create mode 100644 linux-user/gen-vdso-elfn.c.inc

diff --git a/linux-user/gen-vdso.c b/linux-user/gen-vdso.c
new file mode 100644
index 00..a6c61d2f6e
--- /dev/null
+++ b/linux-user/gen-vdso.c
@@ -0,0 +1,223 @@
+/*
+ * Post-process a vdso elf image for inclusion into qemu.
+ *
+ * Copyright 2023 Linaro, Ltd.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "elf.h"
+
+
+#define bswap_(p)  _Generic(*(p), \
+uint16_t: __builtin_bswap16,   \
+uint32_t: __builtin_bswap32,   \
+uint64_t: __builtin_bswap64,   \
+int16_t: __builtin_bswap16,\
+int32_t: __builtin_bswap32,\
+int64_t: __builtin_bswap64)
+#define bswaps(p) (*(p) = bswap_(p)(*(p)))
+
+static void output_reloc(FILE *outf, void *buf, void *loc)
+{
+fprintf(outf, "0x%08lx,\n", (unsigned long)(loc - buf));
+}
+
+static const char *sigreturn_sym;
+static const char *rt_sigreturn_sym;
+
+static unsigned sigreturn_addr;
+static unsigned rt_sigreturn_addr;
+
+#define N 32
+#define elfN(x)  elf32_##x
+#define ElfN(x)  Elf32_##x
+#include "gen-vdso-elfn.c.inc"
+#undef N
+#undef elfN
+#undef ElfN
+
+#define N 64
+#define elfN(x)  elf64_##x
+#define ElfN(x)  Elf64_##x
+#include "gen-vdso-elfn.c.inc"
+#undef N
+#undef elfN
+#undef ElfN
+
+
+int main(int argc, char **argv)
+{
+FILE *inf, *outf;
+long total_len;
+const char *prefix = "vdso";
+const char *inf_name;
+const char *outf_name = NULL;
+unsigned char *buf;
+bool need_bswap;
+
+while (1) {
+int opt = getopt(argc, argv, "o:p:r:s:");
+if (opt < 0) {
+break;
+}
+switch (opt) {
+case 'o':
+outf_name = optarg;
+break;
+case 'p':
+prefix = optarg;
+break;
+case 'r':
+rt_sigreturn_sym = optarg;
+break;
+case 's':
+sigreturn_sym = optarg;
+break;
+default:
+usage:
+fprintf(stderr, "usage: [-p prefix] [-r rt-sigreturn-name] "
+"[-s sigreturn-name] -o output-file input-file\n");
+return EXIT_FAILURE;
+}
+}
+
+if (optind >= argc || outf_name == NULL) {
+goto usage;
+}
+inf_name = argv[optind];
+
+/*
+ * Open the input and output files.
+ */
+inf = fopen(inf_name, "rb");
+if (inf == NULL) {
+goto perror_inf;
+}
+outf = fopen(outf_name, "w");
+if (outf == NULL) {
+goto perror_outf;
+}
+
+/*
+ * Read the input file into a buffer.
+ * We expect the vdso to be small, on the order of one page,
+ * therefore we do not expect a partial read.
+ */
+fseek(inf, 0, SEEK_END);
+total_len = ftell(inf);
+fseek(inf, 0, SEEK_SET);
+
+buf = malloc(total_len);
+if (buf == NULL) {
+goto perror_inf;
+}
+
+errno = 0;
+if (fread(buf, 1, total_len, inf) != total_len) {
+if (errno) {
+goto perror_inf;
+}
+fprintf(stderr, "%s: incomplete read\n", inf_name);
+return EXIT_FAILURE;
+}
+fclose(inf);
+
+/*
+ * Write out the vdso image now, before we make local changes.
+ */
+
+fprintf(outf,
+"/* Automatically generated from linux-user/gen-vdso.c. */\n"
+"\n"
+"static const uint8_t %s_image[] = {",
+prefix);
+for (long i = 0; i < total_len; ++i) {
+if (i % 12 == 0) {
+fputs("\n   ", outf);
+}
+fprintf(outf, " 0x%02x,", buf[i]);
+}
+fprintf(outf, "\n};\n\n");
+
+/*
+ * Identify which elf flavor we're processing.
+ * The first 16 bytes of the file are e_ident.
+ */
+
+if (buf[EI_MAG0] != ELFMAG0 || buf[EI_MAG1] != ELFMAG1 ||
+buf[EI_MAG2] != ELFMAG2 || buf[EI_MAG3] != ELFMAG3) {
+fprintf(stderr, "%s: not an elf file\n", inf_name);
+return EXIT_FAILURE;
+}
+switch (buf[EI_DATA]) {
+case ELFDATA2LSB:
+need_bswap = BYTE_ORDER != LITTLE_ENDIAN;
+break;
+case ELFDATA2MSB:
+need_bswap = BYTE_ORDER != BIG_ENDIAN;
+break;
+default:
+fprintf(stderr, "%s: invalid elf EI_DATA (%u)\n",
+inf_name, buf[EI_DATA]);
+return EXIT_FAILURE;
+
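
The type-directed byte swap used by gen-vdso.c can be exercised on its own. A minimal sketch, assuming GCC or Clang (the __builtin_bswap* functions are compiler builtins, not ISO C); the demo_ names are hypothetical and only the unsigned cases of the original macro are reproduced.

```c
#include <stdint.h>

/*
 * Select a byte-swap builtin from the pointed-to type, then store the
 * swapped value back through the pointer.  Passing a pointer to any
 * type not listed here is a compile-time error.
 */
#define demo_bswap_fn(p)  _Generic(*(p),   \
        uint16_t: __builtin_bswap16,       \
        uint32_t: __builtin_bswap32,       \
        uint64_t: __builtin_bswap64)
#define demo_bswaps(p)  (*(p) = demo_bswap_fn(p)(*(p)))
```

The benefit of this shape is that a field can change width between the 32-bit and 64-bit ELF variants without the call sites changing: demo_bswaps(&field) picks the matching swap from the field's type.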

[PATCH v4 06/25] hw/core/cpu: Replace gdb_core_xml_file with gdb_core_feature

2023-08-16 Thread Akihiko Odaki
This is a tree-wide change to replace gdb_core_xml_file, the path to
GDB XML file with gdb_core_feature, the pointer to GDBFeature. This
also replaces the values assigned to gdb_num_core_regs with the
num_regs member of GDBFeature where applicable to remove magic numbers.

A following change will utilize additional information provided by
GDBFeature to simplify XML file lookup.

Signed-off-by: Akihiko Odaki 
---
 include/hw/core/cpu.h   | 5 +++--
 target/s390x/cpu.h  | 2 --
 gdbstub/gdbstub.c   | 6 +++---
 target/arm/cpu.c| 4 ++--
 target/arm/cpu64.c  | 4 ++--
 target/arm/tcg/cpu32.c  | 3 ++-
 target/avr/cpu.c| 4 ++--
 target/hexagon/cpu.c| 2 +-
 target/i386/cpu.c   | 7 +++
 target/loongarch/cpu.c  | 4 ++--
 target/m68k/cpu.c   | 7 ---
 target/microblaze/cpu.c | 4 ++--
 target/ppc/cpu_init.c   | 4 ++--
 target/riscv/cpu.c  | 7 ---
 target/rx/cpu.c | 4 ++--
 target/s390x/cpu.c  | 4 ++--
 16 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index fdcbe87352..84219c1885 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -23,6 +23,7 @@
 #include "hw/qdev-core.h"
 #include "disas/dis-asm.h"
 #include "exec/cpu-common.h"
+#include "exec/gdbstub.h"
 #include "exec/hwaddr.h"
 #include "exec/memattrs.h"
 #include "qapi/qapi-types-run-state.h"
@@ -127,7 +128,7 @@ struct SysemuCPUOps;
  *   breakpoint.  Used by AVR to handle a gdb mis-feature with
  *   its Harvard architecture split code and data.
  * @gdb_num_core_regs: Number of core registers accessible to GDB.
- * @gdb_core_xml_file: File name for core registers GDB XML description.
+ * @gdb_core_feature: GDB core feature description.
  * @gdb_stop_before_watchpoint: Indicates whether GDB expects the CPU to stop
  *   before the insn which triggers a watchpoint rather than after it.
  * @gdb_arch_name: Optional callback that returns the architecture name known
@@ -163,7 +164,7 @@ struct CPUClass {
 int (*gdb_write_register)(CPUState *cpu, uint8_t *buf, int reg);
 vaddr (*gdb_adjust_breakpoint)(CPUState *cpu, vaddr addr);
 
-const char *gdb_core_xml_file;
+const GDBFeature *gdb_core_feature;
 gchar * (*gdb_arch_name)(CPUState *cpu);
 const char * (*gdb_get_dynamic_xml)(CPUState *cpu, const char *xmlname);
 
diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index eb5b65b7d3..c5bac3230c 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -451,8 +451,6 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState *env, vaddr *pc,
 #define S390_R13_REGNUM 15
 #define S390_R14_REGNUM 16
 #define S390_R15_REGNUM 17
-/* Total Core Registers. */
-#define S390_NUM_CORE_REGS 18
 
 static inline void setcc(S390CPU *cpu, uint64_t cc)
 {
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 5829e82073..293e8ea439 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -386,7 +386,7 @@ static const char *get_feature_xml(const char *p, const char **newp,
 g_free(arch);
 }
 pstrcat(buf, buf_sz, "<xi:include href=\"");
-pstrcat(buf, buf_sz, cc->gdb_core_xml_file);
+pstrcat(buf, buf_sz, cc->gdb_core_feature->xmlname);
 pstrcat(buf, buf_sz, "\"/>");
 for (r = cpu->gdb_regs; r; r = r->next) {
 pstrcat(buf, buf_sz, "gdb_core_xml_file) {
+if (cc->gdb_core_feature) {
 g_string_append(gdbserver_state.str_buf, ";qXfer:features:read+");
 }
 
@@ -1548,7 +1548,7 @@ static void handle_query_xfer_features(GArray *params, void *user_ctx)
 
 process = gdb_get_cpu_process(gdbserver_state.g_cpu);
 cc = CPU_GET_CLASS(gdbserver_state.g_cpu);
-if (!cc->gdb_core_xml_file) {
+if (!cc->gdb_core_feature) {
 gdb_put_packet("");
 return;
 }
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index d71a162070..a206ab6b1b 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2353,7 +2353,6 @@ static void arm_cpu_class_init(ObjectClass *oc, void *data)
 #ifndef CONFIG_USER_ONLY
 cc->sysemu_ops = &arm_sysemu_ops;
 #endif
-cc->gdb_num_core_regs = 26;
 cc->gdb_arch_name = arm_gdb_arch_name;
 cc->gdb_get_dynamic_xml = arm_gdb_get_dynamic_xml;
 cc->gdb_stop_before_watchpoint = true;
@@ -2378,7 +2377,8 @@ static void cpu_register_class_init(ObjectClass *oc, void *data)
 CPUClass *cc = CPU_CLASS(acc);
 
 acc->info = data;
-cc->gdb_core_xml_file = "arm-core.xml";
+cc->gdb_core_feature = gdb_find_static_feature("arm-core.xml");
+cc->gdb_num_core_regs = cc->gdb_core_feature->num_regs;
 }
 
 void arm_cpu_register(const ARMCPUInfo *info)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 96158093cc..9c2a226159 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -754,8 +754,8 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
*data)
 
 cc->gdb_read_register = aarch64_cpu_gdb_read_register;
 cc->gdb_write_register = aarch64_cpu_gdb_write_register;
-cc->gdb_n

[PATCH 3/3] tcg/i386: Allow immediate as input to deposit_*

2023-08-16 Thread Richard Henderson
We can use MOVB and MOVW with an immediate just as easily
as with a register input.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target-con-set.h |  2 +-
 tcg/i386/tcg-target.c.inc | 26 ++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 3949d49538..7d00a7dde8 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -33,7 +33,7 @@ C_O1_I1(r, q)
 C_O1_I1(r, r)
 C_O1_I1(x, r)
 C_O1_I1(x, x)
-C_O1_I2(q, 0, q)
+C_O1_I2(q, 0, qi)
 C_O1_I2(q, r, re)
 C_O1_I2(r, 0, ci)
 C_O1_I2(r, 0, r)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index ba40dd0f4d..3045b56002 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -276,6 +276,7 @@ static bool tcg_target_const_match(int64_t val, TCGType 
type, int ct)
 #define OPC_MOVL_GvEv  (0x8b)  /* loads, more or less */
 #define OPC_MOVB_EvIz   (0xc6)
 #define OPC_MOVL_EvIz  (0xc7)
+#define OPC_MOVB_Ib (0xb0)
 #define OPC_MOVL_Iv (0xb8)
 #define OPC_MOVBE_GyMy  (0xf0 | P_EXT38)
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
@@ -2750,13 +2751,30 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 OP_32_64(deposit):
 if (args[3] == 0 && args[4] == 8) {
 /* load bits 0..7 */
-tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
+if (const_a2) {
+tcg_out_opc(s, OPC_MOVB_Ib | P_REXB_RM | LOWREGMASK(a0),
+0, a0, 0);
+tcg_out8(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
+}
 } else if (TCG_TARGET_REG_BITS == 32 && args[3] == 8 && args[4] == 8) {
 /* load bits 8..15 */
-tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
+if (const_a2) {
+tcg_out8(s, OPC_MOVB_Ib + a0 + 4);
+tcg_out8(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
+}
 } else if (args[3] == 0 && args[4] == 16) {
 /* load bits 0..15 */
-tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, a2, a0);
+if (const_a2) {
+tcg_out_opc(s, OPC_MOVL_Iv | P_DATA16 | LOWREGMASK(a0),
+0, a0, 0);
+tcg_out16(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, a2, a0);
+}
 } else {
 g_assert_not_reached();
 }
@@ -3311,7 +3329,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-return C_O1_I2(q, 0, q);
+return C_O1_I2(q, 0, qi);
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
-- 
2.34.1
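
For readers who want to check the byte sequences, the three immediate forms
used above can be sketched outside of TCG. This is only an illustration and
assumes an x86-64 host; the emit_* helpers are hypothetical names, not QEMU
functions. The opcode values 0xb0 (MOV r8, imm8) and 0xb8 (MOV r16/32, imm)
match OPC_MOVB_Ib and OPC_MOVL_Iv in the patch, and 0x66 is the operand-size
prefix that P_DATA16 stands for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only (not QEMU code).
 * Each writes one instruction into out[] and returns its length.
 */

/* mov $imm8, <low byte of reg>; reg is 0..15 in x86-64 numbering. */
static size_t emit_movb_imm(uint8_t *out, unsigned reg, uint8_t imm)
{
    size_t n = 0;

    assert(reg < 16);
    if (reg >= 4) {
        /*
         * A REX prefix is required to reach SPL/BPL/SIL/DIL (4..7)
         * and R8B..R15B; REX.B (bit 0) carries the fourth register bit.
         */
        out[n++] = 0x40 | (reg >> 3);
    }
    out[n++] = 0xb0 | (reg & 7);    /* B0+rb ib */
    out[n++] = imm;
    return n;
}

/* mov $imm8, <AH/CH/DH/BH>; reg is 0..3 (A, C, D, B). No REX allowed. */
static size_t emit_movb_imm_high(uint8_t *out, unsigned reg, uint8_t imm)
{
    assert(reg < 4);
    out[0] = 0xb0 + reg + 4;        /* encodings 4..7 select AH..BH */
    out[1] = imm;
    return 2;
}

/* mov $imm16, <low word of reg>; reg is 0..15. */
static size_t emit_movw_imm(uint8_t *out, unsigned reg, uint16_t imm)
{
    size_t n = 0;

    assert(reg < 16);
    out[n++] = 0x66;                /* operand-size prefix, before REX */
    if (reg >= 8) {
        out[n++] = 0x41;            /* REX.B for R8W..R15W */
    }
    out[n++] = 0xb8 | (reg & 7);    /* B8+rw iw */
    out[n++] = imm & 0xff;          /* immediate is little-endian */
    out[n++] = imm >> 8;
    return n;
}
```

For example, emit_movb_imm(buf, 6, 0x12) yields 40 b6 12 (mov $0x12, %sil),
and emit_movw_imm(buf, 1, 0x1234) yields 66 b9 34 12 (mov $0x1234, %cx).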




Re: [PATCH V1 2/3] migration: fix suspended runstate

2023-08-16 Thread Steven Sistare
On 8/14/2023 3:37 PM, Peter Xu wrote:
> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>> Can we just call vm_state_notify() earlier?
>>
>> We cannot.  The guest is not running yet, and will not be until later.
>> We cannot call notifiers that perform actions that complete, or react to, 
>> the guest entering a running state.
> 
> I tried to look at a few examples of the notifees and most of them I read
> do not react to "vcpu running" but "vm running" (in which case I think
> "suspended" mode falls into "vm running" case); most of them won't care on
> the RunState parameter passed in, but only the bool "running".
> 
> In reality, when running=true, it must be RUNNING so far.
> 
> In that case does it mean we should notify right after the switchover,
> since after migration the vm is indeed running only if the vcpus are not
> during suspend?

I cannot parse your question, but maybe this answers it.
If the outgoing VM is running and not suspended, then the incoming side
tests for autostart==true and calls vm_start, which calls the notifiers,
right after the switchover.

> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
> device; this kind of prove to me that SUSPEND is actually one of
> running=true states.
> 
> If we postpone all notifiers here even after we switched over to dest qemu
> to the next upcoming suspend wakeup, I think it means these devices will
> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
> VFIO_DEVICE_STATE_STOP.

or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
AFAIK it is OK to remain in that state until wakeup is called later.

> Ideally I think we should here call vm_state_notify() with running=true and
> state=SUSPEND, but since I do see some hooks are not well prepared for
> SUSPEND over running=true, I'd think we should on the safe side call
> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
> over phase.  With that IIUC it'll naturally work (e.g. when wakeup again
> later we just need to call no notifiers).

Notifiers are just one piece; all the code in vm_prepare_start must be called.
Is it correct to call all of that long before we actually resume the CPUs in
wakeup?  I don't know, but what is the point?  The wakeup code still needs
modification to conditionally resume the vcpus.  The scheme would be roughly:

loadvm_postcopy_handle_run_bh()
    runstate = global_state_get_runstate();
    if (runstate == RUN_STATE_RUNNING) {
        vm_start()
    } else if (runstate == RUN_STATE_SUSPENDED) {
        vm_prepare_start();   // the start of vm_start()
    }

qemu_system_wakeup_request()
    if (some condition)
        resume_all_vcpus();   // the remainder of vm_start()
    else
        runstate_set(RUN_STATE_RUNNING)

How is that better than my patches
[PATCH V3 01/10] vl: start on wakeup request
[PATCH V3 02/10] migration: preserve suspended runstate

loadvm_postcopy_handle_run_bh()
    runstate = global_state_get_runstate();
    if (runstate == RUN_STATE_RUNNING)
        vm_start()
    else
        runstate_set(runstate);    // eg RUN_STATE_SUSPENDED

qemu_system_wakeup_request()
    if (!vm_started)
        vm_start();
    else
        runstate_set(RUN_STATE_RUNNING);

Recall this thread started with your comment "It then can avoid touching the 
system wakeup code which seems cleaner".  We still need to touch the wakeup
code.
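
As a rough, self-contained model of the second scheme (a sketch only: the Vm
struct, the RS_* values and the model_* functions are made up for illustration
and are not QEMU functions; the real vm_start() does far more than set a flag):

```c
#include <stdbool.h>

/* Simplified stand-ins for QEMU's RunState values. */
typedef enum { RS_INMIGRATE, RS_RUNNING, RS_SUSPENDED } RunState;

typedef struct {
    RunState state;
    bool vm_started;    /* has the full start sequence ever run? */
    int start_calls;    /* how many times the start sequence ran */
} Vm;

/* Models vm_start(): notifiers, vm_prepare_start, resume vcpus. */
static void model_vm_start(Vm *vm)
{
    vm->vm_started = true;
    vm->start_calls++;
    vm->state = RS_RUNNING;
}

/* Models the incoming side once migration has completed. */
static void model_incoming_done(Vm *vm, RunState saved)
{
    if (saved == RS_RUNNING) {
        model_vm_start(vm);
    } else {
        vm->state = saved;          /* e.g. RS_SUSPENDED, nothing started */
    }
}

/* Models the wakeup request path. */
static void model_wakeup_request(Vm *vm)
{
    if (!vm->vm_started) {
        model_vm_start(vm);         /* first wakeup after an incoming suspend */
    } else {
        vm->state = RS_RUNNING;     /* ordinary wakeup */
    }
}
```

In this model a VM that arrives suspended runs the start sequence exactly once,
at the first wakeup, and later suspend/wakeup cycles only change the runstate.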

- Steve



Re: [RFC PATCH 15/24] target/arm: Fill new members of GDBFeature

2023-08-16 Thread Alex Bennée


Akihiko Odaki  writes:

> On 2023/08/14 23:56, Alex Bennée wrote:
>> Akihiko Odaki  writes:
>> 
>>> These members will be used to help plugins to identify registers.
>>>
>>> Signed-off-by: Akihiko Odaki 
>>> ---
>>>   target/arm/gdbstub.c   | 46 +++---
>>>   target/arm/gdbstub64.c | 42 +-
>>>   2 files changed, 58 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/target/arm/gdbstub.c b/target/arm/gdbstub.c
>>> index 100a6eed15..56d24028f6 100644
>>> --- a/target/arm/gdbstub.c
>>> +++ b/target/arm/gdbstub.c
>>> @@ -270,6 +270,7 @@ static void arm_gen_one_feature_sysreg(GString *s,
>>>   g_string_append_printf(s, " regnum=\"%d\"", regnum);
>>>   g_string_append_printf(s, " group=\"cp_regs\"/>");
>>>   dyn_feature->data.cpregs.keys[dyn_feature->desc.num_regs] = ri_key;
>>> +((const char **)dyn_feature->desc.regs)[dyn_feature->desc.num_regs] = 
>>> ri->name;
>>>   dyn_feature->desc.num_regs++;
>>>   }
>>>   @@ -316,6 +317,8 @@ static GDBFeature
>>> *arm_gen_dynamic_sysreg_feature(CPUState *cs, int base_reg)
>>>   DynamicGDBFeatureInfo *dyn_feature = &cpu->dyn_sysreg_feature;
>>>   gsize num_regs = g_hash_table_size(cpu->cp_regs);
>>>   +dyn_feature->desc.name = "org.qemu.gdb.arm.sys.regs";
>>> +dyn_feature->desc.regs = g_new(const char *, num_regs);
>> AIUI this means we now have an array of register names which mirrors
>> the
>> names embedded in the XML. This smells like a few steps away from just
>> abstracting the whole XML away from the targets and generating them
>> inside gdbstub when we need them. As per my stalled attempt I referenced
>> earlier.
>
> The abstraction is strictly limited for identifiers. Most plugin
> should already have some knowledge of how registers are used. For
> example, a plugin that tracks stack frame for RISC-V should know sp is
> the stack pointer register. Similarly, a cycle simulator plugin should
> know how registers are used in a program. Only identifiers matter in
> such cases.
>
> I'm definitely *not* in favor of abstracting the whole XML for
> plugins. It will be too hard to maintain ABI compatibility when a new
> attribute emerges, for example.

No I agree the XML shouldn't go near the plugins. I was just looking to
avoid having an XML builder for every target.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v5 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-16 Thread Akihiko Odaki

On 2023/08/16 0:50, Gurchetan Singh wrote:



On Tue, Aug 15, 2023 at 8:07 AM Akihiko Odaki wrote:


On 2023/08/15 9:35, Gurchetan Singh wrote:
 > This adds initial support for gfxstream and cross-domain.  Both
 > features rely on virtio-gpu blob resources and context types, which
 > are also implemented in this patch.
 >
 > gfxstream has a long and illustrious history in Android graphics
 > paravirtualization.  It has been powering graphics in the Android
 > Studio Emulator for more than a decade, which is the main developer
 > platform.
 >
 > Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
 > The key design characteristic was a 1:1 threading model and
 > auto-generation, which fit nicely with the OpenGLES spec.  It also
 > allowed easy layering with ANGLE on the host, which provides the GLES
 > implementations on Windows or MacOS environments.
 >
 > gfxstream has traditionally been maintained by a single engineer, and
 > between 2015 to 2021, the goldfish throne passed to Frank Yang.
 > Historians often remark this glorious reign ("pax gfxstreama" is the
 > academic term) was comparable to that of Augustus and both Queen
 > Elizabeths.  Just to name a few accomplishments in a resplendent
 > panoply: higher versions of GLES, address space graphics, snapshot
 > support and CTS compliant Vulkan [b].
 >
 > One major drawback was the use of out-of-tree goldfish drivers.
 > Android engineers didn't know much about DRM/KMS and especially TTM so
 > a simple guest to host pipe was conceived.
 >
 > Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
 > the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
 > port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
 > It was a symbol compatible replacement of virglrenderer [c] and named
 > "AVDVirglrenderer".  This implementation forms the basis of the
 > current gfxstream host implementation still in use today.
 >
 > cross-domain support follows a similar arc.  Originally conceived by
 > Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
 > 2018, it initially relied on the downstream "virtio-wl" device.
 >
 > In 2020 and 2021, virtio-gpu was extended to include blob resources
 > and multiple timelines by yours truly, features gfxstream/cross-domain
 > both require to function correctly.
 >
 > Right now, we stand at the precipice of a truly fantastic possibility:
 > the Android Emulator powered by upstream QEMU and upstream Linux
 > kernel.  gfxstream will then be packaged properly, and app
 > developers can even fix gfxstream bugs on their own if they encounter
 > them.
 >
 > It's been quite the ride, my friends.  Where will gfxstream head next,
 > nobody really knows.  I wouldn't be surprised if it's around for
 > another decade, maintained by a new generation of Android graphics
 > enthusiasts.
 >
 > Technical details:
 >    - Very simple initial display integration: just used Pixman
 >    - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
 >      calls
 >
 > Next steps for Android VMs:
 >    - The next step would be improving display integration and UI interfaces
 >      with the goal of the QEMU upstream graphics being in an emulator
 >      release [d].
 >
 > Next steps for Linux VMs for display virtualization:
 >    - For widespread distribution, someone needs to package Sommelier or the
 >      wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
 >      versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
 >      which allows disabling KMS hypercalls.  If anyone cares enough, it'll
 >      probably be possible to build a custom VM variant that uses this display
 >      virtualization strategy.
 >
 > [a] https://android-review.googlesource.com/c/platform/development/+/34470
 > [b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
 > [c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
 > [d] https://developer.android.com/studio/releases/emulator
 > [e] https://github.com/talex5/wayland-proxy-virtwl
 >
 > Signed-off-by: Gurchetan Singh <gurchetansi...@chromium.org>
 > Tested-by: Alyssa Ross mailto:h.

How to synchronize CPUs on MMIO read?

2023-08-16 Thread Igor Lesik
Hi.
I need to model some custom HW that synchronizes CPUs when they read MMIO
register N: the MMIO read does not return until another CPU writes to MMIO
register M. I modeled this behavior as follows: a) on an MMIO read of N, save
the CPU into a list of waiting CPUs and put it to sleep with
cpu_interrupt(current_cpu, CPU_INTERRUPT_HALT); b) on an MMIO write to M, wake
all waiting CPUs with cpu->halted = 0; qemu_cpu_kick(cpu). It seems to work
fine. However, this HW has a twist: the MMIO read of N returns the value that
was written by the MMIO write to M. Can anyone please advise how this could be
done?

Thanks!
Igor


Re: [PATCH v4 06/25] hw/core/cpu: Replace gdb_core_xml_file with gdb_core_feature

2023-08-16 Thread Akihiko Odaki

On 2023/08/17 0:58, Richard Henderson wrote:

On 8/16/23 07:51, Akihiko Odaki wrote:

diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index f155936289..b54162cbeb 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -391,7 +391,7 @@ static void hexagon_cpu_class_init(ObjectClass *c, 
void *data)

  cc->gdb_write_register = hexagon_gdb_write_register;
  cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
  cc->gdb_stop_before_watchpoint = true;
-    cc->gdb_core_xml_file = "hexagon-core.xml";
+    cc->gdb_core_feature = gdb_find_static_feature("hexagon-core.xml");


Missing the change to init cc->gdb_num_core_regs.
(Which presumably itself will go away at some point.)


It is initialized earlier with:
cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;

I had no motivation to change this since it has a macro definition for 
the number of registers used elsewhere.





diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 02b7aad9b0..eb56226865 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -7381,9 +7381,9 @@ static void ppc_cpu_class_init(ObjectClass *oc, 
void *data)

  cc->gdb_arch_name = ppc_gdb_arch_name;
  #if defined(TARGET_PPC64)
-    cc->gdb_core_xml_file = "power64-core.xml";
+    cc->gdb_core_feature = gdb_find_static_feature("power64-core.xml");
  #else
-    cc->gdb_core_xml_file = "power-core.xml";
+    cc->gdb_core_feature = gdb_find_static_feature("power-core.xml");
  #endif
  cc->disas_set_info = ppc_disas_set_info;


Likewise.


It is initialized earlier too but with values different from what the 
XMLs say for compatibility with old GDB.






r~




[PATCH v4 21/25] cpu: Call plugin hooks only when ready

2023-08-16 Thread Akihiko Odaki
The initialization and exit hooks will not affect the state of vCPU,
but they may depend on the state of vCPU. Therefore, it's better to
call plugin hooks after the vCPU state is fully initialized and before
it gets uninitialized.

Signed-off-by: Akihiko Odaki 
---
 cpu.c| 11 ---
 hw/core/cpu-common.c | 10 ++
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/cpu.c b/cpu.c
index 1c948d1161..2552c85249 100644
--- a/cpu.c
+++ b/cpu.c
@@ -42,7 +42,6 @@
 #include "hw/core/accel-cpu.h"
 #include "trace/trace-root.h"
 #include "qemu/accel.h"
-#include "qemu/plugin.h"
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
@@ -148,11 +147,6 @@ void cpu_exec_realizefn(CPUState *cpu, Error **errp)
 /* Wait until cpu initialization complete before exposing cpu. */
 cpu_list_add(cpu);
 
-/* Plugin initialization must wait until cpu_index assigned. */
-if (tcg_enabled()) {
-qemu_plugin_vcpu_init_hook(cpu);
-}
-
 #ifdef CONFIG_USER_ONLY
 assert(qdev_get_vmsd(DEVICE(cpu)) == NULL ||
qdev_get_vmsd(DEVICE(cpu))->unmigratable);
@@ -179,11 +173,6 @@ void cpu_exec_unrealizefn(CPUState *cpu)
 }
 #endif
 
-/* Call the plugin hook before clearing cpu->cpu_index in cpu_list_remove 
*/
-if (tcg_enabled()) {
-qemu_plugin_vcpu_exit_hook(cpu);
-}
-
 cpu_list_remove(cpu);
 /*
  * Now that the vCPU has been removed from the RCU list, we can call
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 549f52f46f..e06a70007a 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -211,6 +211,11 @@ static void cpu_common_realizefn(DeviceState *dev, Error 
**errp)
 cpu_resume(cpu);
 }
 
+/* Plugin initialization must wait until the cpu is fully realized. */
+if (tcg_enabled()) {
+qemu_plugin_vcpu_init_hook(cpu);
+}
+
 /* NOTE: latest generic point where the cpu is fully realized */
 }
 
@@ -218,6 +223,11 @@ static void cpu_common_unrealizefn(DeviceState *dev)
 {
 CPUState *cpu = CPU(dev);
 
+/* Call the plugin hook before the cpu is fully unrealized */
+if (tcg_enabled()) {
+qemu_plugin_vcpu_exit_hook(cpu);
+}
+
 /* NOTE: latest generic point before the cpu is fully unrealized */
 cpu_exec_unrealizefn(cpu);
 }
-- 
2.41.0




[PATCH] target/riscv: Allocate itrigger timers only once

2023-08-16 Thread Akihiko Odaki
riscv_trigger_init() was called on reset events, which can happen
several times for a CPU, and it allocated timers for itrigger each
time. If old timers were present, they were simply overwritten by the
new timers, resulting in a memory leak.

Divide riscv_trigger_init() into two functions, namely
riscv_trigger_realize() and riscv_trigger_reset(), and call each at
the appropriate time. The timer allocation now happens only once per
CPU, in riscv_trigger_realize().

Fixes: 5a4ae64cac ("target/riscv: Add itrigger support when icount is enabled")
Signed-off-by: Akihiko Odaki 
---
 target/riscv/debug.h |  3 ++-
 target/riscv/cpu.c   |  8 +++-
 target/riscv/debug.c | 15 ---
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/target/riscv/debug.h b/target/riscv/debug.h
index c471748d5a..7edc31e7cc 100644
--- a/target/riscv/debug.h
+++ b/target/riscv/debug.h
@@ -143,7 +143,8 @@ void riscv_cpu_debug_excp_handler(CPUState *cs);
 bool riscv_cpu_debug_check_breakpoint(CPUState *cs);
 bool riscv_cpu_debug_check_watchpoint(CPUState *cs, CPUWatchpoint *wp);
 
-void riscv_trigger_init(CPURISCVState *env);
+void riscv_trigger_realize(CPURISCVState *env);
+void riscv_trigger_reset(CPURISCVState *env);
 
 bool riscv_itrigger_enabled(CPURISCVState *env);
 void riscv_itrigger_update_priv(CPURISCVState *env);
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e12b6ef7f6..3bc3f96a58 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -904,7 +904,7 @@ static void riscv_cpu_reset_hold(Object *obj)
 
 #ifndef CONFIG_USER_ONLY
 if (cpu->cfg.debug) {
-riscv_trigger_init(env);
+riscv_trigger_reset(env);
 }
 
 if (kvm_enabled()) {
@@ -1475,6 +1475,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 
 riscv_cpu_register_gdb_regs_for_features(cs);
 
+#ifndef CONFIG_USER_ONLY
+if (cpu->cfg.debug) {
+riscv_trigger_realize(&cpu->env);
+}
+#endif
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/riscv/debug.c b/target/riscv/debug.c
index 75ee1c4971..1c44403205 100644
--- a/target/riscv/debug.c
+++ b/target/riscv/debug.c
@@ -903,7 +903,17 @@ bool riscv_cpu_debug_check_watchpoint(CPUState *cs, 
CPUWatchpoint *wp)
 return false;
 }
 
-void riscv_trigger_init(CPURISCVState *env)
+void riscv_trigger_realize(CPURISCVState *env)
+{
+int i;
+
+for (i = 0; i < RV_MAX_TRIGGERS; i++) {
+env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  riscv_itrigger_timer_cb, env);
+}
+}
+
+void riscv_trigger_reset(CPURISCVState *env)
 {
 target_ulong tdata1 = build_tdata1(env, TRIGGER_TYPE_AD_MATCH, 0, 0);
 int i;
@@ -928,7 +938,6 @@ void riscv_trigger_init(CPURISCVState *env)
 env->tdata3[i] = 0;
 env->cpu_breakpoint[i] = NULL;
 env->cpu_watchpoint[i] = NULL;
-env->itrigger_timer[i] = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-  riscv_itrigger_timer_cb, env);
+timer_del(env->itrigger_timer[i]);
 }
 }
-- 
2.41.0
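
The realize/reset split in this patch can be modeled in isolation. The sketch
below is not QEMU code: Timer, Env, live_timers and the model_* helpers are
hypothetical stand-ins chosen for illustration. It assumes only what the patch
relies on, namely that allocating a timer creates an object that must be freed
separately, while cancelling a timer (as timer_del() does) leaves the object
allocated.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_TRIGGERS 2              /* stand-in for RV_MAX_TRIGGERS */

typedef struct Timer { int armed; } Timer;
typedef struct Env { Timer *itrigger_timer[MAX_TRIGGERS]; } Env;

static int live_timers;             /* timers allocated and never freed */

/* Models timer allocation: every call creates a new object. */
static Timer *model_timer_new(void)
{
    Timer *t = calloc(1, sizeof(*t));
    assert(t != NULL);
    live_timers++;
    return t;
}

/* Models cancelling a timer: it is disarmed but stays allocated. */
static void model_timer_del(Timer *t)
{
    t->armed = 0;
}

/* Runs once per CPU: the only place where timers are allocated. */
static void model_trigger_realize(Env *env)
{
    for (int i = 0; i < MAX_TRIGGERS; i++) {
        env->itrigger_timer[i] = model_timer_new();
    }
}

/* Runs on every reset: reuses the existing timers instead of reallocating. */
static void model_trigger_reset(Env *env)
{
    for (int i = 0; i < MAX_TRIGGERS; i++) {
        model_timer_del(env->itrigger_timer[i]);
    }
}
```

With this split, calling model_trigger_reset() any number of times leaves
live_timers at MAX_TRIGGERS. Allocating inside the reset path instead would
grow the count by MAX_TRIGGERS on every reset, which is the leak described in
the commit message.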




Re: [PATCH v4 10/25] target/riscv: Use GDBFeature for dynamic XML

2023-08-16 Thread Richard Henderson

On 8/16/23 07:51, Akihiko Odaki wrote:

-if (csr_ops[i].name) {
-g_string_append_printf(s, "

You are now leaking name.


r~



Re: [PATCH v4 09/25] target/ppc: Use GDBFeature for dynamic XML

2023-08-16 Thread Richard Henderson

On 8/16/23 07:51, Akihiko Odaki wrote:

In preparation for a change to use GDBFeature as a parameter of
gdb_register_coprocessor(), convert the internal representation of
dynamic feature from plain XML to GDBFeature.

Signed-off-by: Akihiko Odaki
---
  target/ppc/cpu-qom.h  |  3 +--
  target/ppc/cpu.h  |  2 +-
  target/ppc/cpu_init.c |  2 +-
  target/ppc/gdbstub.c  | 45 ++-
  4 files changed, 17 insertions(+), 35 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 08/25] target/arm: Use GDBFeature for dynamic XML

2023-08-16 Thread Richard Henderson

On 8/16/23 07:51, Akihiko Odaki wrote:

In preparation for a change to use GDBFeature as a parameter of
gdb_register_coprocessor(), convert the internal representation of
dynamic feature from plain XML to GDBFeature.

Signed-off-by: Akihiko Odaki
---
  target/arm/cpu.h   |  20 +++---
  target/arm/internals.h |   2 +-
  target/arm/gdbstub.c   | 134 ++---
  target/arm/gdbstub64.c |  90 ---
  4 files changed, 108 insertions(+), 138 deletions(-)


This is quite large, and I think you could have converted the different
subsystems one at a time (especially since you renamed the structure, and so
both could exist side-by-side). But I won't insist.


Acked-by: Richard Henderson 


r~



Re: [PATCH] subprojects/berkeley-testfloat-3: Update to fix a problem with compiler warnings

2023-08-16 Thread Alex Bennée


Peter Maydell  writes:

> On Wed, 16 Aug 2023 at 10:16, Thomas Huth  wrote:
>>
>> Update the berkeley-testfloat-3 wrap to include a patch provided by
>> Olaf Hering. This fixes a problem with "control reaches end of non-void
>> function [-Werror=return-type]" compiler warning/errors that are now
>> enabled by default in certain versions of GCC.
>>
>> Reported-by: Olaf Hering 
>> Signed-off-by: Thomas Huth 
>> ---
>>  subprojects/berkeley-testfloat-3.wrap | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> This seems like a reasonable place to ask: should we just pull
> in the testfloat and softfloat repos to be part of the main
> qemu repo?

We definitely forked the softfloat inside QEMU with the refactor some
time ago. For the testing repos we have lightly modified them to build
the test code but only by a few patches. We might want to keep the
ability to re-base on a new release if, say, testfloat gains fp16 or
bfloat16 support.

> AIUI we've definitively forked both of these, so
> we don't care about trying to make it easy to resync with
> upstream. Having them in separate git repos seems to have some
> clear disadvantages:
>  * it's harder to update them
>  * changes to them can end up skipping the usual code
>review process, because it's a different patch flow
>to the normal one
>  * we get extra meson subproject infrastructure to deal with
>
> Are there any reasons to keep them separate ?
>
> thanks
> -- PMM


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


