Re: [PATCH v2] hw/net/vmxnet3: allow VMXNET3_MAX_MTU itself as a value

2022-12-15 Thread Jason Wang
On Wed, Dec 14, 2022 at 5:43 PM Fiona Ebner  wrote:
>
> On 25.08.22 at 11:29, Fiona Ebner wrote:
> > Currently, VMXNET3_MAX_MTU itself (being 9000) is not considered a
> > valid value for the MTU, but a guest running ESXi 7.0 might try to
> > set it and fail the assert [0].
> >
> > In the Linux kernel, dev->max_mtu itself is a valid value for the MTU
> > and for the vmxnet3 driver it's 9000, so a guest running Linux will
> > also fail the assert when trying to set an MTU of 9000.
> >
> > VMXNET3_MAX_MTU and s->mtu don't seem to be used in relation to buffer
> > allocations/accesses, so allowing the upper limit itself as a value
> > should be fine.
> >
> > [0]: https://forum.proxmox.com/threads/114011/
> >
> > Fixes: d05dcd94ae ("net: vmxnet3: validate configuration values during 
> > activate (CVE-2021-20203)")
> > Signed-off-by: Fiona Ebner 
> > ---
> >
> > Feel free to adapt the commit message as you see fit.
> >
> > v1 -> v2:
> > * Add commit message with some rationale.
> >
> >  hw/net/vmxnet3.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> > index 0b7acf7f89..a2037583bf 100644
> > --- a/hw/net/vmxnet3.c
> > +++ b/hw/net/vmxnet3.c
> > @@ -1441,7 +1441,7 @@ static void vmxnet3_activate_device(VMXNET3State *s)
> >  vmxnet3_setup_rx_filtering(s);
> >  /* Cache fields from shared memory */
> >  s->mtu = VMXNET3_READ_DRV_SHARED32(d, s->drv_shmem, devRead.misc.mtu);
> > -assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu < VMXNET3_MAX_MTU);
> > +assert(VMXNET3_MIN_MTU <= s->mtu && s->mtu <= VMXNET3_MAX_MTU);
> >  VMW_CFPRN("MTU is %u", s->mtu);
> >
> >  s->max_rx_frags =
>
> Ping

I've queued this patch.

Thanks

>




Re: [PATCH v2 0/3] Fix the "-nic help" option

2022-12-15 Thread Jason Wang
On Thu, Dec 15, 2022 at 11:23 PM Thomas Huth  wrote:
>
> On 10/11/2022 13.52, Thomas Huth wrote:
> > Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier
> > versions, but since QEMU 6.0 it just complains that "help" is not
> > a valid value here. This patch series fixes this problem and also
> > extends the help output here to list the available NIC models, too.
> >
> > v2:
> >   - Add function comment in the first patch
> >   - Add Reviewed-by in the third patch
> >
> > Thomas Huth (3):
> >net: Move the code to collect available NIC models to a separate
> >  function
> >net: Restore printing of the help text with "-nic help"
> >net: Replace "Supported NIC models" with "Available NIC models"
> >
> >   include/net/net.h | 14 +
> >   hw/pci/pci.c  | 29 +--
> >   net/net.c | 50 ---
> >   3 files changed, 62 insertions(+), 31 deletions(-)
> >
>
> Friendly ping!

I've queued this series.

Thanks

>
>   Thomas
>




Re: [PATCH v9 12/12] vdpa: always start CVQ in SVQ mode if possible

2022-12-15 Thread Jason Wang
On Thu, Dec 15, 2022 at 7:32 PM Eugenio Pérez  wrote:
>
> Isolate the control virtqueue in its own group, allowing control commands to
> be intercepted while letting the dataplane run fully passthrough to the guest.
>
> Signed-off-by: Eugenio Pérez 

Acked-by: Jason Wang 

Thanks

> ---
> v9:
> * Reuse iova_range fetched from the device at initialization, instead of
>   fetch it again at vhost_vdpa_net_cvq_start.
> * Add comment about how migration is blocked in case ASID does not meet
>   our expectations.
> * Delete warning about CVQ group not being independent.
>
> v8:
> * Do not allocate iova_tree on net_init_vhost_vdpa if only CVQ is
>   shadowed. Move the iova_tree handling in this case to
>   vhost_vdpa_net_cvq_start and vhost_vdpa_net_cvq_stop.
>
> v7:
> * Never ask for number of address spaces, just react if isolation is not
>   possible.
> * Return ASID ioctl errors instead of masking them as if the device has
>   no asid.
> * Simplify net_init_vhost_vdpa logic
> * Add "if possible" suffix
>
> v6:
> * Disable control SVQ if the device does not support it because of
> features.
>
> v5:
> * Fix not adding CVQ buffers when x-svq=on is specified.
> * Move vring state in vhost_vdpa_get_vring_group instead of using a
>   parameter.
> * Rename VHOST_VDPA_NET_CVQ_PASSTHROUGH to VHOST_VDPA_NET_DATA_ASID
>
> v4:
> * Squash vhost_vdpa_cvq_group_is_independent.
> * Rebased on last CVQ start series, that allocated CVQ cmd bufs at load
> * Do not check for cvq index on vhost_vdpa_net_prepare, we only have one
>   that callback registered in that NetClientInfo.
>
> v3:
> * Make ASID-related queries print a warning instead of returning an
>   error and stopping the start of QEMU.
> ---
>  hw/virtio/vhost-vdpa.c |   3 +-
>  net/vhost-vdpa.c   | 110 -
>  2 files changed, 111 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 48d8c60e76..8cd00f5a96 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -638,7 +638,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev 
> *dev)
>  {
>  uint64_t features;
>  uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
> -0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
> +0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
> +0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
>  int r;
>
>  if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 710c5efe96..d36664f33a 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -102,6 +102,8 @@ static const uint64_t vdpa_svq_device_features =
>  BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
>  BIT_ULL(VIRTIO_NET_F_STANDBY);
>
> +#define VHOST_VDPA_NET_CVQ_ASID 1
> +
>  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
>  {
>  VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> @@ -243,6 +245,40 @@ static NetClientInfo net_vhost_vdpa_info = {
>  .check_peer_type = vhost_vdpa_check_peer_type,
>  };
>
> +static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> +{
> +struct vhost_vring_state state = {
> +.index = vq_index,
> +};
> +int r = ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state);
> +
> +if (unlikely(r < 0)) {
> +error_report("Cannot get VQ %u group: %s", vq_index,
> + g_strerror(errno));
> +return r;
> +}
> +
> +return state.num;
> +}
> +
> +static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
> +   unsigned vq_group,
> +   unsigned asid_num)
> +{
> +struct vhost_vring_state asid = {
> +.index = vq_group,
> +.num = asid_num,
> +};
> +int r;
> +
> +r = ioctl(v->device_fd, VHOST_VDPA_SET_GROUP_ASID, &asid);
> +if (unlikely(r < 0)) {
> +error_report("Can't set vq group %u asid %u, errno=%d (%s)",
> + asid.index, asid.num, errno, g_strerror(errno));
> +}
> +return r;
> +}
> +
>  static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
>  {
>  VhostIOVATree *tree = v->iova_tree;
> @@ -317,11 +353,75 @@ dma_map_err:
>  static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>  {
>  VhostVDPAState *s;
> -int r;
> +struct vhost_vdpa *v;
> +uint64_t backend_features;
> +int64_t cvq_group;
> +int cvq_index, r;
>
>  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>
>  s = DO_UPCAST(VhostVDPAState, nc, nc);
> +v = &s->vhost_vdpa;
> +
> +v->shadow_data = s->always_svq;
> +v->shadow_vqs_enabled = s->always_svq;
> +s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +
> +if (s->always_svq) {
> +/* SVQ is already configured for all virtqueues */
> +goto out;
> +}
> +
> +/*
> + * If we early return in these cases SVQ will not be enabled. The 
> migration
> + * will

Re: [PATCH v9 06/12] vdpa: request iova_range only once

2022-12-15 Thread Jason Wang
On Thu, Dec 15, 2022 at 7:32 PM Eugenio Pérez  wrote:
>
> Currently the iova range is requested once per queue pair in the case of
> net. Reduce the number of ioctls by asking for it once at initialization and
> reusing that value for each vhost_vdpa.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  hw/virtio/vhost-vdpa.c | 15 ---
>  net/vhost-vdpa.c   | 27 ++-
>  2 files changed, 14 insertions(+), 28 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 691bcc811a..9b7f4ef083 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -365,19 +365,6 @@ static int vhost_vdpa_add_status(struct vhost_dev *dev, 
> uint8_t status)
>  return 0;
>  }
>
> -static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> -{
> -int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
> -  &v->iova_range);
> -if (ret != 0) {
> -v->iova_range.first = 0;
> -v->iova_range.last = UINT64_MAX;
> -}
> -
> -trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> -v->iova_range.last);
> -}
> -
>  /*
>   * The use of this function is for requests that only need to be
>   * applied once. Typically such request occurs at the beginning
> @@ -465,8 +452,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void 
> *opaque, Error **errp)
>  goto err;
>  }
>
> -vhost_vdpa_get_iova_range(v);
> -
>  if (!vhost_vdpa_first_dev(dev)) {
>  return 0;
>  }
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 2c0ff6d7b0..b6462f0192 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -541,14 +541,15 @@ static const VhostShadowVirtqueueOps 
> vhost_vdpa_net_svq_ops = {
>  };
>
>  static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -   const char *device,
> -   const char *name,
> -   int vdpa_device_fd,
> -   int queue_pair_index,
> -   int nvqs,
> -   bool is_datapath,
> -   bool svq,
> -   VhostIOVATree *iova_tree)
> +   const char *device,
> +   const char *name,
> +   int vdpa_device_fd,
> +   int queue_pair_index,
> +   int nvqs,
> +   bool is_datapath,
> +   bool svq,
> +   struct vhost_vdpa_iova_range 
> iova_range,
> +   VhostIOVATree *iova_tree)

Nit: it's better not to mix in style changes.

Other than this:

Acked-by: Jason Wang 

Thanks

>  {
>  NetClientState *nc = NULL;
>  VhostVDPAState *s;
> @@ -567,6 +568,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
> *peer,
>  s->vhost_vdpa.device_fd = vdpa_device_fd;
>  s->vhost_vdpa.index = queue_pair_index;
>  s->vhost_vdpa.shadow_vqs_enabled = svq;
> +s->vhost_vdpa.iova_range = iova_range;
>  s->vhost_vdpa.iova_tree = iova_tree;
>  if (!is_datapath) {
>  s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
> @@ -646,6 +648,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
> *name,
>  int vdpa_device_fd;
>  g_autofree NetClientState **ncs = NULL;
>  g_autoptr(VhostIOVATree) iova_tree = NULL;
> +struct vhost_vdpa_iova_range iova_range;
>  NetClientState *nc;
>  int queue_pairs, r, i = 0, has_cvq = 0;
>
> @@ -689,14 +692,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> char *name,
>  return queue_pairs;
>  }
>
> +vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
>  if (opts->x_svq) {
> -struct vhost_vdpa_iova_range iova_range;
> -
>  if (!vhost_vdpa_net_valid_svq_features(features, errp)) {
>  goto err_svq;
>  }
>
> -vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
>  iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
>  }
>
> @@ -705,7 +706,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
> *name,
>  for (i = 0; i < queue_pairs; i++) {
>  ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>   vdpa_device_fd, i, 2, true, opts->x_svq,
> - iova_tree);
> + iova_range, iova_tree);
>  if (!ncs[i])
>  goto err;
>  }
> @@ -713,7 +714,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
> *name,
>  if (has_cvq) {
>  nc = net_vhost

[PATCH] virtio-mem: Fix the bitmap index of the section offset

2022-12-15 Thread Chenyi Qiang
vmem->bitmap indexes the memory region of the virtio-mem backend at a
granularity of block_size. To calculate the bit index for the target section
offset, the offset should therefore be divided by block_size, not by bitmap_size.

Fixes: 2044969f0b ("virtio-mem: Implement RamDiscardManager interface")
Signed-off-by: Chenyi Qiang 
---
 hw/virtio/virtio-mem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
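
A quick worked example of the index arithmetic (numbers purely illustrative):

/*
 * block_size = 2 MiB, offset_within_region = 6 MiB:
 *   correct: first_bit = 6 MiB / block_size  = 3
 *   broken:  first_bit = 6 MiB / bitmap_size, e.g. 6291456 / 4096 = 1536
 *            for a 4096-bit bitmap, which points at an unrelated block.
 */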

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ed170def48..e19ee817fe 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -235,7 +235,7 @@ static int virtio_mem_for_each_plugged_section(const 
VirtIOMEM *vmem,
 uint64_t offset, size;
 int ret = 0;
 
-first_bit = s->offset_within_region / vmem->bitmap_size;
+first_bit = s->offset_within_region / vmem->block_size;
 first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
 while (first_bit < vmem->bitmap_size) {
 MemoryRegionSection tmp = *s;
@@ -267,7 +267,7 @@ static int virtio_mem_for_each_unplugged_section(const 
VirtIOMEM *vmem,
 uint64_t offset, size;
 int ret = 0;
 
-first_bit = s->offset_within_region / vmem->bitmap_size;
+first_bit = s->offset_within_region / vmem->block_size;
 first_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, first_bit);
 while (first_bit < vmem->bitmap_size) {
 MemoryRegionSection tmp = *s;
-- 
2.17.1




Re: qemu no sound

2022-12-15 Thread Helge Konetzka

Hello Andreas,

On 15.12.22 at 08:36, andschl...@freenet.de wrote:


Hello dear Qemu community,

I installed QEMU under Windows 11 Home and downloaded the file
kali-linux-2022.4-qemu-amd64.qcow2, then started it with the following
command, unfortunately without sound. What kind of command do I have to
add for sound support?


  C:\Users\andsc\Desktop\qemu\qemu-system-x86_64.exe -accel whpx -smp 4 -hda net nic,model=virtio -net user --vga qxl -boot strict=on -usbdevice tablet

Thank you for your support.




I'm focusing on your sound problem, so I did not test with the Kali image.
This command works for me on Windows 10 22H2 using an
Msys2/Mingw64 Bash shell with QEMU 7.1.94:


qemu-system-x86_64 \
 -M q35 \
 -accel whpx,kernel-irqchip=off \
 -m 1536 \
 -audiodev id=audio0,driver=dsound \
 -device ich9-intel-hda -device hda-duplex,audiodev=audio0 \
 -cdrom openSUSE-Leap-15.3-GNOME-Live-x86_64-Media.iso

"-audiodev id=audio0,driver=dsound" defines, how the host provides the sound
"-device ich9-intel-hda -device hda-duplex,audiodev=audio0" refers to 
the sound provider and creates sound devices for the guest.
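
Applied to your own command line, that would mean appending just those two
parts, roughly like this (untested with the Kali image; I assume the qcow2
file is meant as the -hda argument and "-net nic,model=virtio" as the NIC
option, since the -hda part of your mail looks incomplete; "^" continues the
line in cmd.exe):

  C:\Users\andsc\Desktop\qemu\qemu-system-x86_64.exe -accel whpx -smp 4 ^
    -hda kali-linux-2022.4-qemu-amd64.qcow2 -net nic,model=virtio -net user ^
    -vga qxl -boot strict=on -usbdevice tablet ^
    -audiodev id=audio0,driver=dsound ^
    -device ich9-intel-hda -device hda-duplex,audiodev=audio0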


Maybe the whole command can be a starting point, too.

I did not use "-vga qxl" here, because it implies SPICE usage, which adds
more complexity: the SPICE client needs to communicate with QEMU.

"-M q35" selects a more modern machine type than you get by leaving it out.

Regards,
Helge.



Re: [PATCH v1 22/24] vfio-user: add 'x-msg-timeout' option that specifies msg wait times

2022-12-15 Thread John Johnson


> On Dec 15, 2022, at 4:56 AM, Cédric Le Goater  wrote:
> 
> On 11/9/22 00:13, John Johnson wrote:
>> 
>> +DEFINE_PROP_UINT32("x-msg-timeout", VFIOUserPCIDevice, wait_time, 0),
> 
> I see that patch 9 introduced :
> 
> +static int wait_time = 5000;   /* wait up to 5 sec for busy servers */
> 
> May be use a define instead and assign  "x-msg-timeout" to this default
> value.
> 
> how do you plan to use the "x-msg-timeout" property ?
> 

It was originally used to empirically discover a value that wouldn't
time out with the device servers we're using.  I kept it in case new devices
with longer response times are encountered.

JJ
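
For illustration, the named default Cédric suggests could look roughly like
this (the macro name is made up here, not taken from the series):

/* Patch 9: name the 5 second default instead of a bare 5000. */
#define VFIO_USER_DEFAULT_MSG_TIMEOUT_MS 5000

static int wait_time = VFIO_USER_DEFAULT_MSG_TIMEOUT_MS;

/* Patch 22: reuse it as the property default instead of 0. */
DEFINE_PROP_UINT32("x-msg-timeout", VFIOUserPCIDevice, wait_time,
                   VFIO_USER_DEFAULT_MSG_TIMEOUT_MS),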



[PATCH V3] vhost: fix vq dirty bitmap syncing when vIOMMU is enabled

2022-12-15 Thread Jason Wang
When vIOMMU is enabled, vq->used_phys is actually an IOVA, not a
GPA. So we need to translate it to a GPA before syncing, otherwise we
may hit a crash since the IOVA could be outside the scope of the GPA
log size. This can be observed when using virtio-iommu with vhost and
1G of memory.

Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
Cc: qemu-sta...@nongnu.org
Tested-by: Lei Yang 
Reported-by: Yalan Zhang 
Signed-off-by: Jason Wang 
---
Changes since V2:
- use "used_iova" instead of "used_phys" in log
- store the offset in a local variable
- add comment to explain the "+ 1" being added outside of MIN()
- silent checkpatch
Changes since V1:
- Fix the address calculation when used ring is not page aligned
- Fix the length for each round of dirty bitmap syncing
- Use LOG_GUEST_ERROR to log wrong used address
- Various other tweaks
---
 hw/virtio/vhost.c | 84 ---
 1 file changed, 64 insertions(+), 20 deletions(-)
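
For reference, the new loop below walks the used ring one IOMMU page at a
time; a worked example of the size calculation (illustrative numbers):

/*
 * 4 KiB IOMMU pages, used ring at IOVA 0x10f00, used_size = 0x400,
 * iotlb.addr_mask = 0xfff:
 *   offset = 0x10f00 & 0xfff          = 0xf00
 *   s      = 0xfff - 0xf00            = 0xff   (to end of page, minus one)
 *   s      = MIN(0xff, 0x400 - 1) + 1 = 0x100  (sync up to the page boundary)
 * The next round starts at IOVA 0x11000 and covers the remaining 0x300 bytes;
 * doing the "+ 1" outside MIN() keeps s from ever becoming zero.
 */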

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 7fb008bc9e..fdcd1a8fdf 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -20,6 +20,7 @@
 #include "qemu/range.h"
 #include "qemu/error-report.h"
 #include "qemu/memfd.h"
+#include "qemu/log.h"
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
@@ -106,6 +107,24 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
 }
 }
 
+static bool vhost_dev_has_iommu(struct vhost_dev *dev)
+{
+VirtIODevice *vdev = dev->vdev;
+
+/*
+ * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
+ * incremental memory mapping API via IOTLB API. For platform that
+ * does not have IOMMU, there's no need to enable this feature
+ * which may cause unnecessary IOTLB miss/update transactions.
+ */
+if (vdev) {
+return virtio_bus_device_iommu_enabled(vdev) &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+} else {
+return false;
+}
+}
+
 static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
MemoryRegionSection *section,
hwaddr first,
@@ -137,8 +156,51 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
 continue;
 }
 
-vhost_dev_sync_region(dev, section, start_addr, end_addr, 
vq->used_phys,
-  range_get_last(vq->used_phys, vq->used_size));
+if (vhost_dev_has_iommu(dev)) {
+IOMMUTLBEntry iotlb;
+hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
+hwaddr phys, s, offset;
+
+while (used_size) {
+rcu_read_lock();
+iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
+  used_phys,
+  true,
+  MEMTXATTRS_UNSPECIFIED);
+rcu_read_unlock();
+
+if (!iotlb.target_as) {
+qemu_log_mask(LOG_GUEST_ERROR, "translation "
+  "failure for used_iova %"PRIx64"\n",
+  used_phys);
+return -EINVAL;
+}
+
+offset = used_phys & iotlb.addr_mask;
+phys = iotlb.translated_addr + offset;
+
+/*
+ * Distance from start of used ring until last byte of
+ * IOMMU page.
+ */
+s = iotlb.addr_mask - offset;
+/*
+ * Size of used ring, or of the part of it until end
+ * of IOMMU page. To avoid zero result, do the adding
+ * outside of MIN().
+ */
+s = MIN(s, used_size - 1) + 1;
+
+vhost_dev_sync_region(dev, section, start_addr, end_addr, phys,
+  range_get_last(phys, s));
+used_size -= s;
+used_phys += s;
+}
+} else {
+vhost_dev_sync_region(dev, section, start_addr,
+  end_addr, vq->used_phys,
+  range_get_last(vq->used_phys, 
vq->used_size));
+}
 }
 return 0;
 }
@@ -306,24 +368,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev 
*dev, uint64_t size)
 dev->log_size = size;
 }
 
-static bool vhost_dev_has_iommu(struct vhost_dev *dev)
-{
-VirtIODevice *vdev = dev->vdev;
-
-/*
- * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
- * incremental memory mapping API via IOTLB API. For platform that
- * does not have IOMMU, there's no need to enable this feature
- * which may cause unnecessary IOTLB miss/update transactions.
- */
-if (vdev) {
-return virtio_bus_device_iommu_enabled(vdev) &&

Re: [PATCH v11 5/5] docs: Add generic vhost-vdpa device documentation

2022-12-15 Thread Jason Wang
On Thu, Dec 15, 2022 at 9:50 PM Longpeng(Mike)  wrote:
>
> From: Longpeng 
>
> Signed-off-by: Longpeng 
> ---
>  .../devices/vhost-vdpa-generic-device.rst | 68 +++
>  1 file changed, 68 insertions(+)
>  create mode 100644 docs/system/devices/vhost-vdpa-generic-device.rst
>
> diff --git a/docs/system/devices/vhost-vdpa-generic-device.rst 
> b/docs/system/devices/vhost-vdpa-generic-device.rst
> new file mode 100644
> index 00..24c825ef1a
> --- /dev/null
> +++ b/docs/system/devices/vhost-vdpa-generic-device.rst
> @@ -0,0 +1,68 @@
> +
> +=
> +vhost-vDPA generic device
> +=
> +
> +This document explains the usage of the vhost-vDPA generic device.
> +
> +
> +Description
> +---
> +
> +A vDPA (virtio data path acceleration) device is a device that uses a
> +datapath which complies with the virtio specifications together with a
> +vendor-specific control path.
> +
> +QEMU provides two types of vhost-vDPA devices to enable the vDPA device, one
> +is type sensitive which means QEMU needs to know the actual device type
> +(e.g. net, blk, scsi) and another is called "vhost-vDPA generic device" which
> +is type insensitive.
> +
> +The vhost-vDPA generic device builds on the vhost-vdpa subsystem and virtio
> +subsystem. It is quite small, but it can support any type of virtio device.
> +
> +
> +Requirements
> +
> +Linux 5.18+
> +iproute2/vdpa 5.12.0+
> +
> +
> +Examples
> +
> +
> +1. Prepare the vhost-vDPA backends, here is an example using vdpa_sim_blk
> +   device:
> +
> +::
> +  host# modprobe vhost_vdpa
> +  host# modprobe vdpa_sim_blk

Nit: it's probably better to add driver binding steps here.

> +  host# vdpa dev add mgmtdev vdpasim_blk name blk0
> +  (...you can see the vhost-vDPA device under /dev directory now...)

And then the vhost char dev name could be fetched via

ls /sys/bus/vdpa/device/blk0/vhost-vdpa*

With the above changes.

Acked-by: Jason Wang 

Thanks
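
For reference, such a binding step is usually done through the standard vdpa
bus sysfs interface; an illustrative sequence (paths are the usual ones, not
taken from this thread) would be:

  host# echo -n blk0 > /sys/bus/vdpa/drivers/virtio_vdpa/unbind  # if bound there
  host# echo -n blk0 > /sys/bus/vdpa/drivers/vhost_vdpa/bind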

> +  host# ls -l /dev/vhost-vdpa-*
> +  crw--- 1 root root 236, 0 Nov  2 00:49 /dev/vhost-vdpa-0
> +
> +Note:
> +It needs some vendor-specific steps to provision the vDPA device if you're
> +using real HW devices, such as loading the vendor-specific vDPA driver and
> +binding the device to the driver.
> +
> +
> +2. Start the virtual machine:
> +
> +Start QEMU with virtio-mmio bus:
> +
> +::
> +  host# qemu-system  \
> +  -M microvm -m 512 -smp 2 -kernel ... -initrd ...   \
> +  -device vhost-vdpa-device,vhostdev=/dev/vhost-vdpa-0   \
> +  ...
> +
> +
> +Start QEMU with virtio-pci bus:
> +
> +::
> +  host# qemu-system  \
> +  -M pc -m 512 -smp 2\
> +  -device vhost-vdpa-device-pci,vhostdev=/dev/vhost-vdpa-0   \
> +  ...
> --
> 2.23.0
>




[PING PATCH 0/1] Fix some typos

2022-12-15 Thread Dongdong Zhang
Hi all,

I would like to ping a patch

https://lists.nongnu.org/archive/html/qemu-devel/2022-11/msg04568.html
https://lists.nongnu.org/archive/html/qemu-devel/2022-11/msg04570.html


> -Original Messages-
> From: "Dongdong Zhang" 
> Sent Time: 2022-11-30 09:53:57 (Wednesday)
> To: qemu-devel@nongnu.org
> Cc: js...@redhat.com, cr...@redhat.com, bl...@redhat.com, "Dongdong Zhang" 
> Subject: [PATCH 0/1] Fix some typos
> 
> This patch mainly fixes some typos in the 'python' directory.
> 
> Dongdong Zhang (1):
>   Fix some typos
> 
>  python/qemu/machine/console_socket.py | 2 +-
>  python/qemu/machine/qtest.py  | 2 +-
>  python/qemu/qmp/protocol.py   | 2 +-
>  python/qemu/qmp/qmp_tui.py| 6 +++---
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
> -- 
> 2.17.1


PING: [for-8.0 v2 00/11] Refactor cryptodev

2022-12-15 Thread zhenwei pi

Hi, Lei

Could you please review this series?

On 11/22/22 22:07, zhenwei pi wrote:

v1 -> v2:
- fix coding style and use 'g_strjoin()' instead of 'char services[128]'
   (suggested by Dr. David Alan Gilbert)
- wrapper function 'cryptodev_backend_account' to record statistics, and
   allocate sym_stat/asym_stat in cryptodev base class. see patch:
   'cryptodev: Support statistics'.
- add more arguments into struct CryptoDevBackendOpInfo, then
   cryptodev_backend_crypto_operation() uses *op_info only.
- support cryptodev QoS settings (BPS & OPS); both the QEMU command line and
   the QMP command work fine.
- add myself as the maintainer for cryptodev.

v1:
- introduce cryptodev.json to describe the attributes of the crypto device, then
   drop duplicated type declarations and remove some virtio-related dependencies.
- add statistics: OPS and bandwidth.
- add QMP command: query-cryptodev
- add HMP info command: cryptodev
- misc fix: detect akcipher capability instead of exposing akcipher service
   unconditionally.

Zhenwei Pi (11):
   cryptodev: Introduce cryptodev.json
   cryptodev: Remove 'name' & 'model' fields
   cryptodev: Introduce cryptodev alg type in QAPI
   cryptodev: Introduce server type in QAPI
   cryptodev: Introduce 'query-cryptodev' QMP command
   cryptodev: Support statistics
   cryptodev-builtin: Detect akcipher capability
   hmp: add cryptodev info command
   cryptodev: Use CryptoDevBackendOpInfo for operation
   cryptodev: support QoS
   MAINTAINERS: add myself as the maintainer for cryptodev

  MAINTAINERS |   2 +
  backends/cryptodev-builtin.c|  42 +++--
  backends/cryptodev-lkcf.c   |  19 +-
  backends/cryptodev-vhost-user.c |  13 +-
  backends/cryptodev-vhost.c  |   4 +-
  backends/cryptodev.c| 295 +---
  hmp-commands-info.hx|  14 ++
  hw/virtio/virtio-crypto.c   |  48 --
  include/monitor/hmp.h   |   1 +
  include/sysemu/cryptodev.h  |  94 +-
  monitor/hmp-cmds.c  |  36 
  qapi/cryptodev.json | 144 
  qapi/meson.build|   1 +
  qapi/qapi-schema.json   |   1 +
  qapi/qom.json   |   8 +-
  15 files changed, 604 insertions(+), 118 deletions(-)
  create mode 100644 qapi/cryptodev.json



--
zhenwei pi



Re: [PATCH] target/riscv/cpu.c: Fix elen check

2022-12-15 Thread Frank Chang
Reviewed-by: Frank Chang 

On Thu, Dec 15, 2022 at 11:09 PM Elta <503386...@qq.com> wrote:

> Should be cpu->cfg.elen in range [8, 64].
>
> Signed-off-by: Dongxue Zhang 
> ---
>  target/riscv/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index d14e95c9dc..1e8032c969 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -870,7 +870,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error
> **errp)
>  "Vector extension ELEN must be power of 2");
>  return;
>  }
> -if (cpu->cfg.elen > 64 || cpu->cfg.vlen < 8) {
> +if (cpu->cfg.elen > 64 || cpu->cfg.elen < 8) {
>  error_setg(errp,
>  "Vector extension implementation only supports
> ELEN "
>  "in the range [8, 64]");
> --
> 2.17.1
>
>


Re: [PATCH v3 2/3] target/riscv: Extend isa_ext_data for single letter extensions

2022-12-15 Thread Alistair Francis
On Fri, Dec 9, 2022 at 12:58 AM Mayuresh Chitale
 wrote:
>
> Currently the ISA string for a CPU is generated from two different
> arrays, one for single letter extensions and another for multi letter
> extensions. Add all the single letter extensions to the isa_ext_data
> array and use it for generating the ISA string. Also drop 'P' and 'Q'
> extensions from the list of single letter extensions as those are not
> supported yet.
>
> Signed-off-by: Mayuresh Chitale 
> Reviewed-by: Andrew Jones 
> Reviewed-by: Alistair Francis 
> Reviewed-by: Bin Meng 

This breaks the SiFive CPUs (as well as others). A large number of
CPUs set these single letter extensions just with set_misa(), so the
cfg values are never actually set.

We probably want to add something like this (does not compile):

@@ -222,6 +225,10 @@ static void set_misa(CPURISCVState *env, RISCVMXL
mxl, uint32_t ext)
{
env->misa_mxl_max = env->misa_mxl = mxl;
env->misa_ext_mask = env->misa_ext = ext;
+
+if (ext & RVI == RVI) {
+cpu->cfg.ext_i = true;
+}
}
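
A compilable variant of that idea could be a small helper, called wherever the
RISCVCPU is at hand (e.g. riscv_cpu_realize()), that mirrors the misa bits into
the cfg flags this patch adds (sketch only, not part of the posted series):

static void riscv_cpu_sync_misa_cfg(RISCVCPU *cpu)
{
    CPURISCVState *env = &cpu->env;

    /* Mirror the single-letter misa bits into the cfg booleans that the
     * isa_edata_arr-based string generation now reads. */
    cpu->cfg.ext_i = !!(env->misa_ext & RVI);
    cpu->cfg.ext_e = !!(env->misa_ext & RVE);
    cpu->cfg.ext_m = !!(env->misa_ext & RVM);
    cpu->cfg.ext_a = !!(env->misa_ext & RVA);
    cpu->cfg.ext_f = !!(env->misa_ext & RVF);
    cpu->cfg.ext_d = !!(env->misa_ext & RVD);
    cpu->cfg.ext_c = !!(env->misa_ext & RVC);
    cpu->cfg.ext_h = !!(env->misa_ext & RVH);
    cpu->cfg.ext_v = !!(env->misa_ext & RVV);
}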

Alistair

> ---
>  target/riscv/cpu.c | 41 +++--
>  1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 042fd541b4..8c8f085a80 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -41,8 +41,6 @@
>   (QEMU_VERSION_MICRO))
>  #define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID
>
> -static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
> -
>  struct isa_ext_data {
>  const char *name;
>  bool multi_letter;
> @@ -71,6 +69,13 @@ struct isa_ext_data {
>   *extensions by an underscore.
>   */
>  static const struct isa_ext_data isa_edata_arr[] = {
> +ISA_EXT_DATA_ENTRY(i, false, PRIV_VERSION_1_10_0, ext_i),
> +ISA_EXT_DATA_ENTRY(e, false, PRIV_VERSION_1_10_0, ext_e),
> +ISA_EXT_DATA_ENTRY(m, false, PRIV_VERSION_1_10_0, ext_m),
> +ISA_EXT_DATA_ENTRY(a, false, PRIV_VERSION_1_10_0, ext_a),
> +ISA_EXT_DATA_ENTRY(f, false, PRIV_VERSION_1_10_0, ext_f),
> +ISA_EXT_DATA_ENTRY(d, false, PRIV_VERSION_1_10_0, ext_d),
> +ISA_EXT_DATA_ENTRY(c, false, PRIV_VERSION_1_10_0, ext_c),
>  ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
>  ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
>  ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
> @@ -1196,16 +1201,23 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> -static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int 
> max_str_len)
> +static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str)
>  {
>  char *old = *isa_str;
>  char *new = *isa_str;
>  int i;
>
>  for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> -if (isa_edata_arr[i].multi_letter &&
> -isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
> -new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
> +if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
> +if (isa_edata_arr[i].multi_letter) {
> +if (cpu->cfg.short_isa_string) {
> +continue;
> +}
> +new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
> +} else {
> +new = g_strconcat(old, isa_edata_arr[i].name, NULL);
> +}
> +
>  g_free(old);
>  old = new;
>  }
> @@ -1216,19 +1228,12 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char 
> **isa_str, int max_str_len)
>
>  char *riscv_isa_string(RISCVCPU *cpu)
>  {
> -int i;
> -const size_t maxlen = sizeof("rv128") + sizeof(riscv_single_letter_exts);
> +const size_t maxlen = sizeof("rv128");
>  char *isa_str = g_new(char, maxlen);
> -char *p = isa_str + snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
> -for (i = 0; i < sizeof(riscv_single_letter_exts) - 1; i++) {
> -if (cpu->env.misa_ext & RV(riscv_single_letter_exts[i])) {
> -*p++ = qemu_tolower(riscv_single_letter_exts[i]);
> -}
> -}
> -*p = '\0';
> -if (!cpu->cfg.short_isa_string) {
> -riscv_isa_string_ext(cpu, &isa_str, maxlen);
> -}
> +
> +snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
> +riscv_isa_string_ext(cpu, &isa_str);
> +
>  return isa_str;
>  }
>
> --
> 2.34.1
>
>



Re: [PATCH] target/riscv/cpu.c: Fix elen check

2022-12-15 Thread LIU Zhiwei



On 2022/12/15 16:46, Elta wrote:

Should be cpu->cfg.elen in range [8, 64].

Signed-off-by: Dongxue Zhang 
---
 target/riscv/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d14e95c9dc..1e8032c969 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -870,7 +870,7 @@ static void riscv_cpu_realize(DeviceState *dev, 
Error **errp)

                         "Vector extension ELEN must be power of 2");
                 return;
             }
-            if (cpu->cfg.elen > 64 || cpu->cfg.vlen < 8) {
+            if (cpu->cfg.elen > 64 || cpu->cfg.elen < 8) {


Oops. You are right.

Reviewed-by: LIU Zhiwei 

Zhiwei


                 error_setg(errp,
                         "Vector extension implementation only 
supports ELEN "

                         "in the range [8, 64]");
--
2.17.1





[RFC PATCH v3 04/38] i386/kvm: Add xen-version machine property and init KVM Xen support

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

This just initializes the basic Xen support in KVM for now.
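
With this property in place, Xen emulation is switched on from the command
line along these lines (value format as described for the property below):

  qemu-system-x86_64 -accel kvm,xen-version=0x4000a ...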

Signed-off-by: David Woodhouse 
---
 accel/kvm/kvm-all.c |  1 +
 include/sysemu/kvm_int.h|  1 +
 target/i386/kvm/kvm.c   | 53 +
 target/i386/kvm/meson.build |  2 ++
 target/i386/kvm/xen-emu.c   | 50 ++
 target/i386/kvm/xen-emu.h   | 19 +
 6 files changed, 126 insertions(+)
 create mode 100644 target/i386/kvm/xen-emu.c
 create mode 100644 target/i386/kvm/xen-emu.h

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f99b0becd8..568bb09c09 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3620,6 +3620,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kvm_dirty_ring_size = 0;
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
+s->xen_version = 0;
 }
 
 /**
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 3b4adcdc10..429cecbd04 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -110,6 +110,7 @@ struct KVMState
 struct KVMDirtyRingReaper reaper;
 NotifyVmexitOption notify_vmexit;
 uint32_t notify_window;
+uint32_t xen_version;
 };
 
 void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a213209379..a98995d4d7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -31,6 +31,7 @@
 #include "sysemu/runstate.h"
 #include "kvm_i386.h"
 #include "sev.h"
+#include "xen-emu.h"
 #include "hyperv.h"
 #include "hyperv-proto.h"
 
@@ -48,6 +49,8 @@
 #include "hw/i386/x86-iommu.h"
 #include "hw/i386/e820_memory_layout.h"
 
+#include "hw/xen/xen.h"
+
 #include "hw/pci/pci.h"
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
@@ -2513,6 +2516,18 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 }
 }
 
+if (s->xen_version) {
+#ifdef CONFIG_XEN_EMU
+ret = kvm_xen_init(s);
+if (ret < 0) {
+return ret;
+}
+#else
+error_report("kvm: Xen support not enabled in qemu");
+return -ENOTSUP;
+#endif
+}
+
 ret = kvm_get_supported_msrs(s);
 if (ret < 0) {
 return ret;
@@ -5706,6 +5721,35 @@ static void kvm_arch_set_notify_window(Object *obj, 
Visitor *v,
 s->notify_window = value;
 }
 
+static void kvm_arch_get_xen_version(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+uint32_t value = s->xen_version;
+
+visit_type_uint32(v, name, &value, errp);
+}
+
+static void kvm_arch_set_xen_version(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+uint32_t value;
+
+visit_type_uint32(v, name, &value, &error);
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->xen_version = value;
+if (value && xen_mode == XEN_DISABLED)
+xen_mode = XEN_EMULATE;
+}
+
 void kvm_arch_accel_class_init(ObjectClass *oc)
 {
 object_class_property_add_enum(oc, "notify-vmexit", "NotifyVMexitOption",
@@ -5722,6 +5766,15 @@ void kvm_arch_accel_class_init(ObjectClass *oc)
 object_class_property_set_description(oc, "notify-window",
   "Clock cycles without an event 
window "
   "after which a notification VM exit 
occurs");
+
+object_class_property_add(oc, "xen-version", "uint32",
+  kvm_arch_get_xen_version,
+  kvm_arch_set_xen_version,
+  NULL, NULL);
+object_class_property_set_description(oc, "xen-version",
+  "Xen version to be emulated "
+  "(in XENVER_version form "
+  "e.g. 0x4000a for 4.10)");
 }
 
 void kvm_set_max_apic_id(uint32_t max_apic_id)
diff --git a/target/i386/kvm/meson.build b/target/i386/kvm/meson.build
index 736df8b72e..322272091b 100644
--- a/target/i386/kvm/meson.build
+++ b/target/i386/kvm/meson.build
@@ -7,6 +7,8 @@ i386_softmmu_kvm_ss.add(files(
   'kvm-cpu.c',
 ))
 
+i386_softmmu_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen-emu.c'))
+
 i386_softmmu_kvm_ss.add(when: 'CONFIG_SEV', if_false: files('sev-stub.c'))
 
 i386_softmmu_ss.add(when: 'CONFIG_HYPERV', if_true: files('hyperv.c'), 
if_false: files('hyperv-stub.c'))
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
new file mode 100644
index 00..4bd2eeeb5a
--- /dev/null
+++ b/target/i386/kvm/xen-emu.c
@@ -0,0 +1,50 @@
+/*
+ * Xen HVM emulation support in KVM
+ *
+ * Copyright © 2019 Oracle and

[RFC PATCH v3 32/38] hw/xen: Implement EVTCHNOP_bind_interdomain

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 77 +++
 hw/i386/kvm/xen_evtchn.h  |  2 +
 target/i386/kvm/xen-emu.c | 15 
 3 files changed, 94 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 4272b63853..b286bbd20e 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -618,6 +618,23 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port)
 deassign_kernel_port(port);
 break;
 
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/* Not yet implemented. This can't happen! */
+} else {
+/* Loopback interdomain */
+XenEvtchnPort *rp = &s->port_table[p->type_val];
+if (!valid_port(p->type_val) || rp->type_val != port ||
+rp->type != EVTCHNSTAT_interdomain) {
+error_report("Inconsistent state for interdomain unbind");
+} else {
+/* Set the other end back to unbound */
+rp->type = EVTCHNSTAT_unbound;
+rp->type_val = 0;
+}
+}
+break;
+
 default:
 break;
 }
@@ -732,6 +749,66 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (interdomain->remote_dom == DOMID_SELF ||
+   interdomain->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -ESRCH;
+}
+
+if (!valid_port(interdomain->remote_port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+/* The newly allocated port starts out as unbound */
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val, 
&interdomain->local_port);
+if (ret) {
+goto out;
+}
+
+if (interdomain->remote_dom == DOMID_QEMU) {
+/* We haven't hooked up QEMU's PV drivers to this yet */
+ret = -ENOSYS;
+} else {
+/* Loopback */
+XenEvtchnPort *rp = &s->port_table[interdomain->remote_port];
+XenEvtchnPort *lp = &s->port_table[interdomain->local_port];
+
+if (rp->type == EVTCHNSTAT_unbound && rp->type_val == 0) {
+/* It's a match! */
+rp->type = EVTCHNSTAT_interdomain;
+rp->type_val = interdomain->local_port;
+
+lp->type = EVTCHNSTAT_interdomain;
+lp->type_val = interdomain->remote_port;
+} else {
+ret = -EINVAL;
+}
+}
+
+if (ret) {
+free_port(s, interdomain->local_port);
+}
+ out:
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+
+}
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 5dc68a188d..4783a6f127 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -20,6 +20,7 @@ struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
+struct evtchn_bind_interdomain;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -27,3 +28,4 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
+int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain 
*interdomain);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 6f393c4149..c4c595cb1a 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -827,6 +827,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_interdomain: {
+struct evtchn_bind_interdomain interdomain;
+
+qemu_build_assert(sizeof(interdomain) == 12);
+if (kvm_copy_from_gva(cs, arg, &interdomain, sizeof(interdomain))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_interdomain_op(&interdomain);
+if (!err && kvm_copy_to_gva(cs, arg, &interdomain, 
sizeof(interdomain))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 10/38] i386/xen: implement HYPERCALL_xen_version

2022-12-15 Thread David Woodhouse
From: Joao Martins 

This is just meant to serve as an example of how we can implement
hypercalls, xen_version specifically, since QEMU does all kinds of
feature controllability. So handling that here seems appropriate.

Signed-off-by: Joao Martins 
[dwmw2: Implement kvm_gva_rw() safely]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 86 +++
 1 file changed, 86 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 668713d5af..9026fd3eb6 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -13,9 +13,55 @@
 #include "qemu/log.h"
 #include "sysemu/kvm_int.h"
 #include "kvm/kvm_i386.h"
+#include "exec/address-spaces.h"
 #include "xen-emu.h"
+#include "xen.h"
 #include "trace.h"
 
+#include "standard-headers/xen/version.h"
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+int ret;
+
+while (sz) {
+struct kvm_translation tr = {
+.linear_address = gva,
+};
+
+size_t len = TARGET_PAGE_SIZE - (tr.linear_address & 
~TARGET_PAGE_MASK);
+if (len > sz)
+len = sz;
+
+ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
+if (ret || !tr.valid || (is_write && !tr.writeable)) {
+return -EFAULT;
+}
+
+cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+
+buf += len;
+sz -= len;
+gva += len;
+}
+
+return 0;
+}
+
+static inline int kvm_copy_from_gva(CPUState *cs, uint64_t gva, void *buf,
+size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, false);
+}
+
+static inline int kvm_copy_to_gva(CPUState *cs, uint64_t gva, void *buf,
+  size_t sz)
+{
+return kvm_gva_rw(cs, gva, buf, sz, true);
+}
+
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
 const int required_caps = KVM_XEN_HVM_CONFIG_HYPERCALL_MSR |
@@ -51,6 +97,43 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 return 0;
 }
 
+static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+int err = 0;
+
+switch (cmd) {
+case XENVER_get_features: {
+struct xen_feature_info fi;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(fi) == 8);
+
+err = kvm_copy_from_gva(CPU(cpu), arg, &fi, sizeof(fi));
+if (err) {
+break;
+}
+
+fi.submap = 0;
+if (fi.submap_idx == 0) {
+fi.submap |= 1 << XENFEAT_writable_page_tables |
+ 1 << XENFEAT_writable_descriptor_tables |
+ 1 << XENFEAT_auto_translated_physmap |
+ 1 << XENFEAT_supervisor_mode_kernel;
+}
+
+err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
+break;
+}
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -61,6 +144,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_xen_version:
+return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
+ exit->u.hcall.params[1]);
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 09/38] i386/xen: handle guest hypercalls

2022-12-15 Thread David Woodhouse
From: Joao Martins 

This means handling the new exit reason for Xen but still
crashing on purpose. As we implement each of the hypercalls
we will then return the right return code.

Signed-off-by: Joao Martins 
[dwmw2: Add CPL to hypercall tracing, disallow hypercalls from CPL > 0]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/kvm.c|  5 +
 target/i386/kvm/trace-events |  3 +++
 target/i386/kvm/xen-emu.c| 39 
 target/i386/kvm/xen-emu.h|  1 +
 4 files changed, 48 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5977edb1ca..c37e44d88f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5466,6 +5466,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run 
*run)
 assert(run->msr.reason == KVM_MSR_EXIT_REASON_FILTER);
 ret = kvm_handle_wrmsr(cpu, run);
 break;
+#ifdef CONFIG_XEN_EMU
+case KVM_EXIT_XEN:
+ret = kvm_xen_handle_exit(cpu, &run->xen);
+break;
+#endif
 default:
 fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
 ret = -1;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 7c369db1e1..cd6f842b1f 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -5,3 +5,6 @@ kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap 
interrupt for GSI %"
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
 kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
+
+# xen-emu.c
+kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t 
a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " 
a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 8433c4d70f..668713d5af 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -10,9 +10,11 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "sysemu/kvm_int.h"
 #include "kvm/kvm_i386.h"
 #include "xen-emu.h"
+#include "trace.h"
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 {
@@ -48,3 +50,40 @@ int kvm_xen_init(KVMState *s, uint32_t hypercall_msr)
 
 return 0;
 }
+
+static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+uint16_t code = exit->u.hcall.input;
+
+if (exit->u.hcall.cpl > 0) {
+exit->u.hcall.result = -EPERM;
+return true;
+}
+
+switch (code) {
+default:
+return false;
+}
+}
+
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
+{
+if (exit->type != KVM_EXIT_XEN_HCALL)
+return -1;
+
+if (!do_kvm_xen_handle_exit(cpu, exit)) {
+/* Some hypercalls will be deliberately "implemented" by returning
+ * -ENOSYS. This case is for hypercalls which are unexpected. */
+exit->u.hcall.result = -ENOSYS;
+qemu_log_mask(LOG_UNIMP, "Unimplemented Xen hypercall %"
+  PRId64 " (0x%" PRIx64 " 0x%" PRIx64 " 0x%" PRIx64 ")\n",
+  (uint64_t)exit->u.hcall.input, 
(uint64_t)exit->u.hcall.params[0],
+  (uint64_t)exit->u.hcall.params[1], 
(uint64_t)exit->u.hcall.params[1]);
+}
+
+trace_kvm_xen_hypercall(CPU(cpu)->cpu_index, exit->u.hcall.cpl,
+exit->u.hcall.input, exit->u.hcall.params[0],
+exit->u.hcall.params[1], exit->u.hcall.params[2],
+exit->u.hcall.result);
+return 0;
+}
diff --git a/target/i386/kvm/xen-emu.h b/target/i386/kvm/xen-emu.h
index 2101df0182..76a3de6c4d 100644
--- a/target/i386/kvm/xen-emu.h
+++ b/target/i386/kvm/xen-emu.h
@@ -24,5 +24,6 @@
 #define XEN_VERSION(maj, min) ((maj) << 16 | (min))
 
 int kvm_xen_init(KVMState *s, uint32_t hypercall_msr);
+int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit);
 
 #endif /* QEMU_I386_KVM_XEN_EMU_H */
-- 
2.35.3




[RFC PATCH v3 37/38] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 88 +++
 hw/i386/kvm/xen_evtchn.h  |  3 ++
 hw/i386/pc.c  | 10 -
 include/sysemu/kvm_xen.h  |  2 +-
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c | 13 ++
 target/i386/kvm/xen-emu.c | 64 
 target/i386/kvm/xen-emu.h |  1 +
 8 files changed, 154 insertions(+), 28 deletions(-)
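
The vmexit-time check described above amounts to something like the following
sketch; xen_evtchn_deassert_callback() is from this patch, while the vcpu_info
accessor is an assumed helper:

/* Run on the vmexit path for vCPU0 while the callback GSI is believed to be
 * asserted; once the guest has cleared evtchn_upcall_pending, drop the line. */
static void kvm_xen_maybe_deassert_callback(CPUState *cs)
{
    struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(cs); /* assumed helper */

    if (vi && !qatomic_read(&vi->evtchn_upcall_pending)) {
        xen_evtchn_deassert_callback();
    }
}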

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 9292602c09..8ea8cf550e 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -24,6 +24,8 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+#include "hw/i386/x86.h"
+#include "hw/irq.h"
 
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
@@ -102,6 +104,7 @@ struct XenEvtchnState {
 QemuMutex port_lock;
 uint32_t nr_ports;
 XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
+qemu_irq gsis[GSI_NUM_PINS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -166,9 +169,29 @@ static const TypeInfo xen_evtchn_info = {
 void xen_evtchn_create(void)
 {
 XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN, -1, 
NULL));
+int i;
+
 xen_evtchn_singleton = s;
 
 qemu_mutex_init(&s->port_lock);
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_init_irq(SYS_BUS_DEVICE(s), &s->gsis[i]);
+}
+}
+
+void xen_evtchn_connect_gsis(qemu_irq *system_gsis)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return;
+}
+
+for (i = 0; i < GSI_NUM_PINS; i++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(s), i, system_gsis[i]);
+}
 }
 
 static void xen_evtchn_register_types(void)
@@ -178,26 +201,75 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
-
 #define CALLBACK_VIA_TYPE_SHIFT   56
 
 int xen_evtchn_set_callback_param(uint64_t param)
 {
+XenEvtchnState *s = xen_evtchn_singleton;
 int ret = -ENOSYS;
 
-if (param >> CALLBACK_VIA_TYPE_SHIFT == HVM_PARAM_CALLBACK_TYPE_VECTOR) {
+if (!s) {
+return -ENOTSUP;
+}
+
+switch (param >> CALLBACK_VIA_TYPE_SHIFT) {
+case HVM_PARAM_CALLBACK_TYPE_VECTOR: {
 struct kvm_xen_hvm_attr xa = {
 .type = KVM_XEN_ATTR_TYPE_UPCALL_VECTOR,
 .u.vector = (uint8_t)param,
 };
 
 ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
-if (!ret && xen_evtchn_singleton)
-xen_evtchn_singleton->callback_param = param;
+break;
+}
+case HVM_PARAM_CALLBACK_TYPE_GSI:
+ret = 0;
+break;
 }
+
+if (!ret) {
+s->callback_param = param;
+}
+
 return ret;
 }
 
+static void xen_evtchn_set_callback_level(XenEvtchnState *s, int level)
+{
+uint32_t param = (uint32_t)s->callback_param;
+
+switch (s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) {
+case HVM_PARAM_CALLBACK_TYPE_GSI:
+if (param < GSI_NUM_PINS) {
+qemu_set_irq(s->gsis[param], level);
+}
+break;
+}
+}
+
+static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
+{
+if (kvm_xen_inject_vcpu_callback_vector(vcpu, s->callback_param)) {
+return;
+}
+
+/* GSI or PCI_INTX delivery is only for events on vCPU 0 */
+if (vcpu) {
+return;
+}
+
+xen_evtchn_set_callback_level(s, 1);
+}
+
+void xen_evtchn_deassert_callback(void)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+
+if (s) {
+xen_evtchn_set_callback_level(s, 0);
+}
+}
+
 static void deassign_kernel_port(evtchn_port_t port)
 {
 struct kvm_xen_hvm_attr ha;
@@

[RFC PATCH v3 05/38] i386/kvm: handle Xen HVM cpuid leaves

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Introduce support for emulating CPUID for Xen HVM guests. It doesn't make
sense to advertise the KVM leaves to a Xen guest, so do it unconditionally
when the xen-version machine property is set.

Signed-off-by: Joao Martins 
[dwmw2: Obtain xen_version from machine property, make it automatic]
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.c |  1 +
 target/i386/cpu.h |  2 +
 target/i386/kvm/kvm.c | 78 +--
 target/i386/kvm/xen-emu.c |  4 +-
 target/i386/kvm/xen-emu.h | 13 ++-
 5 files changed, 91 insertions(+), 7 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 22b681ca37..50aa95f134 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7069,6 +7069,7 @@ static Property x86_cpu_properties[] = {
  * own cache information (see x86_cpu_load_def()).
  */
 DEFINE_PROP_BOOL("legacy-cache", X86CPU, legacy_cache, true),
+DEFINE_PROP_BOOL("xen-vapic", X86CPU, xen_vapic, false),
 
 /*
  * From "Requirements for Implementing the Microsoft
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d4bc19577a..c6c57baed5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1964,6 +1964,8 @@ struct ArchCPU {
 int32_t thread_id;
 
 int32_t hv_max_vps;
+
+bool xen_vapic;
 };
 
 
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index a98995d4d7..5977edb1ca 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -22,6 +22,7 @@
 
 #include 
 #include "standard-headers/asm-x86/kvm_para.h"
+#include "standard-headers/xen/arch-x86/cpuid.h"
 
 #include "cpu.h"
 #include "host-cpu.h"
@@ -1750,7 +1751,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 int max_nested_state_len;
 int r;
 Error *local_err = NULL;
-
 memset(&cpuid_data, 0, sizeof(cpuid_data));
 
 cpuid_i = 0;
@@ -1802,7 +1802,77 @@ int kvm_arch_init_vcpu(CPUState *cs)
 has_msr_hv_hypercall = true;
 }
 
-if (cpu->expose_kvm) {
+if (cs->kvm_state->xen_version) {
+#ifdef CONFIG_XEN_EMU
+struct kvm_cpuid_entry2 *xen_max_leaf;
+
+memcpy(signature, "XenVMMXenVMM", 12);
+
+xen_max_leaf = c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_SIGNATURE;
+c->eax = kvm_base + XEN_CPUID_TIME;
+c->ebx = signature[0];
+c->ecx = signature[1];
+c->edx = signature[2];
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_VENDOR;
+c->eax = cs->kvm_state->xen_version;
+c->ebx = 0;
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM_MSR;
+/* Number of hypercall-transfer pages */
+c->eax = 1;
+/* Hypercall MSR base address */
+if (hyperv_enabled(cpu)) {
+c->ebx = XEN_HYPERCALL_MSR_HYPERV;
+kvm_xen_init(cs->kvm_state, c->ebx);
+} else {
+c->ebx = XEN_HYPERCALL_MSR;
+}
+c->ecx = 0;
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_TIME;
+c->eax = ((!!tsc_is_stable_and_known(env) << 1) |
+(!!(env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_RDTSCP) << 2));
+/* default=0 (emulate if necessary) */
+c->ebx = 0;
+/* guest tsc frequency */
+c->ecx = env->user_tsc_khz;
+/* guest tsc incarnation (migration count) */
+c->edx = 0;
+
+c = &cpuid_data.entries[cpuid_i++];
+c->function = kvm_base + XEN_CPUID_HVM;
+xen_max_leaf->eax = kvm_base + XEN_CPUID_HVM;
+if (cs->kvm_state->xen_version >= XEN_VERSION(4,5)) {
+c->function = kvm_base + XEN_CPUID_HVM;
+
+if (cpu->xen_vapic) {
+c->eax |= XEN_HVM_CPUID_APIC_ACCESS_VIRT;
+c->eax |= XEN_HVM_CPUID_X2APIC_VIRT;
+}
+
+c->eax |= XEN_HVM_CPUID_IOMMU_MAPPINGS;
+
+if (cs->kvm_state->xen_version >= XEN_VERSION(4,6)) {
+c->eax |= XEN_HVM_CPUID_VCPU_ID_PRESENT;
+c->ebx = cs->cpu_index;
+}
+}
+
+kvm_base += 0x100;
+#else /* CONFIG_XEN_EMU */
+/* This should never happen as kvm_arch_init() would have died first. 
*/
+fprintf(stderr, "Cannot enable Xen CPUID without Xen support\n");
+abort();
+#endif
+} else if (cpu->expose_kvm) {
 memcpy(signature, "KVMKVMKVM\0\0\0", 12);
 c = &cpuid_data.entries[cpuid_i++];
 c->function = KVM_CPUID_SIGNATURE | kvm_base;
@@ -2518,7 +2588,9 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
 if (s->xen_version) {
 #ifdef CONFIG_XEN_EMU
-ret = kvm_xen_init(s);
+/* hyperv_enabled() doesn't work yet. */
+uint32_t msr = XEN_HYPERCALL_MSR;
+ret = kvm_xen_init(s, msr);
 if (ret < 0) {
 return 

[RFC PATCH v3 18/38] i386/xen: handle VCPUOP_register_vcpu_info

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Handle the hypercall to set a per-vCPU vcpu_info, and also wire up the default
vcpu_info in the shared_info page for the first 32 vCPUs.

To avoid deadlock within KVM a vCPU thread must set its *own* vcpu_info
rather than it being set from the context in which the hypercall is
invoked.

Add the vcpu_info (and default) GPA to the vmstate_x86_cpu for migration,
and restore it in kvm_arch_put_registers() appropriately.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h|  2 +
 target/i386/kvm/kvm.c| 19 +
 target/i386/kvm/trace-events |  1 +
 target/i386/kvm/xen-emu.c| 78 ++--
 target/i386/kvm/xen-emu.h|  1 +
 target/i386/machine.c| 21 ++
 6 files changed, 119 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index c6c57baed5..109b2e5669 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1788,6 +1788,8 @@ typedef struct CPUArchState {
 #endif
 #if defined(CONFIG_KVM)
 struct kvm_nested_state *nested_state;
+uint64_t xen_vcpu_info_gpa;
+uint64_t xen_vcpu_info_default_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index c37e44d88f..8affe1eeae 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1802,6 +1802,9 @@ int kvm_arch_init_vcpu(CPUState *cs)
 has_msr_hv_hypercall = true;
 }
 
+env->xen_vcpu_info_gpa = UINT64_MAX;
+env->xen_vcpu_info_default_gpa = UINT64_MAX;
+
 if (cs->kvm_state->xen_version) {
 #ifdef CONFIG_XEN_EMU
 struct kvm_cpuid_entry2 *xen_max_leaf;
@@ -4723,6 +4726,22 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 kvm_arch_set_tsc_khz(cpu);
 }
 
+#ifdef CONFIG_XEN_EMU
+if (level == KVM_PUT_FULL_STATE) {
+uint64_t gpa = x86_cpu->env.xen_vcpu_info_gpa;
+if (gpa == UINT64_MAX) {
+gpa = x86_cpu->env.xen_vcpu_info_default_gpa;
+}
+
+if (gpa != UINT64_MAX) {
+ret = kvm_xen_set_vcpu_attr(cpu, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO, gpa);
+if (ret < 0) {
+return ret;
+}
+}
+}
+#endif
+
 ret = kvm_getput_regs(x86_cpu, 1);
 if (ret < 0) {
 return ret;
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 0a47c26e80..14e54dfca5 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -9,3 +9,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t 
a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " 
a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
+kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type %d gpa 0x%" PRIx64
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 83d98cbfd9..25c48248ce 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -147,10 +147,47 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, uint64_t gpa)
+{
+struct kvm_xen_vcpu_attr xhsi;
+
+xhsi.type = type;
+xhsi.u.gpa = gpa;
+
+trace_kvm_xen_set_vcpu_attr(cs->cpu_index, type, gpa);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xhsi);
+}
+
+static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_default_gpa = data.host_ulong;
+
+/* Changing the default does nothing if a vcpu_info was explicitly set. */
+if (env->xen_vcpu_info_gpa == UINT64_MAX) {
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_default_gpa);
+}
+}
+
+static void do_set_vcpu_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
+  env->xen_vcpu_info_gpa);
+}
+
 static int xen_set_shared_info(uint64_t gfn)
 {
 uint64_t gpa = gfn << TARGET_PAGE_BITS;
-int err;
+int i, err;
 
 /* The xen_overlay device tells KVM about it too, since it had to
  * do that on migration load anyway (unless we're going to jump
@@ -162,6 +199,14 @@ static int xen_set_shared_info(uint64_t gfn)
 
 trace_kvm_xen_set_shared_info(gfn);
 
+for (i = 0; i < XEN_LEGACY_MAX_VCPUS; i++) {
+CPUState *cpu = qemu_get_cpu(i);
+if (cpu) {
+async_run_on_cpu(cpu, do_set_vcpu_info_default_gpa, RUN_ON_CPU_HOST_ULONG(gpa));
+}
+gpa += sizeof(vc

[RFC PATCH v3 30/38] hw/xen: Implement EVTCHNOP_send

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 162 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 target/i386/kvm/xen-emu.c |  12 +++
 3 files changed, 176 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 2e35812b32..d90a92a25a 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -441,6 +441,117 @@ static int unmask_port(XenEvtchnState *s, evtchn_port_t 
port, bool do_unmask)
 }
 }
 
+static int do_set_port_lm(XenEvtchnState *s, evtchn_port_t port,
+  struct shared_info *shinfo,
+  struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+kvm_xen_inject_vcpu_callback_vector(s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_set_port_compat(XenEvtchnState *s, evtchn_port_t port,
+  struct compat_shared_info *shinfo,
+  struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+/* Update the pending bit itself. If it was already set, we're done. */
+if (qatomic_fetch_or(&shinfo->evtchn_pending[idx], mask) & mask) {
+return 0;
+}
+
+/* Check if it's masked. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+kvm_xen_inject_vcpu_callback_vector(s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int set_port_pending(XenEvtchnState *s, evtchn_port_t port)
+{
+void *vcpu_info, *shinfo;
+
+if (s->port_table[port].type == EVTCHNSTAT_closed) {
+return -EINVAL;
+}
+
+shinfo = xen_overlay_page_ptr(XENMAPSPACE_shared_info, 0);
+if (!shinfo) {
+return -ENOTSUP;
+}
+
+vcpu_info = kvm_xen_get_vcpu_info_hva(s->port_table[port].vcpu);
+if (!vcpu_info) {
+return -EINVAL;
+}
+
+if (xen_is_long_mode()) {
+return do_set_port_lm(s, port, shinfo, vcpu_info);
+} else {
+return do_set_port_compat(s, port, shinfo, vcpu_info);
+}
+}
+
 static bool virq_is_global(uint32_t virq)
 {
 switch (virq) {
@@ -620,3 +731,54 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 
 return ret;
 }
+
+int xen_evtchn_send_op(struct evtchn_send *send)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = 0;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(send->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[send->port];
+
+switch(p->type) {
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU) {
+/* This is an event from the guest to qemu itself, which is
+ * serving as the driver domain. Not yet implemented; it will
+ * be hooked up to the qemu implementation of xenstore,
+ * console, PV net/block drivers etc. */
+ret = -ENOSYS;
+} else {
+/* Loopback interdomain ports; just a complex IPI */
+set_port_pending(s, p->type_val);
+}
+break;
+
+case EVTCHNSTAT_ipi:
+set_port_pending(s, send->port);
+break;
+
+case EVTCHNSTAT_unbound:
+/* Xen will silently drop these */
+ 

[RFC PATCH v3 35/38] i386/xen: add monitor commands to test event injection

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Specifically, add listing and injection of event channels.
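
For example (output here is illustrative; the exact fields follow the
printf formats in the code):

    (qemu) xen-event-list
    port    2 virq/0 vcpu:0 pending:0 mask:0
    port    3 ipi/0 vcpu:0 pending:0 mask:0
    (qemu) xen-event-inject 3
    port    3 ipi/0 vcpu:0
    Delivered port 3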

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hmp-commands.hx  | 30 +++
 hw/i386/kvm/xen_evtchn.c | 83 
 hw/i386/kvm/xen_evtchn.h |  3 ++
 monitor/misc.c   |  4 ++
 4 files changed, 120 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 673e39a697..a36516c287 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1815,3 +1815,33 @@ SRST
   Dump the FDT in dtb format to *filename*.
 ERST
 #endif
+
+#if defined(CONFIG_XEN_EMU)
+
+{
+.name   = "xen-event-inject",
+.args_type  = "port:i",
+.params = "port",
+.help   = "inject event channel",
+.cmd= hmp_xen_event_inject,
+},
+
+SRST
+``xen-event-inject`` *port*
+  Notify guest via event channel on port *port*.
+ERST
+
+
+{
+.name   = "xen-event-list",
+.args_type  = "",
+.params = "",
+.help   = "list event channel state",
+.cmd= hmp_xen_event_list,
+},
+
+SRST
+``xen-event-list``
+  List event channels in the guest
+ERST
+#endif
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 225d984371..9292602c09 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -19,6 +19,8 @@
 #include "exec/target_page.h"
 #include "exec/address-spaces.h"
 #include "migration/vmstate.h"
+#include "monitor/monitor.h"
+#include "qapi/qmp/qdict.h"
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
@@ -955,3 +957,84 @@ int xen_evtchn_send_op(struct evtchn_send *send)
 return ret;
 }
 
+static const char *type_names[] = {
+"closed",
+"unbound",
+"interdomain",
+"pirq",
+"virq",
+"ipi"
+};
+
+void hmp_xen_event_list(Monitor *mon, const QDict *qdict)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+void *shinfo, *pending, *mask;
+int i;
+
+if (!s) {
+monitor_printf(mon, "Xen event channel emulation not enabled\n");
+return;
+}
+
+shinfo = xen_overlay_page_ptr(XENMAPSPACE_shared_info, 0);
+if (!shinfo) {
+monitor_printf(mon, "Xen shared info page not allocated\n");
+return;
+}
+if (xen_is_long_mode()) {
+pending = shinfo + offsetof(struct shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct shared_info, evtchn_mask);
+} else {
+pending = shinfo + offsetof(struct compat_shared_info, evtchn_pending);
+mask = shinfo + offsetof(struct compat_shared_info, evtchn_mask);
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+XenEvtchnPort *p = &s->port_table[i];
+
+if (p->type == EVTCHNSTAT_closed) {
+continue;
+}
+
+monitor_printf(mon, "port %4u %s/%d vcpu:%d pending:%d mask:%d\n", i,
+   type_names[p->type], p->type_val, p->vcpu,
+   test_bit(i, pending), test_bit(i, mask));
+}
+
+qemu_mutex_unlock(&s->port_lock);
+}
+
+void hmp_xen_event_inject(Monitor *mon, const QDict *qdict)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int port = qdict_get_int(qdict, "port");
+XenEvtchnPort *p;
+
+if (!s) {
+monitor_printf(mon, "Xen event channel emulation not enabled\n");
+return;
+}
+
+if (!valid_port(port)) {
+monitor_printf(mon, "Invalid port %d\n", port);
+return;
+}
+p = &s->port_table[port];
+
+qemu_mutex_lock(&s->port_lock);
+
+monitor_printf(mon, "port %4u %s/%d vcpu:%d\n", port,
+   type_names[p->type], p->type_val, p->vcpu);
+
+if (set_port_pending(s, port)) {
+monitor_printf(mon, "Failed to set port %d\n", port);
+} else {
+monitor_printf(mon, "Delivered port %d\n", port);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+}
+
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b93f534bee..2acbaeabaa 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -13,6 +13,9 @@
 void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
+void hmp_xen_event_list(Monitor *mon, const QDict *qdict);
+void hmp_xen_event_inject(Monitor *mon, const QDict *qdict);
+
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
diff --git a/monitor/misc.c b/monitor/misc.c
index 205487e2b9..2b11c0f86a 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -88,6 +88,10 @@
 /* Make devices configuration available for use in hmp-commands*.hx templates */
 #include CONFIG_DEVICES
 
+#ifdef CONFIG_XEN_EMU
+#include "hw/i386/kvm/xen_evtchn.h"
+#endif
+
 /* file descriptors passed via SCM_RIGHTS */
 typedef struct mon_fd_t mon_fd_t;
 struct mon_fd_t {
-- 
2.35.3




[RFC PATCH v3 26/38] hw/xen: Implement EVTCHNOP_close

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

It calls an internal close_port() helper which will also be used from
EVTCHNOP_reset and which will do the actual work to disconnect/unbind a
port once any of that is implemented in the first place.

That in turn calls a free_port() internal function which will also be
used in error paths after allocation.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 51 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 12 +
 3 files changed, 65 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 77acf58540..d4008e7ee1 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -221,3 +221,54 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 qemu_mutex_unlock(&s->port_lock);
 return 0;
 }
+
+static void free_port(XenEvtchnState *s, evtchn_port_t port)
+{
+s->port_table[port].type = EVTCHNSTAT_closed;
+s->port_table[port].type_val = 0;
+s->port_table[port].vcpu = 0;
+
+if (s->nr_ports == port + 1) {
+do {
+s->nr_ports--;
+} while (s->port_table[s->nr_ports - 1].type == EVTCHNSTAT_closed);
+}
+}
+
+static int close_port(XenEvtchnState *s, evtchn_port_t port)
+{
+XenEvtchnPort *p = &s->port_table[port];
+
+switch (p->type) {
+case EVTCHNSTAT_closed:
+return -ENOENT;
+
+default:
+break;
+}
+
+free_port(s, port);
+return 0;
+}
+
+int xen_evtchn_close_op(struct evtchn_close *close)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(close->port)) {
+return -EINVAL;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = close_port(s, close->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 6f50e5c52d..4c0315 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -14,4 +14,6 @@ void xen_evtchn_create(void);
 int xen_evtchn_set_callback_param(uint64_t param);
 
 struct evtchn_status;
+struct evtchn_close;
 int xen_evtchn_status_op(struct evtchn_status *status);
+int xen_evtchn_close_op(struct evtchn_close *close);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index d4a35bef64..f57d99f9d6 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -640,6 +640,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_close: {
+struct evtchn_close close;
+
+qemu_build_assert(sizeof(close) == 4);
+if (kvm_copy_from_gva(cs, arg, &close, sizeof(close))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_close_op(&close);
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 16/38] i386/xen: implement HYPERVISOR_hvm_op

2022-12-15 Thread David Woodhouse
From: Joao Martins 

This is for when the guest queries for support for HVMOP_pagetable_dying.
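
A Linux guest probes it along these lines (a sketch based on the public
Xen headers, not code from this patch):

    struct xen_hvm_pagetable_dying a = {
        .domid = DOMID_SELF,
        .gpa   = 0x00,
    };
    int rc = HYPERVISOR_hvm_op(HVMOP_pagetable_dying, &a);
    /* rc == -ENOSYS simply means the optimisation is unavailable */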

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c23026b872..da77297ef9 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -22,6 +22,7 @@
 #include "hw/i386/kvm/xen_overlay.h"
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/memory.h"
+#include "standard-headers/xen/hvm/hvm_op.h"
 
 #include "xen-compat.h"
 
@@ -303,6 +304,19 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+ int cmd, uint64_t arg)
+{
+switch (cmd) {
+case HVMOP_pagetable_dying:
+exit->u.hcall.result = -ENOSYS;
+return true;
+
+default:
+return false;
+}
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -313,6 +327,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_hvm_op:
+return kvm_xen_hcall_hvm_op(exit, cpu, exit->u.hcall.params[0],
+exit->u.hcall.params[1]);
 case __HYPERVISOR_memory_op:
 return kvm_xen_hcall_memory_op(exit, cpu, exit->u.hcall.params[0],
exit->u.hcall.params[1]);
-- 
2.35.3




[RFC PATCH v3 17/38] i386/xen: implement HYPERVISOR_vcpu_op

2022-12-15 Thread David Woodhouse
From: Joao Martins 

This is simply for when the guest tries to register a vcpu_info. Since
vcpu_info placement is optional in the minimum ABI, we can just fail
with -ENOSYS.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index da77297ef9..83d98cbfd9 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -23,6 +23,7 @@
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/hvm_op.h"
+#include "standard-headers/xen/vcpu.h"
 
 #include "xen-compat.h"
 
@@ -317,6 +318,25 @@ static bool kvm_xen_hcall_hvm_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 }
 }
 
+static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+  int cmd, int vcpu_id, uint64_t arg)
+{
+int err;
+
+switch (cmd) {
+case VCPUOP_register_vcpu_info:
+/* no vcpu info placement for now */
+err = -ENOSYS;
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -327,6 +347,11 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_vcpu_op:
+return kvm_xen_hcall_vcpu_op(exit, cpu,
+ exit->u.hcall.params[0],
+ exit->u.hcall.params[1],
+ exit->u.hcall.params[2]);
 case __HYPERVISOR_hvm_op:
 return kvm_xen_hcall_hvm_op(exit, cpu, exit->u.hcall.params[0],
 exit->u.hcall.params[1]);
-- 
2.35.3




[RFC PATCH v3 22/38] i386/xen: HVMOP_set_param / HVM_PARAM_CALLBACK_IRQ

2022-12-15 Thread David Woodhouse
From: Ankur Arora 

The HVM_PARAM_CALLBACK_IRQ parameter controls the system-wide event
channel upcall method.  The vector support is handled by KVM internally,
when the evtchn_upcall_pending field in the vcpu_info is set.

The GSI and PCI_INTX delivery methods are not supported yet; those need
to simulate a level-triggered event on the I/OAPIC.

Add a 'xen_evtchn' device to host the migration state, as we'll shortly
be adding a full event channel table there too.
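
For reference, a guest selects the vector method roughly as in the sketch
below (based on the public Xen headers; the vector value is just an
example):

    uint8_t vector = 0xf3;   /* whichever IDT vector the guest installed */
    struct xen_hvm_param hp = {
        .domid = DOMID_SELF,
        .index = HVM_PARAM_CALLBACK_IRQ,
        /* delivery type in the top byte, vector number in the low bits */
        .value = ((uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56) | vector,
    };
    HYPERVISOR_hvm_op(HVMOP_set_param, &hp);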

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Rework for upstream kernel changes, split from per-VCPU vector]
Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build   |   5 +-
 hw/i386/kvm/xen_evtchn.c  | 117 ++
 hw/i386/kvm/xen_evtchn.h  |  13 +
 hw/i386/pc.c  |   2 +
 target/i386/kvm/xen-emu.c |  39 +
 5 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 hw/i386/kvm/xen_evtchn.c
 create mode 100644 hw/i386/kvm/xen_evtchn.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 6165cbf019..cab64df339 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,6 +4,9 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
-i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files(
+  'xen_overlay.c',
+  'xen_evtchn.c',
+  ))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
new file mode 100644
index 00..1ca0c034e7
--- /dev/null
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -0,0 +1,117 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_evtchn.h"
+
+#include "sysemu/kvm.h"
+#include 
+
+#include "standard-headers/xen/memory.h"
+#include "standard-headers/xen/hvm/params.h"
+
+#define TYPE_XEN_EVTCHN "xenevtchn"
+OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
+
+struct XenEvtchnState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+uint64_t callback_param;
+};
+
+struct XenEvtchnState *xen_evtchn_singleton;
+
+static int xen_evtchn_post_load(void *opaque, int version_id)
+{
+XenEvtchnState *s = opaque;
+
+if (s->callback_param) {
+xen_evtchn_set_callback_param(s->callback_param);
+}
+
+return 0;
+}
+
+static bool xen_evtchn_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_evtchn_vmstate = {
+.name = "xen_evtchn",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_evtchn_is_needed,
+.post_load = xen_evtchn_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void xen_evtchn_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->vmsd = &xen_evtchn_vmstate;
+}
+
+static const TypeInfo xen_evtchn_info = {
+.name  = TYPE_XEN_EVTCHN,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XenEvtchnState),
+.class_init= xen_evtchn_class_init,
+};
+
+void xen_evtchn_create(void)
+{
+xen_evtchn_singleton = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN, -1, NULL));
+}
+
+static void xen_evtchn_register_types(void)
+{
+type_register_static(&xen_evtchn_info);
+}
+
+type_init(xen_evtchn_register_types)
+
+
+#define CALLBACK_VIA_TYPE_SHIFT   56
+
+int xen_evtchn_set_callback_param(uint64_t param)
+{
+int ret = -ENOSYS;
+
+if (param >> CALLBACK_VIA_TYPE_SHIFT == HVM_PARAM_CALLBACK_TYPE_VECTOR) {
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_UPCALL_VECTOR,
+.u.vector = (uint8_t)param,
+};
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret && xen_evtchn_singleton)
+xen_evtchn_singleton->callback_param = param;
+}
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
new file mode 100644
index 00..11c6ed22a0
--- /dev/null
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -0,0 +1,13 @@
+/*
+ * QEMU Xen emulation: Event channel support
+ *
+ 

[RFC PATCH v3 07/38] xen-platform: allow its creation with XEN_EMULATE mode

2022-12-15 Thread David Woodhouse
From: Joao Martins 

The only thing we need to handle on the KVM side is changing the pfn
from R/W to R/O.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/i386/xen/xen_platform.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index a6f0fb478a..15d5ae7c69 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -283,7 +283,10 @@ static void platform_fixed_ioport_writeb(void *opaque, 
uint32_t addr, uint32_t v
 case 0: /* Platform flags */ {
 hvmmem_type_t mem_type = (val & PFFLAG_ROM_LOCK) ?
 HVMMEM_ram_ro : HVMMEM_ram_rw;
-if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
+if (xen_mode == XEN_EMULATE) {
+/* XXX */
+s->flags = val & PFFLAG_ROM_LOCK;
+} else if (xen_set_mem_type(xen_domid, mem_type, 0xc0, 0x40)) {
 DPRINTF("unable to change ro/rw state of ROM memory area!\n");
 } else {
 s->flags = val & PFFLAG_ROM_LOCK;
@@ -508,12 +511,6 @@ static void xen_platform_realize(PCIDevice *dev, Error 
**errp)
 PCIXenPlatformState *d = XEN_PLATFORM(dev);
 uint8_t *pci_conf;
 
-/* Device will crash on reset if xen is not initialized */
-if (!xen_enabled()) {
-error_setg(errp, "xen-platform device requires the Xen accelerator");
-return;
-}
-
 pci_conf = dev->config;
 
 pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_IO | PCI_COMMAND_MEMORY);
-- 
2.35.3




[RFC PATCH v3 34/38] hw/xen: Implement EVTCHNOP_reset

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 24 
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 38 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 8cdc26afb7..225d984371 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -643,6 +643,30 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port)
 return 0;
 }
 
+int xen_evtchn_reset_op(struct evtchn_reset *reset)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int i;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (reset->dom != DOMID_SELF && reset->dom != xen_domid) {
+return -ESRCH;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+for (i = 0; i < s->nr_ports; i++) {
+close_port(s, i);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return 0;
+}
+
 int xen_evtchn_close_op(struct evtchn_close *close)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 99d5292c1e..b93f534bee 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -22,6 +22,7 @@ struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
 struct evtchn_bind_vcpu;
+struct evtchn_reset;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -31,3 +32,4 @@ int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain 
*interdomain);
 int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
+int xen_evtchn_reset_op(struct evtchn_reset *reset);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 58fa82682f..055afba627 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -854,6 +854,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = xen_evtchn_bind_vcpu_op(&vcpu);
 break;
 }
+case EVTCHNOP_reset: {
+struct evtchn_reset reset;
+
+qemu_build_assert(sizeof(reset) == 2);
+if (kvm_copy_from_gva(cs, arg, &reset, sizeof(reset))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_reset_op(&reset);
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 27/38] hw/xen: Implement EVTCHNOP_unmask

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

This finally comes with a mechanism for actually injecting events into
the guest vCPU, with all the atomic-test-and-set that's involved in
setting the bit in the shinfo, then the index in the vcpu_info, and
injecting either the lapic vector as MSI, or letting KVM inject the
bare vector.
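
Condensed, that chain for raising an event amounts to roughly the
following (helper names are Linux-style and purely illustrative; the real
code below uses qatomic_fetch_or/and and has separate long-mode and
compat copies):

    if (test_and_set_bit(port, shinfo->evtchn_pending))
        return;                                /* already pending */
    if (test_bit(port, shinfo->evtchn_mask))
        return;                                /* masked: stays latched */
    if (test_and_set_bit(port / BITS_PER_LONG, &vcpu_info->evtchn_pending_sel))
        return;                                /* word already flagged */
    if (xchg(&vcpu_info->evtchn_upcall_pending, 1))
        return;                                /* upcall already pending */
    notify_vcpu(vcpu);                         /* MSI or bare vector via KVM */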

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 198 ++
 hw/i386/kvm/xen_evtchn.h  |   2 +
 include/sysemu/kvm_xen.h  |  18 
 target/i386/kvm/xen-emu.c |  72 ++
 4 files changed, 290 insertions(+)
 create mode 100644 include/sysemu/kvm_xen.h

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d4008e7ee1..50adef0864 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -21,10 +21,13 @@
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
+
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
 
 #include "sysemu/kvm.h"
+#include "sysemu/kvm_xen.h"
+
 #include 
 
 #include "standard-headers/xen/memory.h"
@@ -39,6 +42,41 @@ typedef struct XenEvtchnPort {
 uint16_t type_val;  /* pirq# / virq# / remote port according to type */
 } XenEvtchnPort;
 
+/* 32-bit compatibility definitions, also used natively in 32-bit build */
+struct compat_arch_vcpu_info {
+unsigned int cr2;
+unsigned int pad[5];
+};
+
+struct compat_vcpu_info {
+uint8_t evtchn_upcall_pending;
+uint8_t evtchn_upcall_mask;
+uint16_t pad;
+uint32_t evtchn_pending_sel;
+struct compat_arch_vcpu_info arch;
+struct vcpu_time_info time;
+}; /* 64 bytes (x86) */
+
+struct compat_arch_shared_info {
+unsigned int max_pfn;
+unsigned int pfn_to_mfn_frame_list_list;
+unsigned int nmi_reason;
+unsigned int p2m_cr3;
+unsigned int p2m_vaddr;
+unsigned int p2m_generation;
+uint32_t wc_sec_hi;
+};
+
+struct compat_shared_info {
+struct compat_vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS];
+uint32_t evtchn_pending[32];
+uint32_t evtchn_mask[32];
+uint32_t wc_version;  /* Version counter: see vcpu_time_info_t. */
+uint32_t wc_sec;
+uint32_t wc_nsec;
+struct compat_arch_shared_info arch;
+};
+
 #define COMPAT_EVTCHN_2L_NR_CHANNELS1024
 
 /*
@@ -222,6 +260,144 @@ int xen_evtchn_status_op(struct evtchn_status *status)
 return 0;
 }
 
+/*
+ * Never thought I'd hear myself say this, but C++ templates would be
+ * kind of nice here.
+ *
+ * template static int do_unmask_port(T *shinfo, ...);
+ */
+static int do_unmask_port_lm(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask, struct shared_info *shinfo,
+ struct vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/* If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do. */
+if (!((qatomic_fetch_and(&shinfo->evtchn_mask[idx], ~mask) & mask))) {
+return 0;
+}
+} else {
+/* This is a pseudo-unmask for affinity changes. We don't
+ * change the mask bit, and if it's *masked* we have nothing
+ * else to do. */
+if (qatomic_fetch_or(&shinfo->evtchn_mask[idx], 0) & mask) {
+return 0;
+}
+}
+
+/* If the event was not pending, we're done. */
+if (!(qatomic_fetch_or(&shinfo->evtchn_pending[idx], 0) & mask)) {
+return 0;
+}
+
+/* Now on to the vcpu_info evtchn_pending_sel index... */
+mask = 1UL << idx;
+
+/* If a port in this word was already pending for this vCPU, all done. */
+if (qatomic_fetch_or(&vcpu_info->evtchn_pending_sel, mask) & mask) {
+return 0;
+}
+
+/* Set evtchn_upcall_pending for this vCPU */
+if (qatomic_fetch_or(&vcpu_info->evtchn_upcall_pending, 1)) {
+return 0;
+}
+
+kvm_xen_inject_vcpu_callback_vector(s->port_table[port].vcpu);
+
+return 0;
+}
+
+static int do_unmask_port_compat(XenEvtchnState *s, evtchn_port_t port,
+ bool do_unmask,
+ struct compat_shared_info *shinfo,
+ struct compat_vcpu_info *vcpu_info)
+{
+const int bits_per_word = BITS_PER_BYTE * sizeof(shinfo->evtchn_pending[0]);
+typeof(shinfo->evtchn_pending[0]) mask;
+int idx = port / bits_per_word;
+int offset = port % bits_per_word;
+
+mask = 1UL << offset;
+
+if (idx >= bits_per_word) {
+return -EINVAL;
+}
+
+if (do_unmask) {
+/* If this is a true unmask operation, clear the mask bit. If
+ * it was already unmasked, we have nothing further to do. */
+if (!((qatomic_fetch_and(&shinfo->ev

[RFC PATCH v3 20/38] i386/xen: handle VCPUOP_register_runstate_memory_area

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Allow the guest to set up the vcpu runstate area, which is used as the
steal clock.
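
For reference, the guest registers the area roughly as below (a sketch
based on the public Xen headers; not code from this patch):

    struct vcpu_register_runstate_memory_area area;

    area.addr.v = &per_cpu_runstate;   /* a struct vcpu_runstate_info */
    HYPERVISOR_vcpu_op(VCPUOP_register_runstate_memory_area, vcpu_id, &area);
    /* The hypervisor then keeps per_cpu_runstate.time[RUNSTATE_runnable]
     * etc. updated, which the guest feeds into steal time accounting. */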

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c |  9 +
 target/i386/kvm/xen-emu.c | 42 +++
 target/i386/machine.c |  4 +++-
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 96c2d0d5cb..bf44a87ddb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1791,6 +1791,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
+uint64_t xen_vcpu_runstate_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 766e0add13..8aee95d4c1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1805,6 +1805,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 env->xen_vcpu_info_gpa = UINT64_MAX;
 env->xen_vcpu_info_default_gpa = UINT64_MAX;
 env->xen_vcpu_time_info_gpa = UINT64_MAX;
+env->xen_vcpu_runstate_gpa = UINT64_MAX;
 
 if (cs->kvm_state->xen_version) {
 #ifdef CONFIG_XEN_EMU
@@ -4748,6 +4749,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 return ret;
 }
 }
+
+gpa = x86_cpu->env.xen_vcpu_runstate_gpa;
+if (gpa != UINT64_MAX) {
+ret = kvm_xen_set_vcpu_attr(cpu, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR, gpa);
+if (ret < 0) {
+return ret;
+}
+}
 }
 #endif
 
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index b45d5af7d7..1297b37ee8 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -208,6 +208,17 @@ static void do_set_vcpu_time_info_gpa(CPUState *cs, 
run_on_cpu_data data)
   env->xen_vcpu_time_info_gpa);
 }
 
+static void do_set_vcpu_runstate_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_runstate_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR,
+  env->xen_vcpu_runstate_gpa);
+}
+
 static int xen_set_shared_info(uint64_t gfn)
 {
 uint64_t gpa = gfn << TARGET_PAGE_BITS;
@@ -448,6 +459,34 @@ static int vcpuop_register_vcpu_time_info(CPUState *cs, 
CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_runstate_info(CPUState *cs, CPUState *target,
+ uint64_t arg)
+{
+struct vcpu_register_runstate_memory_area rma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(rma) == 8);
+/* The runstate area actually does change size, but Linux copes. */
+
+if (!target)
+return -ENOENT;
+
+if (kvm_copy_from_gva(cs, arg, &rma, sizeof(rma))) {
+return -EFAULT;
+}
+
+/* As with vcpu_time_info, Xen actually uses the GVA but KVM doesn't. */
+if (!kvm_gva_to_gpa(cs, rma.addr.p, &gpa, &len, false)) {
+return -EFAULT;
+}
+
+async_run_on_cpu(target, do_set_vcpu_runstate_gpa,
+ RUN_ON_CPU_HOST_ULONG(gpa));
+return 0;
+}
+
 static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit *exit, X86CPU *cpu,
   int cmd, int vcpu_id, uint64_t arg)
 {
@@ -456,6 +495,9 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 int err;
 
 switch (cmd) {
+case VCPUOP_register_runstate_memory_area:
+err = vcpuop_register_runstate_info(cs, dest, arg);
+break;
 case VCPUOP_register_vcpu_time_memory_area:
 err = vcpuop_register_vcpu_time_info(cs, dest, arg);
 break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 9acef102a3..6a510e5cbd 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -1264,7 +1264,8 @@ static bool xen_vcpu_needed(void *opaque)
 
 return (env->xen_vcpu_info_gpa != UINT64_MAX ||
 env->xen_vcpu_info_default_gpa != UINT64_MAX ||
-env->xen_vcpu_time_info_gpa != UINT64_MAX);
+env->xen_vcpu_time_info_gpa != UINT64_MAX ||
+env->xen_vcpu_runstate_gpa != UINT64_MAX);
 }
 
 static const VMStateDescription vmstate_xen_vcpu = {
@@ -1276,6 +1277,7 @@ static const VMStateDescription vmstate_xen_vcpu = {
 VMSTATE_UINT64(env.xen_vcpu_info_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_info_default_gpa, X86CPU),
 VMSTATE_UINT64(env.xen_vcpu_time_info_gpa, X86CPU),
+VMSTATE_UINT64(env.xen_vcpu_runstate_gpa, X86CPU),
 VMSTATE_END_OF_LIST()
 }
 };
-- 
2.35.3




[RFC PATCH v3 14/38] i386/xen: implement HYPERVISOR_memory_op

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Specifically XENMEM_add_to_physmap with space XENMAPSPACE_shared_info to
allow the guest to set its shared_info page.
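
For reference, the guest side looks roughly like this sketch (based on
the public Xen headers; variable names are illustrative):

    struct xen_add_to_physmap xatp = {
        .domid = DOMID_SELF,
        .space = XENMAPSPACE_shared_info,
        .idx   = 0,
        .gpfn  = shared_info_gfn,   /* guest frame chosen by the guest */
    };
    if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp)) {
        /* without a shared_info page most PV interfaces are unusable */
    }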

Signed-off-by: Joao Martins 
[dwmw2: Use the xen_overlay device, add compat support]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/trace-events |   1 +
 target/i386/kvm/xen-compat.h |  27 +
 target/i386/kvm/xen-emu.c| 104 ++-
 3 files changed, 131 insertions(+), 1 deletion(-)
 create mode 100644 target/i386/kvm/xen-compat.h

diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index cd6f842b1f..0a47c26e80 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -8,3 +8,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 
 # xen-emu.c
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t 
a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " 
a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
+kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
new file mode 100644
index 00..0b7088662a
--- /dev/null
+++ b/target/i386/kvm/xen-compat.h
@@ -0,0 +1,27 @@
+/*
+ * Xen HVM emulation support in KVM
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_I386_KVM_XEN_COMPAT_H
+#define QEMU_I386_KVM_XEN_COMPAT_H
+
+#include "standard-headers/xen/memory.h"
+
+typedef uint32_t compat_pfn_t;
+typedef uint32_t compat_ulong_t;
+
+struct compat_xen_add_to_physmap {
+domid_t domid;
+uint16_t size;
+unsigned int space;
+compat_ulong_t idx;
+compat_pfn_t gpfn;
+};
+
+#endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 11e34ed125..1fecab6e10 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -9,16 +9,27 @@
  *
  */
 
+#define __XEN_INTERFACE_VERSION__ 0x00040e00
+
 #include "qemu/osdep.h"
 #include "qemu/log.h"
+#include "hw/xen/xen.h"
 #include "sysemu/kvm_int.h"
 #include "kvm/kvm_i386.h"
 #include "exec/address-spaces.h"
 #include "xen-emu.h"
-#include "xen.h"
 #include "trace.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "standard-headers/xen/version.h"
+#include "standard-headers/xen/memory.h"
+
+#include "xen-compat.h"
+
+#ifdef TARGET_X86_64
+#define hypercall_compat32(longmode) (!(longmode))
+#else
+#define hypercall_compat32(longmode) (false)
+#endif
 
 static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
   bool is_write)
@@ -134,6 +145,94 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static int xen_set_shared_info(uint64_t gfn)
+{
+uint64_t gpa = gfn << TARGET_PAGE_BITS;
+int err;
+
+/* The xen_overlay device tells KVM about it too, since it had to
+ * do that on migration load anyway (unless we're going to jump
+ * through lots of hoops to maintain the fiction that this isn't
+ * KVM-specific */
+err = xen_overlay_map_page(XENMAPSPACE_shared_info, 0, gpa);
+if (err)
+return err;
+
+trace_kvm_xen_set_shared_info(gfn);
+
+return err;
+}
+
+static int add_to_physmap_one(uint32_t space, uint64_t idx, uint64_t gfn)
+{
+switch (space) {
+case XENMAPSPACE_shared_info:
+if (idx > 0) {
+return -EINVAL;
+}
+return xen_set_shared_info(gfn);
+
+case XENMAPSPACE_grant_table:
+case XENMAPSPACE_gmfn:
+case XENMAPSPACE_gmfn_range:
+return -ENOTSUP;
+
+case XENMAPSPACE_gmfn_foreign:
+case XENMAPSPACE_dev_mmio:
+return -EPERM;
+
+default:
+return -EINVAL;
+}
+}
+
+static int do_add_to_physmap(struct kvm_xen_exit *exit, X86CPU *cpu, uint64_t arg)
+{
+struct xen_add_to_physmap xatp;
+CPUState *cs = CPU(cpu);
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_xen_add_to_physmap xatp32;
+
+qemu_build_assert(sizeof(struct compat_xen_add_to_physmap) == 16);
+if (kvm_copy_from_gva(cs, arg, &xatp32, sizeof(xatp32))) {
+return -EFAULT;
+}
+xatp.domid = xatp32.domid;
+xatp.size = xatp32.size;
+xatp.space = xatp32.space;
+xatp.idx = xatp32.idx;
+xatp.gpfn = xatp32.gpfn;
+} else {
+if (kvm_copy_from_gva(cs, arg, &xatp, sizeof(xatp))) {
+return -EFAULT;
+}
+}
+
+if (xatp.domid != DOMID_SELF && xatp.domid != xen_domid) {
+return -ESRCH;
+}
+
+return add_to_physmap_one(xatp.space, xatp.idx, xatp.gpfn);
+}
+static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+   int cmd, uint64_t arg)
+{
+i

[RFC PATCH v3 31/38] hw/xen: Implement EVTCHNOP_alloc_unbound

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 32 
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +++
 3 files changed, 49 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d90a92a25a..4272b63853 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -732,6 +732,38 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 return ret;
 }
 
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+uint16_t type_val;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (alloc->dom != DOMID_SELF && alloc->dom != xen_domid) {
+return -ESRCH;
+}
+
+if (alloc->remote_dom == DOMID_QEMU) {
+type_val = PORT_INFO_TYPEVAL_REMOTE_QEMU;
+} else if (alloc->remote_dom == DOMID_SELF ||
+   alloc->remote_dom == xen_domid) {
+type_val = 0;
+} else {
+return -EPERM;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, 0, EVTCHNSTAT_unbound, type_val, &alloc->port);
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_send_op(struct evtchn_send *send)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index c27b9e8096..5dc68a188d 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -19,9 +19,11 @@ struct evtchn_unmask;
 struct evtchn_bind_virq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
+struct evtchn_alloc_unbound;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
+int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 300b0d75bc..6f393c4149 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -812,6 +812,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = xen_evtchn_send_op(&send);
 break;
 }
+case EVTCHNOP_alloc_unbound: {
+struct evtchn_alloc_unbound alloc;
+
+qemu_build_assert(sizeof(alloc) == 8);
+if (kvm_copy_from_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_alloc_unbound_op(&alloc);
+if (!err && kvm_copy_to_gva(cs, arg, &alloc, sizeof(alloc))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 23/38] i386/xen: implement HYPERVISOR_event_channel_op

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Additionally, set XEN_INTERFACE_VERSION to the most recent version in
order to exercise the "new" event_channel_op.

Signed-off-by: Joao Martins 
[dwmw2: Ditch event_channel_op_compat which was never available to HVM guests]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 09978c83ca..d6f3102d8e 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -27,6 +27,7 @@
 #include "standard-headers/xen/hvm/hvm_op.h"
 #include "standard-headers/xen/hvm/params.h"
 #include "standard-headers/xen/vcpu.h"
+#include "standard-headers/xen/event_channel.h"
 
 #include "xen-compat.h"
 
@@ -611,6 +612,23 @@ static bool kvm_xen_hcall_vcpu_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit,
+int cmd, uint64_t arg)
+{
+int err = -ENOSYS;
+
+switch (cmd) {
+case EVTCHNOP_init_control:
+err = -ENOSYS;
+break;
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -621,6 +639,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_event_channel_op:
+return kvm_xen_hcall_evtchn_op(exit, exit->u.hcall.params[0],
+   exit->u.hcall.params[1]);
 case __HYPERVISOR_vcpu_op:
 return kvm_xen_hcall_vcpu_op(exit, cpu,
  exit->u.hcall.params[0],
-- 
2.35.3




[RFC PATCH v3 33/38] hw/xen: Implement EVTCHNOP_bind_vcpu

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 40 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 12 
 3 files changed, 54 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index b286bbd20e..8cdc26afb7 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -687,6 +687,46 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 return ret;
 }
 
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+int ret = -EINVAL;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_port(vcpu->port)) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(vcpu->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[vcpu->port];
+
+if (p->type == EVTCHNSTAT_interdomain ||
+p->type == EVTCHNSTAT_unbound ||
+p->type == EVTCHNSTAT_pirq ||
+(p->type == EVTCHNSTAT_virq && virq_is_global(p->type_val))) {
+/*
+ * unmask_port() with do_unmask==false will just raise the event
+ * on the new vCPU if the port was already pending.
+ */
+p->vcpu = vcpu->vcpu;
+unmask_port(s, vcpu->port, false);
+ret = 0;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
+
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 4783a6f127..99d5292c1e 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -21,6 +21,7 @@ struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
 struct evtchn_bind_interdomain;
+struct evtchn_bind_vcpu;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
@@ -29,3 +30,4 @@ int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
 int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain 
*interdomain);
+int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c4c595cb1a..58fa82682f 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -842,6 +842,18 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_vcpu: {
+struct evtchn_bind_vcpu vcpu;
+
+qemu_build_assert(sizeof(vcpu) == 8);
+if (kvm_copy_from_gva(cs, arg, &vcpu, sizeof(vcpu))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_vcpu_op(&vcpu);
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 29/38] hw/xen: Implement EVTCHNOP_bind_ipi

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 67 +++
 hw/i386/kvm/xen_evtchn.h  |  2 ++
 target/i386/kvm/xen-emu.c | 15 +
 3 files changed, 84 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 19b8eb7a6f..2e35812b32 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -13,6 +13,7 @@
 #include "qemu/host-utils.h"
 #include "qemu/module.h"
 #include "qemu/main-loop.h"
+#include "qemu/log.h"
 #include "qapi/error.h"
 #include "qom/object.h"
 #include "exec/target_page.h"
@@ -195,6 +196,43 @@ int xen_evtchn_set_callback_param(uint64_t param)
 return ret;
 }
 
+static void deassign_kernel_port(evtchn_port_t port)
+{
+struct kvm_xen_hvm_attr ha;
+int ret;
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.flags = KVM_XEN_EVTCHN_DEASSIGN;
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "Failed to unbind kernel port %d: %s\n",
+  port, strerror(ret));
+}
+}
+
+static int assign_kernel_port(uint16_t type, evtchn_port_t port,
+  uint32_t vcpu_id)
+{
+CPUState *cpu = qemu_get_cpu(vcpu_id);
+struct kvm_xen_hvm_attr ha;
+
+if (!cpu) {
+return -ENOENT;
+}
+
+ha.type = KVM_XEN_ATTR_TYPE_EVTCHN;
+ha.u.evtchn.send_port = port;
+ha.u.evtchn.type = type;
+ha.u.evtchn.flags = 0;
+ha.u.evtchn.deliver.port.port = port;
+ha.u.evtchn.deliver.port.vcpu = kvm_arch_vcpu_id(cpu);
+ha.u.evtchn.deliver.port.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+return kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &ha);
+}
+
 static bool valid_port(evtchn_port_t port)
 {
 if (!port) {
@@ -465,6 +503,10 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port)
   p->type_val, 0);
 break;
 
+case EVTCHNSTAT_ipi:
+deassign_kernel_port(port);
+break;
+
 default:
 break;
 }
@@ -553,3 +595,28 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
 
 return ret;
 }
+
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (!valid_vcpu(ipi->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, ipi->vcpu, EVTCHNSTAT_ipi, 0, &ipi->port);
+if (!ret) {
+assign_kernel_port(EVTCHNSTAT_ipi, ipi->port, ipi->vcpu);
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index ffddd87bdc..52ade5a64e 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -17,7 +17,9 @@ struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
 struct evtchn_bind_virq;
+struct evtchn_bind_ipi;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
+int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index d6b26c59e1..b5f8f30d62 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -785,6 +785,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 }
 break;
 }
+case EVTCHNOP_bind_ipi: {
+struct evtchn_bind_ipi ipi;
+
+qemu_build_assert(sizeof(ipi) == 8);
+if (kvm_copy_from_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+break;
+}
+
+err = xen_evtchn_bind_ipi_op(&ipi);
+if (!err && kvm_copy_to_gva(cs, arg, &ipi, sizeof(ipi))) {
+err = -EFAULT;
+}
+break;
+}
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 02/38] xen: add CONFIG_XENFV_MACHINE and CONFIG_XEN_EMU options for Xen emulation

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

The XEN_EMU option will cover core Xen support in target/, which exists
only for x86 with KVM today but could theoretically also be implemented
on Arm/Aarch64 and with TCG or other accelerators. It will also cover
the support for architecture-independent grant table and event channel
support which will be added in hw/i386/kvm/ (on the basis that the
non-KVM support is very theoretical and making it not use KVM directly
seems like gratuitous overengineering at this point).

The XENFV_MACHINE option is for the xenfv platform support, which will
now be used both by XEN_EMU and by real Xen.

The XEN option remains dependent on the Xen runtime libraries, and covers
support for real Xen. Some code which currently resides under CONFIG_XEN
will be moving to CONFIG_XENFV_MACHINE over time.

Signed-off-by: David Woodhouse 
---
 hw/Kconfig  | 1 +
 hw/i386/Kconfig | 5 +
 hw/xen/Kconfig  | 3 +++
 meson.build | 1 +
 4 files changed, 10 insertions(+)
 create mode 100644 hw/xen/Kconfig

diff --git a/hw/Kconfig b/hw/Kconfig
index 38233bbb0f..ba62ff6417 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -41,6 +41,7 @@ source tpm/Kconfig
 source usb/Kconfig
 source virtio/Kconfig
 source vfio/Kconfig
+source xen/Kconfig
 source watchdog/Kconfig
 
 # arch Kconfig
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index d22ac4a4b9..c9fd577997 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -137,3 +137,8 @@ config VMPORT
 config VMMOUSE
 bool
 depends on VMPORT
+
+config XEN_EMU
+bool
+default y
+depends on KVM && (I386 || X86_64)
diff --git a/hw/xen/Kconfig b/hw/xen/Kconfig
new file mode 100644
index 00..755c8b1faf
--- /dev/null
+++ b/hw/xen/Kconfig
@@ -0,0 +1,3 @@
+config XENFV_MACHINE
+bool
+default y if (XEN || XEN_EMU)
diff --git a/meson.build b/meson.build
index 5c6b5a1c75..9348cf572c 100644
--- a/meson.build
+++ b/meson.build
@@ -3828,6 +3828,7 @@ if have_system
   if xen.found()
 summary_info += {'xen ctrl version':  xen.version()}
   endif
+  summary_info += {'Xen emulation': config_all.has_key('CONFIG_XEN_EMU')}
 endif
 summary_info += {'TCG support':   config_all.has_key('CONFIG_TCG')}
 if config_all.has_key('CONFIG_TCG')
-- 
2.35.3




[RFC PATCH v3 24/38] i386/xen: implement HYPERVISOR_sched_op

2022-12-15 Thread David Woodhouse
From: Joao Martins 

It allows the guest to shut itself down via hypercall with any of the 3 reasons:
  1) self-reboot
  2) shutdown
  3) crash

Implementing the SCHEDOP_shutdown sub-op lets us handle crashes gracefully
rather than leading to triple faults if it remains unimplemented.
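
For reference, the guest invokes it roughly like this (a sketch based on
the public Xen headers):

    struct sched_shutdown shutdown = { .reason = SHUTDOWN_poweroff };

    HYPERVISOR_sched_op(SCHEDOP_shutdown, &shutdown);
    /* SHUTDOWN_reboot and SHUTDOWN_crash map to a guest reset request and
     * qemu_system_guest_panicked() respectively in the handler below. */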

Signed-off-by: Joao Martins 
[dwmw2: Ditch sched_op_compat which was never available for HVM guests]
Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 58 +++
 1 file changed, 58 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index d6f3102d8e..1ff6d32edd 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -21,12 +21,14 @@
 #include "trace.h"
 #include "hw/i386/kvm/xen_overlay.h"
 #include "hw/i386/kvm/xen_evtchn.h"
+#include "sysemu/runstate.h"
 
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/hvm_op.h"
 #include "standard-headers/xen/hvm/params.h"
 #include "standard-headers/xen/vcpu.h"
+#include "standard-headers/xen/sched.h"
 #include "standard-headers/xen/event_channel.h"
 
 #include "xen-compat.h"
@@ -629,6 +631,59 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit 
*exit,
 return true;
 }
 
+static int schedop_shutdown(CPUState *cs, uint64_t arg)
+{
+struct sched_shutdown shutdown;
+int ret = 0;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(shutdown) == 4);
+
+if (kvm_copy_from_gva(cs, arg, &shutdown, sizeof(shutdown))) {
+return -EFAULT;
+}
+
+switch(shutdown.reason) {
+case SHUTDOWN_crash:
+cpu_dump_state(cs, stderr, CPU_DUMP_CODE);
+qemu_system_guest_panicked(NULL);
+break;
+
+case SHUTDOWN_reboot:
+qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
+break;
+
+case SHUTDOWN_poweroff:
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+break;
+
+default:
+ret = -EINVAL;
+break;
+}
+
+return ret;
+}
+
+static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+   int cmd, uint64_t arg)
+{
+CPUState *cs = CPU(cpu);
+int err = -ENOSYS;
+
+switch (cmd) {
+case SCHEDOP_shutdown:
+err = schedop_shutdown(cs, arg);
+break;
+
+default:
+return false;
+}
+
+exit->u.hcall.result = err;
+return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
 uint16_t code = exit->u.hcall.input;
@@ -639,6 +694,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct 
kvm_xen_exit *exit)
 }
 
 switch (code) {
+case __HYPERVISOR_sched_op:
+return kvm_xen_hcall_sched_op(exit, cpu, exit->u.hcall.params[0],
+  exit->u.hcall.params[1]);
 case __HYPERVISOR_event_channel_op:
 return kvm_xen_hcall_evtchn_op(exit, exit->u.hcall.params[0],
exit->u.hcall.params[1]);
-- 
2.35.3




[RFC PATCH v3 13/38] i386/xen: manage and save/restore Xen guest long_mode setting

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Xen will "latch" the guest's 32-bit or 64-bit ("long mode") setting when
the guest writes the MSR to fill in the hypercall page, or when the guest
sets the event channel callback in HVM_PARAM_CALLBACK_IRQ.

KVM handles the former and sets the kernel's long_mode flag accordingly.
The latter will be handled in userspace. Keep them in sync by noticing
when a hypercall is made in a mode that doesn't match qemu's idea of
the guest mode, and resyncing from the kernel. Do that same sync right
before serialization too, in case the guest has set the hypercall page
but hasn't yet made a hypercall.
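
In the hypercall path that resync amounts to a check along these lines
(a sketch; the exact placement is an assumption):

    if (exit->u.hcall.longmode != xen_is_long_mode()) {
        xen_sync_long_mode();
    }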

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_overlay.c | 63 +++
 hw/i386/kvm/xen_overlay.h |  4 +++
 target/i386/kvm/xen-emu.c | 10 ++-
 3 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
index 2ae54e1a88..a0ddbda91c 100644
--- a/hw/i386/kvm/xen_overlay.c
+++ b/hw/i386/kvm/xen_overlay.c
@@ -47,6 +47,7 @@ struct XenOverlayState {
 MemoryRegion shinfo_mem;
 void *shinfo_ptr;
 uint64_t shinfo_gpa;
+bool long_mode;
 };
 
 struct XenOverlayState *xen_overlay_singleton;
@@ -64,9 +65,19 @@ static void xen_overlay_realize(DeviceState *dev, Error 
**errp)
 memory_region_set_enabled(&s->shinfo_mem, true);
 s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
 s->shinfo_gpa = INVALID_GPA;
+s->long_mode = false;
 memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
 }
 
+static int xen_overlay_pre_save(void *opaque)
+{
+/* Fetch the kernel's idea of long_mode to avoid the race condition where
+ * the guest has set the hypercall page up in 64-bit mode but not yet
+ * made a hypercall by the time migration happens, so qemu hasn't yet
+ * noticed. */
+return xen_sync_long_mode();
+}
+
 static int xen_overlay_post_load(void *opaque, int version_id)
 {
 XenOverlayState *s = opaque;
@@ -74,6 +85,9 @@ static int xen_overlay_post_load(void *opaque, int version_id)
 if (s->shinfo_gpa != INVALID_GPA) {
 xen_overlay_map_page_locked(XENMAPSPACE_shared_info, 0, s->shinfo_gpa);
 }
+if (s->long_mode) {
+xen_set_long_mode(true);
+}
 
 return 0;
 }
@@ -88,9 +102,11 @@ static const VMStateDescription xen_overlay_vmstate = {
 .version_id = 1,
 .minimum_version_id = 1,
 .needed = xen_overlay_is_needed,
+.pre_save = xen_overlay_pre_save,
 .post_load = xen_overlay_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(shinfo_gpa, XenOverlayState),
+VMSTATE_BOOL(long_mode, XenOverlayState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -196,3 +212,50 @@ void *xen_overlay_page_ptr(uint32_t space, uint64_t idx)
 
 return xen_overlay_singleton->shinfo_ptr;
 }
+
+int xen_sync_long_mode(void)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_GET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+int xen_set_long_mode(bool long_mode)
+{
+int ret;
+struct kvm_xen_hvm_attr xa = {
+.type = KVM_XEN_ATTR_TYPE_LONG_MODE,
+.u.long_mode = long_mode,
+};
+
+if (!xen_overlay_singleton) {
+return -ENOENT;
+}
+
+ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+if (!ret) {
+xen_overlay_singleton->long_mode = xa.u.long_mode;
+}
+
+return ret;
+}
+
+bool xen_is_long_mode(void)
+{
+if (xen_overlay_singleton) {
+return xen_overlay_singleton->long_mode;
+}
+return false;
+}
diff --git a/hw/i386/kvm/xen_overlay.h b/hw/i386/kvm/xen_overlay.h
index afc63991ea..ed8f0ef0e7 100644
--- a/hw/i386/kvm/xen_overlay.h
+++ b/hw/i386/kvm/xen_overlay.h
@@ -12,3 +12,7 @@
 void xen_overlay_create(void);
 int xen_overlay_map_page(uint32_t space, uint64_t idx, uint64_t gpa);
 void *xen_overlay_page_ptr(uint32_t space, uint64_t idx);
+
+int xen_sync_long_mode(void);
+int xen_set_long_mode(bool long_mode);
+bool xen_is_long_mode(void);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 9026fd3eb6..11e34ed125 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -17,7 +17,7 @@
 #include "xen-emu.h"
 #include "xen.h"
 #include "trace.h"
-
+#include "hw/i386/kvm/xen_overlay.h"
 #include "standard-headers/xen/version.h"
 
 static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
@@ -157,6 +157,14 @@ int kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit 
*exit)
 if (exit->type != KVM_EXIT_XEN_HCALL)
 return -1;
 
+/* The kernel latches the guest 32/64 mode when the MSR is used to fill
+ * the hypercall page. So if we see a hypercall in a mode that doesn't
+ * match our own idea of the guest mode, fetch the kernel's 

[RFC PATCH v3 06/38] xen-platform: exclude vfio-pci from the PCI platform unplug

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Such that PCI passthrough devices work for Xen emulated guests.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/i386/xen/xen_platform.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
index a64265cca0..a6f0fb478a 100644
--- a/hw/i386/xen/xen_platform.c
+++ b/hw/i386/xen/xen_platform.c
@@ -109,12 +109,25 @@ static void log_writeb(PCIXenPlatformState *s, char val)
 #define _UNPLUG_NVME_DISKS 3
 #define UNPLUG_NVME_DISKS (1u << _UNPLUG_NVME_DISKS)
 
+static bool pci_device_is_passthrough(PCIDevice *d)
+{
+if (!strcmp(d->name, "xen-pci-passthrough")) {
+return true;
+}
+
+if (xen_mode == XEN_EMULATE && !strcmp(d->name, "vfio-pci")) {
+return true;
+}
+
+return false;
+}
+
 static void unplug_nic(PCIBus *b, PCIDevice *d, void *o)
 {
 /* We have to ignore passthrough devices */
 if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
 PCI_CLASS_NETWORK_ETHERNET
-&& strcmp(d->name, "xen-pci-passthrough") != 0) {
+&& !pci_device_is_passthrough(d)) {
 object_unparent(OBJECT(d));
 }
 }
@@ -187,9 +200,8 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
*opaque)
 !(flags & UNPLUG_IDE_SCSI_DISKS);
 
 /* We have to ignore passthrough devices */
-if (!strcmp(d->name, "xen-pci-passthrough")) {
+if (pci_device_is_passthrough(d))
 return;
-}
 
 switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
 case PCI_CLASS_STORAGE_IDE:
-- 
2.35.3




[RFC PATCH v3 25/38] hw/xen: Implement EVTCHNOP_status

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

This adds the basic structure for maintaining the port table and reporting
the status of ports therein.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 108 +-
 hw/i386/kvm/xen_evtchn.h  |   4 ++
 target/i386/kvm/xen-emu.c |  21 +++-
 3 files changed, 130 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 1ca0c034e7..77acf58540 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -22,6 +22,7 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "xen_evtchn.h"
+#include "xen_overlay.h"
 
 #include "sysemu/kvm.h"
 #include 
@@ -32,12 +33,34 @@
 #define TYPE_XEN_EVTCHN "xenevtchn"
 OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
 
+typedef struct XenEvtchnPort {
+uint32_t vcpu;  /* Xen/ACPI vcpu_id */
+uint16_t type;  /* EVTCHNSTAT_ */
+uint16_t type_val;  /* pirq# / virq# / remote port according to type */
+} XenEvtchnPort;
+
+#define COMPAT_EVTCHN_2L_NR_CHANNELS1024
+
+/*
+ * For unbound/interdomain ports there are only two possible remote
+ * domains; self and QEMU. Use a single high bit in type_val for that,
+ * and the low bits for the remote port number (or 0 for unbound).
+ */
+#define PORT_INFO_TYPEVAL_REMOTE_QEMU   0x8000
+#define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK  0x7FFF
+
+#define DOMID_QEMU  0
+
 struct XenEvtchnState {
 /*< private >*/
 SysBusDevice busdev;
 /*< public >*/
 
 uint64_t callback_param;
+
+QemuMutex port_lock;
+uint32_t nr_ports;
+XenEvtchnPort port_table[EVTCHN_2L_NR_CHANNELS];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -58,6 +81,18 @@ static bool xen_evtchn_is_needed(void *opaque)
 return xen_mode == XEN_EMULATE;
 }
 
+static const VMStateDescription xen_evtchn_port_vmstate = {
+.name = "xen_evtchn_port",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(vcpu, XenEvtchnPort),
+VMSTATE_UINT16(type, XenEvtchnPort),
+VMSTATE_UINT16(type_val, XenEvtchnPort),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription xen_evtchn_vmstate = {
 .name = "xen_evtchn",
 .version_id = 1,
@@ -66,6 +101,9 @@ static const VMStateDescription xen_evtchn_vmstate = {
 .post_load = xen_evtchn_post_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(callback_param, XenEvtchnState),
+VMSTATE_UINT32(nr_ports, XenEvtchnState),
+VMSTATE_STRUCT_VARRAY_UINT32(port_table, XenEvtchnState, nr_ports, 1,
+ xen_evtchn_port_vmstate, XenEvtchnPort),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -86,7 +124,10 @@ static const TypeInfo xen_evtchn_info = {
 
 void xen_evtchn_create(void)
 {
-xen_evtchn_singleton = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN, 
-1, NULL));
+XenEvtchnState *s = XEN_EVTCHN(sysbus_create_simple(TYPE_XEN_EVTCHN, -1, 
NULL));
+xen_evtchn_singleton = s;
+
+qemu_mutex_init(&s->port_lock);
 }
 
 static void xen_evtchn_register_types(void)
@@ -115,3 +156,68 @@ int xen_evtchn_set_callback_param(uint64_t param)
 }
 return ret;
 }
+
+static bool valid_port(evtchn_port_t port)
+{
+if (!port) {
+return false;
+}
+
+if (xen_is_long_mode()) {
+return port < EVTCHN_2L_NR_CHANNELS;
+} else {
+return port < COMPAT_EVTCHN_2L_NR_CHANNELS;
+}
+}
+
+int xen_evtchn_status_op(struct evtchn_status *status)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+XenEvtchnPort *p;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (status->dom != DOMID_SELF && status->dom != xen_domid)
+return -EPERM;
+
+if (!valid_port(status->port))
+return -EINVAL;
+
+qemu_mutex_lock(&s->port_lock);
+
+p = &s->port_table[status->port];
+
+status->status = p->type;
+status->vcpu = p->vcpu;
+
+switch (p->type) {
+case EVTCHNSTAT_unbound:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU)
+status->u.unbound.dom = DOMID_QEMU;
+else
+status->u.unbound.dom = xen_domid;
+break;
+
+case EVTCHNSTAT_interdomain:
+if (p->type_val & PORT_INFO_TYPEVAL_REMOTE_QEMU)
+status->u.interdomain.dom = DOMID_QEMU;
+else
+status->u.interdomain.dom = xen_domid;
+
+status->u.interdomain.port = p->type_val & 
PORT_INFO_TYPEVAL_REMOTE_PORT_MASK;
+break;
+
+case EVTCHNSTAT_pirq:
+status->u.pirq = p->type_val;
+break;
+
+case EVTCHNSTAT_virq:
+status->u.virq = p->type_val;
+break;
+}
+
+qemu_mutex_unlock(&s->port_lock);
+return 0;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 11c6ed22a0..6f50e5c52d 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -9,5 +9,9 @@
  * See the COPYING file in the top-level dir

[RFC PATCH v3 08/38] hw/xen_backend: refactor xen_be_init()

2022-12-15 Thread David Woodhouse
From: Joao Martins 

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 hw/xen/xen-legacy-backend.c | 40 +
 include/hw/xen/xen-legacy-backend.h |  3 +++
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index 085fd31ef7..694e7bbc54 100644
--- a/hw/xen/xen-legacy-backend.c
+++ b/hw/xen/xen-legacy-backend.c
@@ -676,17 +676,40 @@ void xenstore_update_fe(char *watch, struct 
XenLegacyDevice *xendev)
 }
 /*  */
 
-int xen_be_init(void)
+int xen_be_xenstore_open(void)
 {
-xengnttab_handle *gnttabdev;
-
 xenstore = xs_daemon_open();
 if (!xenstore) {
-xen_pv_printf(NULL, 0, "can't connect to xenstored\n");
 return -1;
 }
 
 qemu_set_fd_handler(xs_fileno(xenstore), xenstore_update, NULL, NULL);
+return 0;
+}
+
+void xen_be_xenstore_close(void)
+{
+qemu_set_fd_handler(xs_fileno(xenstore), NULL, NULL, NULL);
+xs_daemon_close(xenstore);
+xenstore = NULL;
+}
+
+void xen_be_sysdev_init(void)
+{
+xen_sysdev = qdev_new(TYPE_XENSYSDEV);
+sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal);
+xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus");
+qbus_set_bus_hotplug_handler(xen_sysbus);
+}
+
+int xen_be_init(void)
+{
+xengnttab_handle *gnttabdev;
+
+if (xen_be_xenstore_open()) {
+xen_pv_printf(NULL, 0, "can't connect to xenstored\n");
+return -1;
+}
 
 if (xen_xc == NULL || xen_fmem == NULL) {
 /* Check if xen_init() have been called */
@@ -701,17 +724,12 @@ int xen_be_init(void)
 xengnttab_close(gnttabdev);
 }
 
-xen_sysdev = qdev_new(TYPE_XENSYSDEV);
-sysbus_realize_and_unref(SYS_BUS_DEVICE(xen_sysdev), &error_fatal);
-xen_sysbus = qbus_new(TYPE_XENSYSBUS, xen_sysdev, "xen-sysbus");
-qbus_set_bus_hotplug_handler(xen_sysbus);
+xen_be_sysdev_init();
 
 return 0;
 
 err:
-qemu_set_fd_handler(xs_fileno(xenstore), NULL, NULL, NULL);
-xs_daemon_close(xenstore);
-xenstore = NULL;
+xen_be_xenstore_close();
 
 return -1;
 }
diff --git a/include/hw/xen/xen-legacy-backend.h 
b/include/hw/xen/xen-legacy-backend.h
index be281e1f38..0aa171f6c2 100644
--- a/include/hw/xen/xen-legacy-backend.h
+++ b/include/hw/xen/xen-legacy-backend.h
@@ -42,6 +42,9 @@ int xenstore_read_fe_uint64(struct XenLegacyDevice *xendev, 
const char *node,
 void xen_be_check_state(struct XenLegacyDevice *xendev);
 
 /* xen backend driver bits */
+int xen_be_xenstore_open(void);
+void xen_be_xenstore_close(void);
+void xen_be_sysdev_init(void);
 int xen_be_init(void);
 void xen_be_register_common(void);
 int xen_be_register(const char *type, struct XenDevOps *ops);
-- 
2.35.3




[RFC PATCH v3 21/38] i386/xen: implement HVMOP_set_evtchn_upcall_vector

2022-12-15 Thread David Woodhouse
From: Ankur Arora 

The HVMOP_set_evtchn_upcall_vector hypercall sets the per-vCPU upcall
vector, to be delivered to the local APIC just like an MSI (with an EOI).

This takes precedence over the system-wide delivery method set by the
HVMOP_set_param hypercall with HVM_PARAM_CALLBACK_IRQ. It's used by
Windows and Xen (PV shim) guests but normally not by Linux.
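
For illustration, a minimal sketch of the guest-visible argument and the
sanity check applied below; the struct layout is inferred from the 8-byte
build assert in this patch plus Xen's public hvm_op.h, which remains the
authoritative definition:

    #include <stdbool.h>
    #include <stdint.h>

    struct xen_hvm_evtchn_upcall_vector {
        uint32_t vcpu;      /* Xen/ACPI vcpu_id, not the APIC ID */
        uint8_t  vector;    /* IDT vector, injected like an MSI with EOI */
    };                      /* padded to 8 bytes across the hypercall ABI */

    static bool upcall_vector_valid(const struct xen_hvm_evtchn_upcall_vector *up)
    {
        /* Vectors below 0x10 overlap exceptions, hence the -EINVAL below. */
        return up->vector >= 0x10;
    }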

Signed-off-by: Ankur Arora 
Signed-off-by: Joao Martins 
[dwmw2: Rework for upstream kernel changes and split from HVMOP_set_param]
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h|  1 +
 target/i386/kvm/kvm.c|  7 
 target/i386/kvm/trace-events |  1 +
 target/i386/kvm/xen-emu.c| 67 +---
 target/i386/kvm/xen-emu.h|  1 +
 target/i386/machine.c|  4 ++-
 6 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index bf44a87ddb..938a1b9c8b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1792,6 +1792,7 @@ typedef struct CPUArchState {
 uint64_t xen_vcpu_info_default_gpa;
 uint64_t xen_vcpu_time_info_gpa;
 uint64_t xen_vcpu_runstate_gpa;
+uint8_t xen_vcpu_callback_vector;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8aee95d4c1..cbf41d6f81 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4757,6 +4757,13 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 return ret;
 }
 }
+
+if (x86_cpu->env.xen_vcpu_callback_vector) {
+ret = kvm_xen_set_vcpu_callback_vector(cpu);
+if (ret < 0) {
+return ret;
+}
+}
 }
 #endif
 
diff --git a/target/i386/kvm/trace-events b/target/i386/kvm/trace-events
index 14e54dfca5..6133f6dd9e 100644
--- a/target/i386/kvm/trace-events
+++ b/target/i386/kvm/trace-events
@@ -10,3 +10,4 @@ kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
 kvm_xen_hypercall(int cpu, uint8_t cpl, uint64_t input, uint64_t a0, uint64_t 
a1, uint64_t a2, uint64_t ret) "xen_hypercall: cpu %d cpl %d input %" PRIu64 " 
a0 0x%" PRIx64 " a1 0x%" PRIx64 " a2 0x%" PRIx64" ret 0x%" PRIx64
 kvm_xen_set_shared_info(uint64_t gfn) "shared info at gfn 0x%" PRIx64
 kvm_xen_set_vcpu_attr(int cpu, int type, uint64_t gpa) "vcpu attr cpu %d type 
%d gpa 0x%" PRIx64
+kvm_xen_set_vcpu_callback(int cpu, int vector) "callback vcpu %d vector %d"
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 1297b37ee8..d35609563c 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -23,6 +23,7 @@
 #include "standard-headers/xen/version.h"
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/hvm_op.h"
+#include "standard-headers/xen/hvm/params.h"
 #include "standard-headers/xen/vcpu.h"
 
 #include "xen-compat.h"
@@ -145,7 +146,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 fi.submap |= 1 << XENFEAT_writable_page_tables |
  1 << XENFEAT_writable_descriptor_tables |
  1 << XENFEAT_auto_translated_physmap |
- 1 << XENFEAT_supervisor_mode_kernel;
+ 1 << XENFEAT_supervisor_mode_kernel |
+ 1 << XENFEAT_hvm_callback_vector;
 }
 
 err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
@@ -172,6 +174,29 @@ int kvm_xen_set_vcpu_attr(CPUState *cs, uint16_t type, 
uint64_t gpa)
 return kvm_vcpu_ioctl(cs, KVM_XEN_VCPU_SET_ATTR, &xhsi);
 }
 
+int kvm_xen_set_vcpu_callback_vector(CPUState *cs)
+{
+uint8_t vector = X86_CPU(cs)->env.xen_vcpu_callback_vector;
+struct kvm_xen_vcpu_attr xva;
+
+xva.type = KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR;
+xva.u.vector = vector;
+
+trace_kvm_xen_set_vcpu_callback(cs->cpu_index, vector);
+
+return kvm_vcpu_ioctl(cs, KVM_XEN_HVM_SET_ATTR, &xva);
+}
+
+static void do_set_vcpu_callback_vector(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_callback_vector = data.host_int;
+
+kvm_xen_set_vcpu_callback_vector(cs);
+}
+
 static void do_set_vcpu_info_default_gpa(CPUState *cs, run_on_cpu_data data)
 {
 X86CPU *cpu = X86_CPU(cs);
@@ -385,17 +410,51 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 return true;
 }
 
+static int kvm_xen_hcall_evtchn_upcall_vector(struct kvm_xen_exit *exit,
+  X86CPU *cpu, uint64_t arg)
+{
+struct xen_hvm_evtchn_upcall_vector up;
+CPUState *target_cs;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(up) == 8);
+
+if (kvm_copy_from_gva(CPU(cpu), arg, &up, sizeof(up))) {
+return -EFAULT;
+}
+
+if (up.vector < 0x10) {
+return -EINVAL;
+}
+
+target_cs = qemu_get_cpu(up.vcpu);
+if (

[RFC PATCH v3 15/38] i386/xen: implement XENMEM_add_to_physmap_batch

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-compat.h | 24 +
 target/i386/kvm/xen-emu.c| 70 
 2 files changed, 94 insertions(+)

diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
index 0b7088662a..2925dcc7f6 100644
--- a/target/i386/kvm/xen-compat.h
+++ b/target/i386/kvm/xen-compat.h
@@ -15,6 +15,20 @@
 
 typedef uint32_t compat_pfn_t;
 typedef uint32_t compat_ulong_t;
+typedef uint32_t compat_ptr_t;
+
+#define __DEFINE_COMPAT_HANDLE(name, type)  \
+typedef struct {\
+compat_ptr_t c; \
+type *_[0] __attribute__((packed)); \
+} __compat_handle_ ## name; \
+
+#define DEFINE_COMPAT_HANDLE(name) __DEFINE_COMPAT_HANDLE(name, name)
+#define COMPAT_HANDLE(name) __compat_handle_ ## name
+
+DEFINE_COMPAT_HANDLE(compat_pfn_t);
+DEFINE_COMPAT_HANDLE(compat_ulong_t);
+DEFINE_COMPAT_HANDLE(int);
 
 struct compat_xen_add_to_physmap {
 domid_t domid;
@@ -24,4 +38,14 @@ struct compat_xen_add_to_physmap {
 compat_pfn_t gpfn;
 };
 
+struct compat_xen_add_to_physmap_batch {
+domid_t domid;
+uint16_t space;
+uint16_t size;
+uint16_t extra;
+COMPAT_HANDLE(compat_ulong_t) idxs;
+COMPAT_HANDLE(compat_pfn_t) gpfns;
+COMPAT_HANDLE(int) errs;
+};
+
 #endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 1fecab6e10..c23026b872 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -215,6 +215,72 @@ static int do_add_to_physmap(struct kvm_xen_exit *exit, 
X86CPU *cpu, uint64_t ar
 
 return add_to_physmap_one(xatp.space, xatp.idx, xatp.gpfn);
 }
+
+static int do_add_to_physmap_batch(struct kvm_xen_exit *exit, X86CPU *cpu,
+   uint64_t arg)
+{
+struct xen_add_to_physmap_batch xatpb;
+unsigned long idxs_gva, gpfns_gva, errs_gva;
+CPUState *cs = CPU(cpu);
+size_t op_sz;
+
+if (hypercall_compat32(exit->u.hcall.longmode)) {
+struct compat_xen_add_to_physmap_batch xatpb32;
+
+qemu_build_assert(sizeof(struct compat_xen_add_to_physmap_batch) == 
20);
+if (kvm_copy_from_gva(cs, arg, &xatpb32, sizeof(xatpb32))) {
+return -EFAULT;
+}
+xatpb.domid = xatpb32.domid;
+xatpb.space = xatpb32.space;
+xatpb.size = xatpb32.size;
+
+idxs_gva = xatpb32.idxs.c;
+gpfns_gva = xatpb32.gpfns.c;
+errs_gva = xatpb32.errs.c;
+op_sz = sizeof(uint32_t);;
+} else {
+if (kvm_copy_from_gva(cs, arg, &xatpb, sizeof(xatpb))) {
+return -EFAULT;
+}
+op_sz = sizeof(unsigned long);
+idxs_gva = (unsigned long)xatpb.idxs.p;
+gpfns_gva = (unsigned long)xatpb.gpfns.p;
+errs_gva = (unsigned long)xatpb.errs.p;
+}
+
+if (xatpb.domid != DOMID_SELF && xatpb.domid != xen_domid) {
+return -ESRCH;
+}
+
+/* Explicitly invalid for the batch op. Not that we implement it anyway. */
+if (xatpb.space == XENMAPSPACE_gmfn_range) {
+return -EINVAL;
+}
+
+while (xatpb.size--) {
+unsigned long idx = 0;
+unsigned long gpfn = 0;
+int err;
+
+/* For 32-bit compat this only copies the low 32 bits of each */
+if (kvm_copy_from_gva(cs, idxs_gva, &idx, op_sz) ||
+kvm_copy_from_gva(cs, gpfns_gva, &gpfn, op_sz)) {
+return -EFAULT;
+}
+idxs_gva += op_sz;
+gpfns_gva += op_sz;
+
+err = add_to_physmap_one(xatpb.space, idx, gpfn);
+
+if (kvm_copy_to_gva(cs, errs_gva, &err, sizeof(err))) {
+return -EFAULT;
+}
+errs_gva += sizeof(err);
+}
+return 0;
+}
+
 static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit *exit, X86CPU *cpu,
int cmd, uint64_t arg)
 {
@@ -225,6 +291,10 @@ static bool kvm_xen_hcall_memory_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = do_add_to_physmap(exit, cpu, arg);
 break;
 
+case XENMEM_add_to_physmap_batch:
+err = do_add_to_physmap_batch(exit, cpu, arg);
+break;
+
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 11/38] hw/xen: Add xen_overlay device for emulating shared xenheap pages

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

For the shared info page and for grant tables, Xen shares its own pages
from the "Xen heap" to the guest. The guest requests that a given page
from a certain address space (XENMAPSPACE_shared_info, etc.) be mapped
to a given GPA using the XENMEM_add_to_physmap hypercall.
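
For illustration, a hedged guest-side sketch of that request; the struct
layout follows Xen's public memory.h (cf. the 32-bit compat variant added
later in this series), and the stubbed hypercall plus the DOMID_SELF /
XENMEM_add_to_physmap / XENMAPSPACE_shared_info values are assumptions
rather than anything defined by this patch:

    #include <stdint.h>

    typedef uint16_t domid_t;

    #define DOMID_SELF              ((domid_t)0x7ff0) /* assumed, per xen.h */
    #define XENMEM_add_to_physmap   7                 /* assumed, per memory.h */
    #define XENMAPSPACE_shared_info 0                 /* assumed, per memory.h */

    struct xen_add_to_physmap {
        domid_t domid;
        uint16_t size;          /* only used for XENMAPSPACE_gmfn_range */
        unsigned int space;     /* XENMAPSPACE_* */
        unsigned long idx;
        unsigned long gpfn;     /* guest frame where the page should appear */
    };

    /* Placeholder for the hypercall-page entry so the sketch stands alone. */
    static long hypercall_memory_op(int cmd, void *arg)
    {
        (void)cmd; (void)arg;
        return 0;
    }

    /* Ask for the shared_info page to be overlaid at guest frame 'gfn'. */
    static long map_shared_info(unsigned long gfn)
    {
        struct xen_add_to_physmap xatp = {
            .domid = DOMID_SELF,
            .space = XENMAPSPACE_shared_info,
            .idx   = 0,
            .gpfn  = gfn,
        };

        return hypercall_memory_op(XENMEM_add_to_physmap, &xatp);
    }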

To support that in qemu when *emulating* Xen, create a memory region
(migratable) and allow it to be mapped as an overlay when requested.

Xen theoretically allows the same page to be mapped multiple times
into the guest, but that's hard to track and reinstate over migration,
so we automatically *unmap* any previous mapping when creating a new
one. This approach has been used in production with a non-trivial
number of guests expecting true Xen, without any problems yet being
noticed.

This adds just the shared info page for now. The grant tables will be
a larger region, and will need to be overlaid one page at a time. I
think that means I need to create separate aliases for each page of
the overall grant_frames region, so that they can be mapped individually.

Expecting some heckling at the use of xen_overlay_singleton. What is
the best way to do that? Using qemu_find_recursive() every time seemed
a bit wrong. But I suppose mapping it into the *guest* isn't a fast
path, and if the actual grant table code is allowed to just stash the
pointer it gets from xen_overlay_page_ptr() for later use then that
isn't a fast path for device I/O either.

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/meson.build   |   1 +
 hw/i386/kvm/xen_overlay.c | 198 ++
 hw/i386/kvm/xen_overlay.h |  14 +++
 3 files changed, 213 insertions(+)
 create mode 100644 hw/i386/kvm/xen_overlay.c
 create mode 100644 hw/i386/kvm/xen_overlay.h

diff --git a/hw/i386/kvm/meson.build b/hw/i386/kvm/meson.build
index 95467f1ded..6165cbf019 100644
--- a/hw/i386/kvm/meson.build
+++ b/hw/i386/kvm/meson.build
@@ -4,5 +4,6 @@ i386_kvm_ss.add(when: 'CONFIG_APIC', if_true: files('apic.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8254', if_true: files('i8254.c'))
 i386_kvm_ss.add(when: 'CONFIG_I8259', if_true: files('i8259.c'))
 i386_kvm_ss.add(when: 'CONFIG_IOAPIC', if_true: files('ioapic.c'))
+i386_kvm_ss.add(when: 'CONFIG_XEN_EMU', if_true: files('xen_overlay.c'))
 
 i386_ss.add_all(when: 'CONFIG_KVM', if_true: i386_kvm_ss)
diff --git a/hw/i386/kvm/xen_overlay.c b/hw/i386/kvm/xen_overlay.c
new file mode 100644
index 00..2ae54e1a88
--- /dev/null
+++ b/hw/i386/kvm/xen_overlay.c
@@ -0,0 +1,198 @@
+/*
+ * QEMU Xen emulation: Shared/overlay pages support
+ *
+ * Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Authors: David Woodhouse 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
+#include "qemu/module.h"
+#include "qemu/main-loop.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "exec/target_page.h"
+#include "exec/address-spaces.h"
+#include "migration/vmstate.h"
+
+#include "hw/sysbus.h"
+#include "hw/xen/xen.h"
+#include "xen_overlay.h"
+
+#include "sysemu/kvm.h"
+#include 
+
+#include "standard-headers/xen/memory.h"
+
+static int xen_overlay_map_page_locked(uint32_t space, uint64_t idx, uint64_t 
gpa);
+
+#define INVALID_GPA UINT64_MAX
+#define INVALID_GFN UINT64_MAX
+
+#define TYPE_XEN_OVERLAY "xenoverlay"
+OBJECT_DECLARE_SIMPLE_TYPE(XenOverlayState, XEN_OVERLAY)
+
+#define XEN_PAGE_SHIFT 12
+#define XEN_PAGE_SIZE (1ULL << XEN_PAGE_SHIFT)
+
+struct XenOverlayState {
+/*< private >*/
+SysBusDevice busdev;
+/*< public >*/
+
+MemoryRegion shinfo_mem;
+void *shinfo_ptr;
+uint64_t shinfo_gpa;
+};
+
+struct XenOverlayState *xen_overlay_singleton;
+
+static void xen_overlay_realize(DeviceState *dev, Error **errp)
+{
+XenOverlayState *s = XEN_OVERLAY(dev);
+
+if (xen_mode != XEN_EMULATE) {
+error_setg(errp, "Xen overlay page support is for Xen emulation");
+return;
+}
+
+memory_region_init_ram(&s->shinfo_mem, OBJECT(dev), "xen:shared_info", 
XEN_PAGE_SIZE, &error_abort);
+memory_region_set_enabled(&s->shinfo_mem, true);
+s->shinfo_ptr = memory_region_get_ram_ptr(&s->shinfo_mem);
+s->shinfo_gpa = INVALID_GPA;
+memset(s->shinfo_ptr, 0, XEN_PAGE_SIZE);
+}
+
+static int xen_overlay_post_load(void *opaque, int version_id)
+{
+XenOverlayState *s = opaque;
+
+if (s->shinfo_gpa != INVALID_GPA) {
+xen_overlay_map_page_locked(XENMAPSPACE_shared_info, 0, s->shinfo_gpa);
+}
+
+return 0;
+}
+
+static bool xen_overlay_is_needed(void *opaque)
+{
+return xen_mode == XEN_EMULATE;
+}
+
+static const VMStateDescription xen_overlay_vmstate = {
+.name = "xen_overlay",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = xen_overlay_is_needed,
+.post_load = xen_overlay_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64

[RFC PATCH v3 28/38] hw/xen: Implement EVTCHNOP_bind_virq

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Add the array of virq ports to each vCPU so that we can deliver timers,
debug ports, etc. Global virqs are allocated against vCPU 0 initially,
but can be migrated to other vCPUs (when we implement that).

The kernel needs to know about VIRQ_TIMER in order to accelerate timers,
so tell it via KVM_XEN_VCPU_ATTR_TYPE_TIMER.
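
For illustration, a hedged sketch of the guest-side request that reaches
this code; the struct fields match what xen_evtchn_bind_virq_op() consumes
below, while the EVTCHNOP_bind_virq / VIRQ_TIMER values and the hypercall
stub are assumptions taken from Xen's public headers:

    #include <stdint.h>

    typedef uint32_t evtchn_port_t;

    struct evtchn_bind_virq {
        uint32_t virq;          /* IN:  e.g. VIRQ_TIMER (per-vCPU) */
        uint32_t vcpu;          /* IN:  vCPU 0 for global virqs */
        evtchn_port_t port;     /* OUT: allocated event channel port */
    };

    #define EVTCHNOP_bind_virq 1    /* assumed, per event_channel.h */
    #define VIRQ_TIMER         0    /* assumed, per xen.h */

    /* Placeholder for the hypercall-page entry so the sketch stands alone. */
    static long hypercall_event_channel_op(int cmd, void *arg)
    {
        (void)cmd; (void)arg;
        return 0;
    }

    static evtchn_port_t bind_timer_virq(uint32_t vcpu)
    {
        struct evtchn_bind_virq bind = { .virq = VIRQ_TIMER, .vcpu = vcpu };

        /* QEMU allocates the port and passes VIRQ_TIMER on to KVM. */
        hypercall_event_channel_op(EVTCHNOP_bind_virq, &bind);
        return bind.port;
    }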

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 83 +++
 hw/i386/kvm/xen_evtchn.h  |  2 +
 include/sysemu/kvm_xen.h  |  1 +
 target/i386/cpu.h |  3 ++
 target/i386/kvm/xen-emu.c | 61 
 target/i386/machine.c |  1 +
 6 files changed, 151 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 50adef0864..19b8eb7a6f 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -208,6 +208,11 @@ static bool valid_port(evtchn_port_t port)
 }
 }
 
+static bool valid_vcpu(uint32_t vcpu)
+{
+return !!qemu_get_cpu(vcpu);
+}
+
 int xen_evtchn_status_op(struct evtchn_status *status)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
@@ -398,6 +403,20 @@ static int unmask_port(XenEvtchnState *s, evtchn_port_t 
port, bool do_unmask)
 }
 }
 
+static bool virq_is_global(uint32_t virq)
+{
+switch (virq) {
+case VIRQ_TIMER:
+case VIRQ_DEBUG:
+case VIRQ_XENOPROF:
+case VIRQ_XENPMU:
+return false;
+
+default:
+return true;
+}
+}
+
 static void free_port(XenEvtchnState *s, evtchn_port_t port)
 {
 s->port_table[port].type = EVTCHNSTAT_closed;
@@ -411,6 +430,28 @@ static void free_port(XenEvtchnState *s, evtchn_port_t 
port)
 }
 }
 
+static int allocate_port(XenEvtchnState *s, uint32_t vcpu, uint16_t type,
+ uint16_t val, evtchn_port_t *port)
+{
+evtchn_port_t p = 1;
+
+for (p = 1; valid_port(p); p++) {
+if (s->port_table[p].type == EVTCHNSTAT_closed) {
+s->port_table[p].vcpu = vcpu;
+s->port_table[p].type = type;
+s->port_table[p].type_val = val;
+
+*port = p;
+
+if (s->nr_ports < p + 1)
+s->nr_ports = p + 1;
+
+return 0;
+}
+}
+return -ENOSPC;
+}
+
 static int close_port(XenEvtchnState *s, evtchn_port_t port)
 {
 XenEvtchnPort *p = &s->port_table[port];
@@ -419,6 +460,11 @@ static int close_port(XenEvtchnState *s, evtchn_port_t 
port)
 case EVTCHNSTAT_closed:
 return -ENOENT;
 
+case EVTCHNSTAT_virq:
+kvm_xen_set_vcpu_virq(virq_is_global(p->type_val) ? 0 : p->vcpu,
+  p->type_val, 0);
+break;
+
 default:
 break;
 }
@@ -470,3 +516,40 @@ int xen_evtchn_unmask_op(struct evtchn_unmask *unmask)
 
 return ret;
 }
+
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
+{
+XenEvtchnState *s = xen_evtchn_singleton;
+int ret;
+
+if (!s) {
+return -ENOTSUP;
+}
+
+if (virq->virq >= NR_VIRQS) {
+return -EINVAL;
+}
+
+/* Global VIRQ must be allocated on vCPU0 first */
+if (virq_is_global(virq->virq) && virq->vcpu != 0) {
+return -EINVAL;
+}
+
+if (!valid_vcpu(virq->vcpu)) {
+return -ENOENT;
+}
+
+qemu_mutex_lock(&s->port_lock);
+
+ret = allocate_port(s, virq->vcpu, EVTCHNSTAT_virq, virq->virq, 
&virq->port);
+if (!ret) {
+ret = kvm_xen_set_vcpu_virq(virq->vcpu, virq->virq, virq->port);
+if (ret) {
+free_port(s, virq->port);
+}
+}
+
+qemu_mutex_unlock(&s->port_lock);
+
+return ret;
+}
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 2fb7d70043..ffddd87bdc 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -16,6 +16,8 @@ int xen_evtchn_set_callback_param(uint64_t param);
 struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
+struct evtchn_bind_virq;
 int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
+int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
diff --git a/include/sysemu/kvm_xen.h b/include/sysemu/kvm_xen.h
index ab629feb13..e5b14ffe8d 100644
--- a/include/sysemu/kvm_xen.h
+++ b/include/sysemu/kvm_xen.h
@@ -14,5 +14,6 @@
 
 void *kvm_xen_get_vcpu_info_hva(uint32_t vcpu_id);
 void kvm_xen_inject_vcpu_callback_vector(uint32_t vcpu_id);
+int kvm_xen_set_vcpu_virq(uint32_t vcpu_id, uint16_t virq, uint16_t port);
 
 #endif /* QEMU_SYSEMU_KVM_XEN_H */
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 938a1b9c8b..846c738fd7 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -27,6 +27,8 @@
 #include "qapi/qapi-types-common.h"
 #include "qemu/cpu-float.h"
 
+#define XEN_NR_VIRQS 24
+
 /* The x86 has a strong memory model with some store-after-load re-ordering */
 #define TCG_GUEST_DEFAULT_MO  (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
@@ -1793,6 +1795,7 @@ typed

[RFC PATCH v3 03/38] xen: Add XEN_DISABLED mode and make it default

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Also check for XEN_ATTACH mode in xen_init()

Suggested-by: Paolo Bonzini 
Signed-off-by: David Woodhouse 
---
 accel/xen/xen-all.c  | 4 
 include/hw/xen/xen.h | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/accel/xen/xen-all.c b/accel/xen/xen-all.c
index 69aa7d018b..109d2e84bc 100644
--- a/accel/xen/xen-all.c
+++ b/accel/xen/xen-all.c
@@ -158,6 +158,10 @@ static int xen_init(MachineState *ms)
 {
 MachineClass *mc = MACHINE_GET_CLASS(ms);
 
+if (xen_mode != XEN_ATTACH) {
+xen_pv_printf(NULL, 0, "xen requires --xen-attach mode\n");
+return -1;
+}
 xen_xc = xc_interface_open(0, 0, 0);
 if (xen_xc == NULL) {
 xen_pv_printf(NULL, 0, "can't open xen interface\n");
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index afdf9c436a..82347e76a4 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -12,8 +12,9 @@
 
 /* xen-machine.c */
 enum xen_mode {
-XEN_EMULATE = 0,  // xen emulation, using xenner (default)
-XEN_ATTACH// attach to xen domain created by libxl
+XEN_DISABLED = 0, // xen support disabled (default)
+XEN_ATTACH,   // attach to xen domain created by libxl
+XEN_EMULATE,
 };
 
 extern uint32_t xen_domid;
-- 
2.35.3




[RFC PATCH v3 38/38] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_PCI_INTX callback

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

The guest is permitted to specify an arbitrary domain/bus/device/function
and INTX pin from which the callback IRQ shall appear to have come.

In QEMU we can only easily do this for devices that actually exist, and
even that requires us "knowing" that it's a PCMachine in order to find
the PCI root bus — although that's OK really because it's always true.

We also don't get notified of INTX routing changes, because we
can't do that as a passive observer; if we try to register a notifier
it will overwrite any existing notifier callback on the device.

But in practice, guests using PCI_INTX will only ever use pin A on the
Xen platform device, and won't swizzle the INTX routing after they set
it up. So this is just fine.
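
For reference, a sketch of the 64-bit parameter layout this implies,
mirroring the decode in set_callback_pci_intx() below; the caller still
ORs the PCI_INTX callback type into bits 63:56 via CALLBACK_VIA_TYPE_SHIFT,
and the example slot is purely illustrative:

    #include <stdint.h>

    static uint64_t pci_intx_callback_param(uint16_t domain, uint8_t bus,
                                            uint8_t devfn, uint8_t pin)
    {
        return ((uint64_t)domain << 32) |   /* bits 47:32 - PCI domain (must be 0) */
               ((uint64_t)bus    << 16) |   /* bits 31:16 - bus number */
               ((uint64_t)devfn  <<  8) |   /* bits 15:8  - device/function */
               (uint64_t)(pin & 3);         /* bits 1:0   - INTx pin, 0 == INTA */
    }

    /* e.g. pin A of a device at 00:03.0: pci_intx_callback_param(0, 0, 3 << 3, 0) */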

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c | 70 +---
 1 file changed, 58 insertions(+), 12 deletions(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 8ea8cf550e..2852b46b45 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -25,6 +25,8 @@
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
 #include "hw/i386/x86.h"
+#include "hw/i386/pc.h"
+#include "hw/pci/pci.h"
 #include "hw/irq.h"
 
 #include "xen_evtchn.h"
@@ -100,6 +102,7 @@ struct XenEvtchnState {
 /*< public >*/
 
 uint64_t callback_param;
+uint32_t callback_gsi;
 
 QemuMutex port_lock;
 uint32_t nr_ports;
@@ -201,11 +204,50 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
+static int set_callback_pci_intx(XenEvtchnState *s, uint64_t param)
+{
+PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+uint8_t pin = param & 3;
+uint8_t devfn = (param >> 8) & 0xff;
+uint16_t bus = (param >> 16) & 0x;
+uint16_t domain = (param >> 32) & 0x;
+PCIDevice *pdev;
+PCIINTxRoute r;
+
+if (domain || !pcms)
+return 0;
+
+pdev = pci_find_device(pcms->bus, bus, devfn);
+if (!pdev) {
+return 0;
+}
+
+r = pci_device_route_intx_to_irq(pdev, pin);
+if (r.mode != PCI_INTX_ENABLED) {
+return 0;
+}
+
+/*
+ * Hm, can we be notified of INTX routing changes? Not without
+ * *owning* the device and being allowed to overwrite its own
+ * ->intx_routing_notifier, AFAICT. So let's not.
+ */
+return r.irq;
+}
+
+static void xen_evtchn_set_callback_level(XenEvtchnState *s, int level)
+{
+if (s->callback_gsi && s->callback_gsi < GSI_NUM_PINS) {
+qemu_set_irq(s->gsis[s->callback_gsi], level);
+}
+}
+
 #define CALLBACK_VIA_TYPE_SHIFT   56
 
 int xen_evtchn_set_callback_param(uint64_t param)
 {
 XenEvtchnState *s = xen_evtchn_singleton;
+uint32_t gsi = 0;
 int ret = -ENOSYS;
 
 if (!s) {
@@ -220,31 +262,35 @@ int xen_evtchn_set_callback_param(uint64_t param)
 };
 
 ret = kvm_vm_ioctl(kvm_state, KVM_XEN_HVM_SET_ATTR, &xa);
+gsi = 0;
 break;
 }
 case HVM_PARAM_CALLBACK_TYPE_GSI:
+gsi = (uint32_t)param;
 ret = 0;
 break;
+
+case HVM_PARAM_CALLBACK_TYPE_PCI_INTX:
+gsi = set_callback_pci_intx(s, param);
+ret = gsi ? 0 : -EINVAL;
+break;
 }
 
 if (!ret) {
 s->callback_param = param;
-}
-
-return ret;
-}
+if (gsi != s->callback_gsi) {
+struct vcpu_info *vi = kvm_xen_get_vcpu_info_hva(0);
 
-static void xen_evtchn_set_callback_level(XenEvtchnState *s, int level)
-{
-uint32_t param = (uint32_t)s->callback_param;
+xen_evtchn_set_callback_level(s, 0);
+s->callback_gsi = gsi;
 
-switch (s->callback_param >> CALLBACK_VIA_TYPE_SHIFT) {
-case HVM_PARAM_CALLBACK_TYPE_GSI:
-if (param < GSI_NUM_PINS) {
-qemu_set_irq(s->gsis[param], level);
+if (gsi && vi && vi->evtchn_upcall_pending) {
+xen_evtchn_set_callback_level(s, 1);
+}
 }
-break;
 }
+
+return ret;
 }
 
 static void inject_callback(XenEvtchnState *s, uint32_t vcpu)
-- 
2.35.3




[RFC PATCH v3 00/38] Xen HVM support under KVM

2022-12-15 Thread David Woodhouse
Xen guests actually boot now. No PV drivers, as there's no grant table
or xenstore yet. But event channel IPIs are working, as are in-kernel
vCPU timers.

Moderately unhappy with having to poll for the GSI callback going down,
because we don't have a hook on the PIC EOI. If I can fix that for VFIO
while I'm at it, I may investigate further. I note that VFIO does seem
to use pci_device_route_intx_to_irq() and know the actual target GSI#,
which means that a trivial hook based on the GSI# might be feasible.

Next up, timers (which actually work with a new enough kernel where it's
all offloaded, but even then they need migration support). Then grant
tables, at which point it's time to work out how to provide a xenstore
implementation. I quite like the idea of that being purely internal,
but don't fancy adding *that* much code to qemu so will probably hook
up to an existing external xenstored. 

Still need to fix up that platform PCI patch to call pam_update() to
change UMB mode.

  qemu-system-x86_64 -serial mon:stdio -accel kvm,xen-version=0x4000a \
 -device xen-platform -cpu host,+xen-vapic  -display none \
 -kernel /boot/vmlinuz-5.17.8-200.fc35.x86_64 \
 -append "console=ttyS0,115200 earlyprintk=ttyS0,115200" \
 --trace "kvm_xen*"

v3:

 • Switch back to xen-version as KVM accelerator property, other review
   feedback and bug fixes.

 • Fix Hyper-V coexistence (ick, calling kvm_xen_init() again because
   hyperv_enabled() doesn't return the right answer the first time).

 • Implement event channel support, including GSI/PCI_INTX callback.

 • Implement 32-bit guest support.

v2: 
https://lore.kernel.org/qemu-devel/20221209095612.689243-1-dw...@infradead.org/

 • Attempt to implement migration support; every Xen enlightenment is
   now recorded either from vmstate_x86_cpu or from a new sysdev device
   created for that purpose. And — I believe — correctly restored, in
   the right order, on vmload.

 • The shared_info page is created as a proper overlay instead of abusing
   the underlying guest page. This is important because Windows doesn't
   even select a GPA which had RAM behind it beforehand. This will be
   extended to handle the grant frames too, in the fullness of time.

 • Set vCPU attributes from the correct vCPU thread to avoid deadlocks.

 • Carefully copy the entire hypercall argument structure from userspace
   instead of assuming that it's contiguous in HVA space.

 • Distinguish between "handled but intentionally returns -ENOSYS" and
   "no idea what that was" in hypercalls, allowing us to emit a
   GUEST_ERROR (actually, shouldn't that change to UNIMP?) on the
   latter. Experience shows that we'll end up having to intentionally
   return -ENOSYS to a bunch of weird crap that ancient guests still
   attempt to use, including XenServer local hacks that nobody even
   remembers what they were (hvmop 0x101, anyone? Some old Windows
   PV driver appears to be trying to use it...).

 * Drop the '+xen' CPU property and present Xen CPUID instead of KVM
   unconditionally when running in Xen mode. Make the Xen CPUID coexist
   with Hyper-V CPUID as it should, though.

 • Add XEN_EMU and XENFV_MACHINE (the latter to be XEN_EMU||XEN) config
   options. Some more work on this, and the incestuous relationships
   between the KVM target code and the 'platform' code, is going to be
   required but it's probably better to get on with implementing the
   real code so we can see those interactions in all their glory,
   before losing too much sleep over the details here.

 • Drop the GSI-2 hack, and also the patch which made the PCI platform
   device have real RAM (which isn't needed now we have overlays, qv).

 • Drop the XenState and XenVcpuState from KVMState and CPUArchState
   respectively. The Xen-specific fields are natively included in
   CPUArchState now though, for migration purposes. And we don't
   keep a host pointer to the shared_info or vcpu_info at all any
   more. With the kernel doing everything for us, we don't actually
   need them.

v1: 
https://lore.kernel.org/qemu-devel/20221205173137.607044-1-dw...@infradead.org/T/

v0: https://github.com/jpemartins/qemu/commits/xen-shim-rfc (Joao et al.)

Ankur Arora (2):
  i386/xen: implement HVMOP_set_evtchn_upcall_vector
  i386/xen: HVMOP_set_param / HVM_PARAM_CALLBACK_IRQ

David Woodhouse (20):
  xen: add CONFIG_XENFV_MACHINE and CONFIG_XEN_EMU options for Xen emulation
  xen: Add XEN_DISABLED mode and make it default
  i386/kvm: Add xen-version machine property and init KVM Xen support
  hw/xen: Add xen_overlay device for emulating shared xenheap pages
  i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode
  i386/xen: manage and save/restore Xen guest long_mode setting
  i386/xen: implement XENMEM_add_to_physmap_batch
  hw/xen: Implement EVTCHNOP_status
  hw/xen: Implement EVTCHNOP_close
  hw/xen: Implement EVTCHNOP_unmask
  hw/xen: Implement EV

[RFC PATCH v3 12/38] i386/xen: add pc_machine_kvm_type to initialize XEN_EMULATE mode

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

The xen_overlay device (and later similar devices for event channels and
grant tables) needs to be instantiated. Do this from a kvm_type method on
the PC machine derivatives, since KVM is the only way to support Xen
emulation for now.

Signed-off-by: David Woodhouse 
---
 hw/i386/pc.c | 11 +++
 include/hw/i386/pc.h |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 546b703cb4..f1780daa4c 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -89,6 +89,7 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/virtio/virtio-pmem-pci.h"
 #include "hw/virtio/virtio-mem-pci.h"
+#include "hw/i386/kvm/xen_overlay.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "target/i386/cpu.h"
@@ -1842,6 +1843,16 @@ static void pc_machine_initfn(Object *obj)
 cxl_machine_init(obj, &pcms->cxl_devices_state);
 }
 
+int pc_machine_kvm_type(MachineState *machine, const char *kvm_type)
+{
+#ifdef CONFIG_XEN_EMU
+if (xen_mode == XEN_EMULATE) {
+xen_overlay_create();
+}
+#endif
+return 0;
+}
+
 static void pc_machine_reset(MachineState *machine, ShutdownCause reason)
 {
 CPUState *cs;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index c95333514e..e82224857e 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -290,12 +290,15 @@ extern const size_t pc_compat_1_5_len;
 extern GlobalProperty pc_compat_1_4[];
 extern const size_t pc_compat_1_4_len;
 
+extern int pc_machine_kvm_type(MachineState *machine, const char *vm_type);
+
 #define DEFINE_PC_MACHINE(suffix, namestr, initfn, optsfn) \
 static void pc_machine_##suffix##_class_init(ObjectClass *oc, void *data) \
 { \
 MachineClass *mc = MACHINE_CLASS(oc); \
 optsfn(mc); \
 mc->init = initfn; \
+mc->kvm_type = pc_machine_kvm_type; \
 } \
 static const TypeInfo pc_machine_type_##suffix = { \
 .name   = namestr TYPE_MACHINE_SUFFIX, \
-- 
2.35.3




[RFC PATCH v3 36/38] i386/xen: Implement SCHEDOP_poll

2022-12-15 Thread David Woodhouse
From: David Woodhouse 

Just a dummy implementation which will sched_yield(), but it's enough to
stop the Linux guest panicking when running on a host kernel which doesn't
intercept SCHEDOP_poll and lets it reach userspace.

Signed-off-by: David Woodhouse 
---
 target/i386/kvm/xen-emu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 055afba627..a8c953e3ca 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -919,6 +919,17 @@ static bool kvm_xen_hcall_sched_op(struct kvm_xen_exit 
*exit, X86CPU *cpu,
 err = schedop_shutdown(cs, arg);
 break;
 
+case SCHEDOP_poll:
+/*
+ * Linux will panic if this doesn't work. Just yield; it's not
+ * worth overthinking it because with event channel handling
+ * in KVM, the kernel will intercept this and it will never
+ * reach QEMU anyway.
+ */
+sched_yield();
+err = 0;
+break;
+
 default:
 return false;
 }
-- 
2.35.3




[RFC PATCH v3 19/38] i386/xen: handle VCPUOP_register_vcpu_time_info

2022-12-15 Thread David Woodhouse
From: Joao Martins 

In order to support Linux vdso in Xen.

Signed-off-by: Joao Martins 
Signed-off-by: David Woodhouse 
---
 target/i386/cpu.h |  1 +
 target/i386/kvm/kvm.c |  9 
 target/i386/kvm/xen-emu.c | 86 +--
 target/i386/machine.c |  4 +-
 4 files changed, 87 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 109b2e5669..96c2d0d5cb 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1790,6 +1790,7 @@ typedef struct CPUArchState {
 struct kvm_nested_state *nested_state;
 uint64_t xen_vcpu_info_gpa;
 uint64_t xen_vcpu_info_default_gpa;
+uint64_t xen_vcpu_time_info_gpa;
 #endif
 #if defined(CONFIG_HVF)
 HVFX86LazyFlags hvf_lflags;
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8affe1eeae..766e0add13 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1804,6 +1804,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
 env->xen_vcpu_info_gpa = UINT64_MAX;
 env->xen_vcpu_info_default_gpa = UINT64_MAX;
+env->xen_vcpu_time_info_gpa = UINT64_MAX;
 
 if (cs->kvm_state->xen_version) {
 #ifdef CONFIG_XEN_EMU
@@ -4739,6 +4740,14 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
 return ret;
 }
 }
+
+gpa = x86_cpu->env.xen_vcpu_time_info_gpa;
+if (gpa != UINT64_MAX) {
+ret = kvm_xen_set_vcpu_attr(cpu, 
KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO, gpa);
+if (ret < 0) {
+return ret;
+}
+}
 }
 #endif
 
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 25c48248ce..b45d5af7d7 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -33,27 +33,40 @@
 #define hypercall_compat32(longmode) (false)
 #endif
 
-static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
-  bool is_write)
+static bool kvm_gva_to_gpa(CPUState *cs, uint64_t gva, uint64_t *gpa,
+   size_t *len, bool is_write)
 {
-uint8_t *buf = (uint8_t *)_buf;
-int ret;
-
-while (sz) {
 struct kvm_translation tr = {
 .linear_address = gva,
 };
 
-size_t len = TARGET_PAGE_SIZE - (tr.linear_address & 
~TARGET_PAGE_MASK);
-if (len > sz)
-len = sz;
+if (len) {
+*len = TARGET_PAGE_SIZE - (gva & ~TARGET_PAGE_MASK);
+}
+
+if (kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr) || !tr.valid ||
+(is_write && !tr.writeable)) {
+return false;
+}
+*gpa = tr.physical_address;
+return true;
+}
+
+static int kvm_gva_rw(CPUState *cs, uint64_t gva, void *_buf, size_t sz,
+  bool is_write)
+{
+uint8_t *buf = (uint8_t *)_buf;
+uint64_t gpa;
+size_t len;
 
-ret = kvm_vcpu_ioctl(cs, KVM_TRANSLATE, &tr);
-if (ret || !tr.valid || (is_write && !tr.writeable)) {
+while (sz) {
+if (!kvm_gva_to_gpa(cs, gva, &gpa, &len, is_write)) {
 return -EFAULT;
 }
+if (len > sz)
+len = sz;
 
-cpu_physical_memory_rw(tr.physical_address, buf, len, is_write);
+cpu_physical_memory_rw(gpa, buf, len, is_write);
 
 buf += len;
 sz -= len;
@@ -184,6 +197,17 @@ static void do_set_vcpu_info_gpa(CPUState *cs, 
run_on_cpu_data data)
   env->xen_vcpu_info_gpa);
 }
 
+static void do_set_vcpu_time_info_gpa(CPUState *cs, run_on_cpu_data data)
+{
+X86CPU *cpu = X86_CPU(cs);
+CPUX86State *env = &cpu->env;
+
+env->xen_vcpu_time_info_gpa = data.host_ulong;
+
+kvm_xen_set_vcpu_attr(cs, KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO,
+  env->xen_vcpu_time_info_gpa);
+}
+
 static int xen_set_shared_info(uint64_t gfn)
 {
 uint64_t gpa = gfn << TARGET_PAGE_BITS;
@@ -389,6 +413,41 @@ static int vcpuop_register_vcpu_info(CPUState *cs, 
CPUState *target,
 return 0;
 }
 
+static int vcpuop_register_vcpu_time_info(CPUState *cs, CPUState *target,
+  uint64_t arg)
+{
+struct vcpu_register_time_memory_area tma;
+uint64_t gpa;
+size_t len;
+
+/* No need for 32/64 compat handling */
+qemu_build_assert(sizeof(tma) == 8);
+qemu_build_assert(sizeof(struct vcpu_time_info) == 32);
+
+if (!target)
+return -ENOENT;
+
+if (kvm_copy_from_gva(cs, arg, &tma, sizeof(tma))) {
+return -EFAULT;
+}
+
+/*
+ * Xen actually uses the GVA and does the translation through the guest
+ * page tables each time. But Linux/KVM uses the GPA, on the assumption
+ * that guests only ever use *global* addresses (kernel virtual addresses)
+ * for it. If Linux is changed to redo the GVA→GPA translation each time,
+ * it will offer a new vCPU attribute for that, and we'll use it instead.
+ */
+if (!kvm_gva_to_gpa(cs, tma.addr.p, &gpa, &len, false) ||
+

Re: [PATCH] linux-user: Add translation for argument of msync()

2022-12-15 Thread Richard Henderson
Host!

r~

On Thu, 15 Dec 2022, 12:58 Philippe Mathieu-Daudé, 
wrote:

> On 15/12/22 16:58, Richard Henderson wrote:
> > On 12/14/22 23:58, Philippe Mathieu-Daudé wrote:
> >>> --- a/linux-user/alpha/target_mman.h
> >>> +++ b/linux-user/alpha/target_mman.h
> >>> @@ -3,6 +3,10 @@
> >>>
> >>>   #define TARGET_MADV_DONTNEED 6
> >>>
> >>> +#define TARGET_MS_ASYNC 1
> >>> +#define TARGET_MS_SYNC 2
> >>> +#define TARGET_MS_INVALIDATE 4
> >>> +
> >>>   #include "../generic/target_mman.h"
> >>>
> >>>   #endif
> >>> diff --git a/linux-user/generic/target_mman.h
> >>> b/linux-user/generic/target_mman.h
> >>> index 1436a3c543..32bf1a52d0 100644
> >>> --- a/linux-user/generic/target_mman.h
> >>> +++ b/linux-user/generic/target_mman.h
> >>> @@ -89,4 +89,17 @@
> >>>   #define TARGET_MADV_DONTNEED_LOCKED 24
> >>>   #endif
> >>>
> >>> +
> >>> +#ifndef TARGET_MS_ASYNC
> >>> +#define TARGET_MS_ASYNC 1
> >>
> >> Hmm don't we want to keep the host flag instead?
> >>
> >> #define TARGET_MS_ASYNC MS_ASYNC
> >
> > No.  What if the host has an odd value, like Alpha.
>
> But TARGET_MS_ASYNC  would be defined in linux-user/alpha/target_mman.h
> so this path won't apply... What am I missing?
>


Re: [PULL 19/20] tcg/ppc: Optimize 26-bit jumps

2022-12-15 Thread Richard Henderson
It also has a race condition.
Please see

https://lore.kernel.org/qemu-devel/20221206041715.314209-18-richard.hender...@linaro.org/


r~

On Thu, 15 Dec 2022, 13:33 Michael Tokarev,  wrote:

> 04.10.2022 22:52, Richard Henderson wrote:
> > From: Leandro Lupori 
> >
> > PowerPC64 processors handle direct branches better than indirect
> > ones, resulting in less stalled cycles and branch misses.
> >
> > However, PPC's tb_target_set_jmp_target() was only using direct
> > branches for 16-bit jumps, while PowerPC64's unconditional branch
> > instructions are able to handle displacements of up to 26 bits.
> > To take advantage of this, now jumps whose displacements fit in
> > between 17 and 26 bits are also converted to direct branches.
> >
> > Reviewed-by: Richard Henderson 
> > Signed-off-by: Leandro Lupori 
> > [rth: Expanded some commentary.]
> > Signed-off-by: Richard Henderson 
> > ---
> >   tcg/ppc/tcg-target.c.inc | 119 +--
> >   1 file changed, 88 insertions(+), 31 deletions(-)
> >
> > diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> > index 1cbd047ab3..e3dba47697 100644
> > --- a/tcg/ppc/tcg-target.c.inc
> > +++ b/tcg/ppc/tcg-target.c.inc
> ...
>
> > +/*
> > + * There's no convenient way to get the compiler to allocate a pair
> > + * of registers at an even index, so copy into r6/r7 and clobber.
> > + */
> > +asm("mr  %%r6, %1\n\t"
> > +"mr  %%r7, %2\n\t"
> > +"stq %%r6, %0"
> > +: "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6", "r7");
>
> This is the only place in qemu where __int128 is used (other places name
> it __int128_t), and is used *unconditionally*.  Is it right?
>
> In particular, this breaks compilation on powerpc:
>
> cc -m32 -Ilibqemu-aarch64-softmmu.fa.p... -c ../../tcg/tcg.c
> In file included from ../../tcg/tcg.c:432:
> /<>/tcg/ppc/tcg-target.c.inc: In function ‘ppc64_replace4’:
> /<>/tcg/ppc/tcg-target.c.inc:1885:18: error: expected
> expression before ‘__int128’
>   1885 | : "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6",
> "r7");
>|  ^~~~
> /<>/tcg/ppc/tcg-target.c.inc:1885:29: error: expected ‘)’
> before ‘rw’
>   1885 | : "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6",
> "r7");
>|   ~ ^~
>
> Thanks,
>
> /mjt
>


[PATCH] tests/avocado: add machine:none tag to version.py

2022-12-15 Thread Fabiano Rosas
This test currently fails when run on a host for which the QEMU target
has no default machine set:

ERROR| Output: qemu-system-aarch64: No machine specified, and there is
no default

Signed-off-by: Fabiano Rosas 
---
 tests/avocado/version.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/avocado/version.py b/tests/avocado/version.py
index ded7f039c1..dd775955eb 100644
--- a/tests/avocado/version.py
+++ b/tests/avocado/version.py
@@ -15,6 +15,7 @@
 class Version(QemuSystemTest):
 """
 :avocado: tags=quick
+:avocado: tags=machine:none
 """
 def test_qmp_human_info_version(self):
 self.vm.add_args('-nodefaults')
-- 
2.35.3




[PATCH 1/2] target/riscv: Fix up masking of vsip/vsie accesses

2022-12-15 Thread Andrew Bresticker
The current logic attempts to shift the VS-level bits into their correct
position in mip while leaving the remaining bits intact. This is both
pointless and likely incorrect, since one would expect any new, future
VS-level interrupts to get their own position in mip rather than sharing
with their (H)S-level equivalent. Fix this, and make the logic more
readable, by just masking off the VS-level bits and shifting them into
position.

This also fixes reads of vsip, which would only ever report vsip.VSSIP
since the non-writable bits got masked off as well.
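
For illustration, the new logic reduces to a pair of mask-and-shift
operations; a minimal stand-alone sketch follows, with bit positions taken
from the RISC-V privileged spec (each VS-level bit sits one position above
its (H)S-level equivalent) rather than from this patch:

    #include <assert.h>
    #include <stdint.h>

    #define SSIP  (1ULL << 1)
    #define VSSIP (1ULL << 2)
    #define VSTIP (1ULL << 6)
    #define VSEIP (1ULL << 10)
    #define VS_MODE_INTERRUPTS (VSSIP | VSTIP | VSEIP)

    int main(void)
    {
        uint64_t guest_wr = SSIP;   /* guest writes vsip.SSIP (bit 1) ... */

        /* ... which lands on mip.VSSIP (bit 2) after shifting into position: */
        uint64_t mip_val = (guest_wr & (VS_MODE_INTERRUPTS >> 1)) << 1;
        assert(mip_val == VSSIP);

        /* and the read path simply shifts it straight back: */
        assert(((mip_val & VS_MODE_INTERRUPTS) >> 1) == guest_wr);
        return 0;
    }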

Fixes: d028ac7512f1 ("target/riscv: Implement AIA CSRs for 64 local interrupts 
on RV32")
Signed-off-by: Andrew Bresticker 
---
 target/riscv/csr.c | 35 +++
 1 file changed, 11 insertions(+), 24 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 5c9a7ee287..984548bf87 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1975,22 +1975,15 @@ static RISCVException rmw_vsie64(CPURISCVState *env, 
int csrno,
  uint64_t new_val, uint64_t wr_mask)
 {
 RISCVException ret;
-uint64_t rval, vsbits, mask = env->hideleg & VS_MODE_INTERRUPTS;
+uint64_t rval, mask = env->hideleg & VS_MODE_INTERRUPTS;
 
 /* Bring VS-level bits to correct position */
-vsbits = new_val & (VS_MODE_INTERRUPTS >> 1);
-new_val &= ~(VS_MODE_INTERRUPTS >> 1);
-new_val |= vsbits << 1;
-vsbits = wr_mask & (VS_MODE_INTERRUPTS >> 1);
-wr_mask &= ~(VS_MODE_INTERRUPTS >> 1);
-wr_mask |= vsbits << 1;
+new_val = (new_val & (VS_MODE_INTERRUPTS >> 1)) << 1;
+wr_mask = (wr_mask & (VS_MODE_INTERRUPTS >> 1)) << 1;
 
 ret = rmw_mie64(env, csrno, &rval, new_val, wr_mask & mask);
 if (ret_val) {
-rval &= mask;
-vsbits = rval & VS_MODE_INTERRUPTS;
-rval &= ~VS_MODE_INTERRUPTS;
-*ret_val = rval | (vsbits >> 1);
+*ret_val = (rval & mask) >> 1;
 }
 
 return ret;
@@ -2191,22 +2184,16 @@ static RISCVException rmw_vsip64(CPURISCVState *env, 
int csrno,
  uint64_t new_val, uint64_t wr_mask)
 {
 RISCVException ret;
-uint64_t rval, vsbits, mask = env->hideleg & vsip_writable_mask;
+uint64_t rval, mask = env->hideleg & VS_MODE_INTERRUPTS;
 
 /* Bring VS-level bits to correct position */
-vsbits = new_val & (VS_MODE_INTERRUPTS >> 1);
-new_val &= ~(VS_MODE_INTERRUPTS >> 1);
-new_val |= vsbits << 1;
-vsbits = wr_mask & (VS_MODE_INTERRUPTS >> 1);
-wr_mask &= ~(VS_MODE_INTERRUPTS >> 1);
-wr_mask |= vsbits << 1;
-
-ret = rmw_mip64(env, csrno, &rval, new_val, wr_mask & mask);
+new_val = (new_val & (VS_MODE_INTERRUPTS >> 1)) << 1;
+wr_mask = (wr_mask & (VS_MODE_INTERRUPTS >> 1)) << 1;
+
+ret = rmw_mip64(env, csrno, &rval, new_val,
+wr_mask & mask & vsip_writable_mask);
 if (ret_val) {
-rval &= mask;
-vsbits = rval & VS_MODE_INTERRUPTS;
-rval &= ~VS_MODE_INTERRUPTS;
-*ret_val = rval | (vsbits >> 1);
+*ret_val = (rval & mask) >> 1;
 }
 
 return ret;
-- 
2.25.1




[PATCH 2/2] target/riscv: Trap on writes to stimecmp from VS when hvictl.VTI=1

2022-12-15 Thread Andrew Bresticker
Per the AIA specification, writes to stimecmp from VS level should
trap when hvictl.VTI is set since the write may cause vsip.STIP to
become unset.

Fixes: 3ec0fe18a31f ("target/riscv: Add vstimecmp support")
Signed-off-by: Andrew Bresticker 
---
 target/riscv/csr.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 984548bf87..7d9035e7bb 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -935,6 +935,9 @@ static RISCVException write_stimecmp(CPURISCVState *env, 
int csrno,
 RISCVCPU *cpu = env_archcpu(env);
 
 if (riscv_cpu_virt_enabled(env)) {
+if (env->hvictl & HVICTL_VTI) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
 return write_vstimecmp(env, csrno, val);
 }
 
@@ -955,6 +958,9 @@ static RISCVException write_stimecmph(CPURISCVState *env, 
int csrno,
 RISCVCPU *cpu = env_archcpu(env);
 
 if (riscv_cpu_virt_enabled(env)) {
+if (env->hvictl & HVICTL_VTI) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
 return write_vstimecmph(env, csrno, val);
 }
 
-- 
2.25.1




[PATCH] migration: Show downtime during postcopy phase

2022-12-15 Thread Peter Xu
The downtime should be displayed during the postcopy phase because the
switchover phase is done.  OTOH it's weird to show "expected downtime",
which is confusing: what would that even mean if the switchover has
already happened anyway?

This is a slight ABI change on QMP, but I assume it shouldn't affect
anyone.

Signed-off-by: Peter Xu 
---
 migration/migration.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 64f74534e2..993782598f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1077,20 +1077,30 @@ bool migration_is_running(int state)
 }
 }
 
+static bool migrate_show_downtime(MigrationState *s)
+{
+return (s->state == MIGRATION_STATUS_COMPLETED) || migration_in_postcopy();
+}
+
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
 {
 info->has_status = true;
 info->has_setup_time = true;
 info->setup_time = s->setup_time;
+
 if (s->state == MIGRATION_STATUS_COMPLETED) {
 info->has_total_time = true;
 info->total_time = s->total_time;
-info->has_downtime = true;
-info->downtime = s->downtime;
 } else {
 info->has_total_time = true;
 info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) -
s->start_time;
+}
+
+if (migrate_show_downtime(s)) {
+info->has_downtime = true;
+info->downtime = s->downtime;
+} else {
 info->has_expected_downtime = true;
 info->expected_downtime = s->expected_downtime;
 }
-- 
2.37.3
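
For illustration only, the user-visible difference is in the query-migrate
reply while postcopy is active (member names as in qapi/migration.json, values
made up): the measured "downtime" is now reported instead of
"expected-downtime".

    -> { "execute": "query-migrate" }
    <- { "return": { "status": "postcopy-active",
                     "setup-time": 13,
                     "total-time": 42817,
                     "downtime": 317,
                     ... } }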




Re: [PULL v2 00/28] target-arm queue

2022-12-15 Thread Peter Maydell
On Thu, 15 Dec 2022 at 17:40, Peter Maydell  wrote:
>
> drop the sysregs patch as the tcg sysregs test fails
> (probably a bug in the test)
>
> -- PMM
>
> The following changes since commit ae2b87341b5ddb0dcb1b3f2d4f586ef18de75873:
>
>   Merge tag 'pull-qapi-2022-12-14-v2' of https://repo.or.cz/qemu/armbru into 
> staging (2022-12-14 22:42:14 +)
>
> are available in the Git repository at:
>
>   https://git.linaro.org/people/pmaydell/qemu-arm.git 
> tags/pull-target-arm-20221215-1
>
> for you to fetch changes up to 9e406eea309bbe44c7fb17f6af112d2b756854ad:
>
>   target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator 
> (2022-12-15 17:37:48 +)
>
> 
> target-arm queue:
>  * hw/arm/virt: Add properties to allow more granular
>configuration of use of highmem space
>  * target/arm: Add Cortex-A55 CPU
>  * hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement
>  * Implement FEAT_EVT
>  * Some 3-phase-reset conversions for Arm GIC, SMMU
>  * hw/arm/boot: set initrd with #address-cells type in fdt
>  * hw/misc: Move some arm-related files from specific_ss into softmmu_ss
>  * Restrict arm_cpu_exec_interrupt() to TCG accelerator


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [PULL 19/20] tcg/ppc: Optimize 26-bit jumps

2022-12-15 Thread Michael Tokarev

From: Leandro Lupori 


And this address bounces for me, FWIW:

 eldorado-org-br.mail.protection.outlook.com[104.47.70.110] said:
   550 5.4.1 Recipient address rejected: Access denied. AS(201806281)

/mjt



Re: [PULL 19/20] tcg/ppc: Optimize 26-bit jumps

2022-12-15 Thread Michael Tokarev

04.10.2022 22:52, Richard Henderson wrote:

From: Leandro Lupori 

PowerPC64 processors handle direct branches better than indirect
ones, resulting in less stalled cycles and branch misses.

However, PPC's tb_target_set_jmp_target() was only using direct
branches for 16-bit jumps, while PowerPC64's unconditional branch
instructions are able to handle displacements of up to 26 bits.
To take advantage of this, now jumps whose displacements fit in
between 17 and 26 bits are also converted to direct branches.

Reviewed-by: Richard Henderson 
Signed-off-by: Leandro Lupori 
[rth: Expanded some commentary.]
Signed-off-by: Richard Henderson 
---
  tcg/ppc/tcg-target.c.inc | 119 +--
  1 file changed, 88 insertions(+), 31 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 1cbd047ab3..e3dba47697 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc

...


+/*
+ * There's no convenient way to get the compiler to allocate a pair
+ * of registers at an even index, so copy into r6/r7 and clobber.
+ */
+asm("mr  %%r6, %1\n\t"
+"mr  %%r7, %2\n\t"
+"stq %%r6, %0"
+: "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6", "r7");


This is the only place in qemu where __int128 is used (other places name
it __int128_t), and is used *unconditionally*.  Is it right?

In particular, this breaks compilation on powerpc:

cc -m32 -Ilibqemu-aarch64-softmmu.fa.p... -c ../../tcg/tcg.c
In file included from ../../tcg/tcg.c:432:
/<>/tcg/ppc/tcg-target.c.inc: In function ‘ppc64_replace4’:
/<>/tcg/ppc/tcg-target.c.inc:1885:18: error: expected expression 
before ‘__int128’
 1885 | : "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6", "r7");
  |  ^~~~
/<>/tcg/ppc/tcg-target.c.inc:1885:29: error: expected ‘)’ before 
‘rw’
 1885 | : "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6", "r7");
  |   ~ ^~

Thanks,

/mjt
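
One conceivable way to keep 32-bit hosts building, shown purely as a sketch
and not as the fix that was eventually merged (the stq sequence should only be
reachable on ppc64 hosts anyway), would be to gate it on the compiler actually
providing a 128-bit type:

    #ifdef __SIZEOF_INT128__
        asm("mr  %%r6, %1\n\t"
            "mr  %%r7, %2\n\t"
            "stq %%r6, %0"
            : "=Q"(*(__int128 *)rw) : "r"(p[0]), "r"(p[1]) : "r6", "r7");
    #else
        g_assert_not_reached();
    #endif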



Re: [PATCH] linux-user: Add translation for argument of msync()

2022-12-15 Thread Philippe Mathieu-Daudé

On 15/12/22 16:58, Richard Henderson wrote:

On 12/14/22 23:58, Philippe Mathieu-Daudé wrote:

--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -3,6 +3,10 @@

  #define TARGET_MADV_DONTNEED 6

+#define TARGET_MS_ASYNC 1
+#define TARGET_MS_SYNC 2
+#define TARGET_MS_INVALIDATE 4
+
  #include "../generic/target_mman.h"

  #endif
diff --git a/linux-user/generic/target_mman.h 
b/linux-user/generic/target_mman.h

index 1436a3c543..32bf1a52d0 100644
--- a/linux-user/generic/target_mman.h
+++ b/linux-user/generic/target_mman.h
@@ -89,4 +89,17 @@
  #define TARGET_MADV_DONTNEED_LOCKED 24
  #endif

+
+#ifndef TARGET_MS_ASYNC
+#define TARGET_MS_ASYNC 1


Hmm don't we want to keep the host flag instead?

    #define TARGET_MS_ASYNC MS_ASYNC


No.  What if the host has an odd value, like Alpha.


But TARGET_MS_ASYNC  would be defined in linux-user/alpha/target_mman.h
so this path won't apply... What am I missing?
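
For context, the shape of the translation the patch under discussion adds is
roughly the following (helper name and placement are assumptions made for
illustration; the TARGET_MS_* constants are the ones from the hunks quoted
above):

    static int target_to_host_msync_flags(int target_flags)
    {
        int host_flags = 0;

        if (target_flags & TARGET_MS_ASYNC)      host_flags |= MS_ASYNC;
        if (target_flags & TARGET_MS_SYNC)       host_flags |= MS_SYNC;
        if (target_flags & TARGET_MS_INVALIDATE) host_flags |= MS_INVALIDATE;
        return host_flags;
    }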



Re: [RFC PATCH v2 20/22] i386/xen: HVMOP_set_param / HVM_PARAM_CALLBACK_IRQ

2022-12-15 Thread David Woodhouse
On Mon, 2022-12-12 at 16:39 +, Paul Durrant wrote:
> On 12/12/2022 16:26, David Woodhouse wrote:
> > On Mon, 2022-12-12 at 16:16 +, Paul Durrant wrote:
> > > On 09/12/2022 09:56, David Woodhouse wrote:
> > > > From: Ankur Arora 
> > > > The HVM_PARAM_CALLBACK_IRQ parameter controls the system-wide event
> > > > channel upcall method.  The vector support is handled by KVM internally,
> > > > when the evtchn_upcall_pending field in the vcpu_info is set.
> > > > The GSI and PCI_INTX delivery methods are not supported. yet; those
> > > > need to simulate a level-triggered event on the I/OAPIC.
> > > 
> > > That's gonna be somewhat limiting if anyone runs a Windows guest with
> > > upcall vector support turned off... which is an option at:
> > > 
> > > https://xenbits.xen.org/gitweb/?p=pvdrivers/win/xenbus.git;a=blob;f=src/xenbus/evtchn.c;;hb=HEAD#l1928
> > > 
> > 
> > Sure. And as you know, I also added the 'xen_no_vector_callback' option
> > to the Linux command line to allow for that mode to be tested with
> > Linux too: 
> > https://git.kernel.org/torvalds/c/b36b0fe96a
> > 
> > 
> > The GSI and PCI_INTX modes will be added in time, but not yet.
> 
> Ok, but maybe worth calling out the limitation in the commit comment for 
> those wishing to kick the tyres.

Hm... this isn't as simple in QEMU as I hoped.

The way I want to handle it is like the way that VFIO eventfd pairs
work for level-triggered interrupts: the first eventfd is triggered on
a rising edge, and the other one is a 'resampler' which is triggered on
EOI, and causes the first one to be retriggered if the level is still
actually high.

However... unlike the kernel and another VMM that you and I are
familiar with, QEMU doesn't actually hook that up to the EOI in the
APIC/IOAPIC at all.

Instead, when VFIO devices assert a level-triggered interrupt, QEMU
*unmaps* that device's BARs from the guest so it can trap-and-emulate
them, and each MMIO read or write will also trigger the resampler
(whether that line is currently masked in the APIC or not).

I suppose we could try making the page with the vcpu_info as read-only, 
and trapping access to that so we spot when the guest clears its own
->evtchn_upcall_pending flag? That seems overly complex though.

So I've resorted to doing what Xen itself does: just poll the flag on
every vmexit. Patch is at the tip of my tree at 
https://git.infradead.org/users/dwmw2/qemu.git/shortlog/refs/heads/xenfv
and below.

However, in the case of the in-kernel irqchip we might not even *get* a
vmexit all the way to userspace; can't a guest just get stuck in an
interrupt storm with it being handled entirely in-kernel? I might
decree that this works *only* with the split irqchip.

Then again, it'll work *nicely* in the kernel where the EOI
notification exists, so I might teach the kernel's evtchn code to
create those eventfd pairs like VFIO's, which can be hooked in as IRQFD
routing to the in-kernel {IOA,}PIC.


From 15a91ff4833d07910abba1dec093e48580c2b4c4 Mon Sep 17 00:00:00 2001
From: David Woodhouse 
Subject: [PATCH] hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback

The GSI callback (and later PCI_INTX) is a level triggered interrupt. It
is asserted when an event channel is delivered to vCPU0, and is supposed
to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0
is cleared again.

Thankfully, Xen does *not* assert the GSI if the guest sets its own
evtchn_upcall_pending field; we only need to assert the GSI when we
have delivered an event for ourselves. So that's the easy part.

However, we *do* need to poll for the evtchn_upcall_pending flag being
cleared. In an ideal world we would poll that when the EOI happens on
the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd
pairs — one is used to trigger the interrupt, and the other works in the
other direction to 'resample' on EOI, and trigger the first eventfd
again if the line is still active.

However, QEMU doesn't seem to do that. Even VFIO level interrupts seem
to be supported by temporarily unmapping the device's BARs from the
guest when an interrupt happens, then trapping *all* MMIO to the device
and sending the 'resample' event on *every* MMIO access until the IRQ
is cleared! Maybe in future we'll plumb the 'resample' concept through
QEMU's irq framework but for now we'll do what Xen itself does: just
check the flag on every vmexit if the upcall GSI is known to be
asserted.

This is barely tested; I did make it waggle IRQ1 up and down on *every*
delivery even through the vector method, and observed that it is indeed
visible to the guest (as lots of spurious keyboard interrupts). But if
the vector callback isn't in use, Linux guests won't actually use event
channels for anything except PV devices... which I haven't implemented
here yet, so it's hard to test delivery :)

Signed-off-by: David Woodhouse 
---
 hw/i386/kvm/xen_evtchn.c  | 88 +++
 hw/i386/kvm/xen_evtchn.
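
A rough sketch of the "check the flag on every vmexit" approach described
above (type and field names are assumptions for illustration, not the code in
the attached patch):

    /* On the exit path: deassert the callback GSI once the guest has
     * acknowledged the upcall by clearing vCPU0's evtchn_upcall_pending. */
    static void xen_evtchn_poll_upcall(XenEvtchnState *s,
                                       struct vcpu_info *vcpu0_info)
    {
        if (s->callback_gsi_asserted && !vcpu0_info->evtchn_upcall_pending) {
            qemu_set_irq(s->callback_gsi, 0);
            s->callback_gsi_asserted = false;
        }
    }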

Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread Stefan Berger




On 12/15/22 15:30, James Bottomley wrote:

On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote:

On 12/15/22 15:07, James Bottomley wrote:

[...]

don't really have much interest in the migration use case, but I
knew it should work like the passthrough case, so that's what I
tested.


I think your device needs to block migrations since it doesn't handle
all migration scenarios correctly.


Passthrough doesn't block migrations either, presumably because it can
also be made to work if you know what you're doing.  I might not be


Don't compare it to passthrough, compare it to swtpm. It should have at least 
the same features as swtpm or be better, otherwise I don't see why we need to 
have the backend device in the upstream repo.

Stefan



Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread James Bottomley
On Thu, 2022-12-15 at 15:22 -0500, Stefan Berger wrote:
> On 12/15/22 15:07, James Bottomley wrote:
[...]
> > don't really have much interest in the migration use case, but I
> > knew it should work like the passthrough case, so that's what I
> > tested.
> 
> I think your device needs to block migrations since it doesn't handle
> all migration scenarios correctly.

Passthrough doesn't block migrations either, presumably because it can
also be made to work if you know what you're doing.  I might not be
particularly interested in migrations, but that's not really a good
reason to prevent anyone from ever using them, particularly when the
experiment says they do work.

James




Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread Stefan Berger




On 12/15/22 15:07, James Bottomley wrote:

On Thu, 2022-12-15 at 14:57 -0500, Stefan Berger wrote:

On 12/15/22 14:40, James Bottomley wrote:

On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:

[...]

You should also add a description to docs/specs/tpm.rst.


Description of what?  It functions exactly like passthrough on


Please describe all the scenarios so that someone else can repeat
them when trying out **your** device.

There are sections describing how things for swtpm and you should add
how things work for the mssim TPM.

https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator


The passthrough snapshot/restore isn't described there either.  This


Forget about passthrough, rather compare it to swtpm.


behaves exactly the same in that it's caveat emptor.  If something
happens in the interim to upset the TPM state then the restore will
have unexpected effects due to the externally changed TPM state.  This
is actually a feature: I'm checking our interposer defences by doing
external state manipulation.


migration.  Since the TPM state is retained in the server a
reconnection just brings everything back to where it was.


So it's remote. And the ports are always open and someone can just
connect to the open ports and power cycle the device?


in the same way as you can power off the hardware and have issues with
a passthrough TPM on vm restore, yes.


I don't think you should compare the mssim TPM with passthrough but rather with 
swtpm emulator + tpm_emulator backend. That's a much better comparison.




This may not be the most important scenario but nevertheless I
wouldn't want to deal with bug reports if someone does 'VM
snapshotting' -- how this is correctly handled would be of interest.


I'd rather say nothing, like passthrough, then there are no
expectations beyond it might work if you know what you're doing.  I


Why do we need this device then if it doesn't handle migration scenarios in the 
same or better way than swtpm + tpm_emulator backends already do?


don't really have much interest in the migration use case, but I knew
it should work like the passthrough case, so that's what I tested.


I think your device needs to block migrations since it doesn't handle all 
migration scenarios correctly.

   Stefan



James





Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread James Bottomley
On Thu, 2022-12-15 at 14:57 -0500, Stefan Berger wrote:
> On 12/15/22 14:40, James Bottomley wrote:
> > On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
[...]
> > > You should also add a description to docs/specs/tpm.rst.
> > 
> > Description of what?  It functions exactly like passthrough on
> 
> Please describe all the scenarios so that someone else can repeat
> them when trying out **your** device.
> 
> There are sections describing how things for swtpm and you should add
> how things work for the mssim TPM.
> 
> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
> https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator

The passthrough snapshot/restore isn't described there either.  This
behaves exactly the same in that it's caveat emptor.  If something
happens in the interim to upset the TPM state then the restore will
have unexpected effects due to the externally changed TPM state.  This
is actually a feature: I'm checking our interposer defences by doing
external state manipulation.

> > migration.  Since the TPM state is retained in the server a
> > reconnection just brings everything back to where it was.
> 
> So it's remote. And the ports are always open and someone can just
> connect to the open ports and power cycle the device?

in the same way as you can power off the hardware and have issues with
a passthrough TPM on vm restore, yes.

> This may not be the most important scenario but nevertheless I
> wouldn't want to deal with bug reports if someone does 'VM
> snapshotting' -- how this is correctly handled would be of interest.

I'd rather say nothing, like passthrough, then there are no
expectations beyond it might work if you know what you're doing.  I
don't really have much interest in the migration use case, but I knew
it should work like the passthrough case, so that's what I tested.

James




Re: [PATCH 3/8] tcg/loongarch64: Update tcg-insn-defs.c.inc

2022-12-15 Thread Richard Henderson

On 12/15/22 11:50, WANG Xuerui wrote:

So do you need the addu16i.d marked as @qemu now?


Soonish.  I made the change locally, but merging back to your repo seems to be 
disabled.

I can push the change into 
loongarch-opcodes tomorrow if so wanted.


Or that; it's probably easier.

Of course it's probably better to maintain the 
used opcodes list in qemu's repo, let me refactor this after I somehow crawl out of the 
pile of day job...


No rush on that.  I don't see any other insns that are likely to be used.


r~




Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread Stefan Berger




On 12/15/22 14:40, James Bottomley wrote:

On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:



On 12/15/22 14:22, James Bottomley wrote:

On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:



On 12/15/22 13:01, James Bottomley wrote:

From: James Bottomley 

The Microsoft Simulator (mssim) is the reference emulation
platform
for the TCG TPM 2.0 specification.

https://github.com/Microsoft/ms-tpm-20-ref.git

It exports a fairly simple network socket based protocol on two
sockets, one for command (default 2321) and one for control
(default
2322).  This patch adds a simple backend that can speak the
mssim
protocol over the network.  It also allows the host, and two
ports
to
be specified on the qemu command line.  The benefits are
twofold:
firstly it gives us a backend that actually speaks a standard
TPM
emulation protocol instead of the linux specific TPM driver
format
of
the current emulated TPM backend and secondly, using the
microsoft
protocol, the end point of the emulator can be anywhere on the
network, facilitating the cloud use case where a central TPM
service
can be used over a control network.

The implementation does basic control commands like power
off/on,
but
doesn't implement cancellation or startup.  The former because
cancellation is pretty much useless on a fast operating TPM
emulator
and the latter because this emulator is designed to be used
with
OVMF
which itself does TPM startup and I wanted to validate that.

To run this, simply download an emulator based on the MS
specification
(package ibmswtpm2 on openSUSE) and run it, then add these two
lines
to the qemu command and it will use the emulator.

   -tpmdev mssim,id=tpm0 \
   -device tpm-crb,tpmdev=tpm0 \

to use a remote emulator replace the first line with

   -tpmdev
"{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'rem
ote'
,'port':'2321'}}"

tpm-tis also works as the backend.


Since this device does not properly support migration you have to
register a migration blocker.


Actually it seems to support migration just fine.  Currently the
PCR's
get zero'd which is my fault for doing a TPM power off/on, but
switching that based on state should be an easy fix.


How do you handle virsh save  -> host reboot -> virsh restore?


I didn't.  I just pulled out the TPM power state changes and followed
the guide here using the migrate "exec:gzip -c > STATEFILE.gz" recipe:

https://www.linux-kvm.org/page/Migration

and verified the TPM pcrs and the null name were unchanged.





You should also add a description to docs/specs/tpm.rst.


Description of what?  It functions exactly like passthrough on


Please describe all the scenarios so that someone else can repeat them when 
trying out **your** device.

There are sections describing how things for swtpm and you should add how 
things work for the mssim TPM.

https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#the-qemu-tpm-emulator-device
https://github.com/qemu/qemu/blob/master/docs/specs/tpm.rst#migration-with-the-tpm-emulator



migration.  Since the TPM state is retained in the server a
reconnection just brings everything back to where it was.


So it's remote. And the ports are always open and someone can just connect to 
the open ports and power cycle the device?

This may not be the most important scenario but nevertheless I wouldn't want to 
deal with bug reports if someone does 'VM snapshotting' -- how this is 
correctly handled would be of interest.

   Stefan



James





Re: [PATCH 3/8] tcg/loongarch64: Update tcg-insn-defs.c.inc

2022-12-15 Thread WANG Xuerui

On 12/15/22 23:51, Richard Henderson wrote:

On 12/14/22 23:50, Philippe Mathieu-Daudé wrote:

On 6/12/22 05:40, Richard Henderson wrote:

Regenerate with ADDU16I included.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-insn-defs.c.inc | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-insn-defs.c.inc 
b/tcg/loongarch64/tcg-insn-defs.c.inc

index d162571856..c3c8669b4b 100644
--- a/tcg/loongarch64/tcg-insn-defs.c.inc
+++ b/tcg/loongarch64/tcg-insn-defs.c.inc
@@ -4,7 +4,7 @@
   *
   * This file is auto-generated by genqemutcgdefs from
   * https://github.com/loongson-community/loongarch-opcodes,
- * from commit 961f0c60f5b63e574d785995600c71ad5413fdc4.


Odd, addu16i.d is present since 3d057a6, so was already in 961f0c6.


It wasn't marked "qemu", so the generator didn't emit ...


@@ -74,6 +74,7 @@ typedef enum {
  OPC_ANDI = 0x0340,
  OPC_ORI = 0x0380,
  OPC_XORI = 0x03c0,
+    OPC_ADDU16I_D = 0x1000,
  OPC_LU12I_W = 0x1400,
  OPC_CU32I_D = 0x1600,
  OPC_PCADDU2I = 0x1800,
@@ -710,6 +711,13 @@ tcg_out_opc_xori(TCGContext *s, TCGReg d, 
TCGReg j, uint32_t uk12)

  tcg_out32(s, encode_djuk12_insn(OPC_XORI, d, j, uk12));
  }
+/* Emits the `addu16i.d d, j, sk16` instruction.  */
+static void __attribute__((unused))
+tcg_out_opc_addu16i_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_ADDU16I_D, d, j, sk16));
+}


... all this.

Ah. Sorry for the late reply, I've been busy with Gentoo and LLVM mostly 
these days (apart from the day job more demanding than ever, due to 
end-of-year and a bit too much slack doing LoongArch work instead ;-).


So do you need the addu16i.d marked as @qemu now? I can push the change 
into loongarch-opcodes tomorrow if so wanted. Of course it's probably 
better to maintain the used opcodes list in qemu's repo, let me refactor 
this after I somehow crawl out of the pile of day job...
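
For readers unfamiliar with the instruction: addu16i.d rd, rj, si16 computes
rd = rj + (sign_extend(si16) << 16), which is presumably what makes it handy
for folding large offsets into a single add.  A hypothetical emission through
the helper shown in the hunk above (register and immediate chosen only for
illustration):

    /* add 0x12340000 to A0 in one instruction */
    tcg_out_opc_addu16i_d(s, TCG_REG_A0, TCG_REG_A0, 0x1234);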





Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread James Bottomley
On Thu, 2022-12-15 at 14:35 -0500, Stefan Berger wrote:
> 
> 
> On 12/15/22 14:22, James Bottomley wrote:
> > On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:
> > > 
> > > 
> > > On 12/15/22 13:01, James Bottomley wrote:
> > > > From: James Bottomley 
> > > > 
> > > > The Microsoft Simulator (mssim) is the reference emulation
> > > > platform
> > > > for the TCG TPM 2.0 specification.
> > > > 
> > > > https://github.com/Microsoft/ms-tpm-20-ref.git
> > > > 
> > > > It exports a fairly simple network socket based protocol on two
> > > > sockets, one for command (default 2321) and one for control
> > > > (default
> > > > 2322).  This patch adds a simple backend that can speak the
> > > > mssim
> > > > protocol over the network.  It also allows the host, and two
> > > > ports
> > > > to
> > > > be specified on the qemu command line.  The benefits are
> > > > twofold:
> > > > firstly it gives us a backend that actually speaks a standard
> > > > TPM
> > > > emulation protocol instead of the linux specific TPM driver
> > > > format
> > > > of
> > > > the current emulated TPM backend and secondly, using the
> > > > microsoft
> > > > protocol, the end point of the emulator can be anywhere on the
> > > > network, facilitating the cloud use case where a central TPM
> > > > service
> > > > can be used over a control network.
> > > > 
> > > > The implementation does basic control commands like power
> > > > off/on,
> > > > but
> > > > doesn't implement cancellation or startup.  The former because
> > > > cancellation is pretty much useless on a fast operating TPM
> > > > emulator
> > > > and the latter because this emulator is designed to be used
> > > > with
> > > > OVMF
> > > > which itself does TPM startup and I wanted to validate that.
> > > > 
> > > > To run this, simply download an emulator based on the MS
> > > > specification
> > > > (package ibmswtpm2 on openSUSE) and run it, then add these two
> > > > lines
> > > > to the qemu command and it will use the emulator.
> > > > 
> > > >   -tpmdev mssim,id=tpm0 \
> > > >   -device tpm-crb,tpmdev=tpm0 \
> > > > 
> > > > to use a remote emulator replace the first line with
> > > > 
> > > >   -tpmdev
> > > > "{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'rem
> > > > ote'
> > > > ,'port':'2321'}}"
> > > > 
> > > > tpm-tis also works as the backend.
> > > 
> > > Since this device does not properly support migration you have to
> > > register a migration blocker.
> > 
> > Actually it seems to support migration just fine.  Currently the
> > PCR's
> > get zero'd which is my fault for doing a TPM power off/on, but
> > switching that based on state should be an easy fix.
> 
> How do you handle virsh save  -> host reboot -> virsh restore?

I didn't.  I just pulled out the TPM power state changes and followed
the guide here using the migrate "exec:gzip -c > STATEFILE.gz" recipe:

https://www.linux-kvm.org/page/Migration

and verified the TPM pcrs and the null name were unchanged.

> You should also add a description to docs/specs/tpm.rst.

Description of what?  It functions exactly like passthrough on
migration.  Since the TPM state is retained in the server a
reconnection just brings everything back to where it was.

James




Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread Stefan Berger




On 12/15/22 14:22, James Bottomley wrote:

On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:



On 12/15/22 13:01, James Bottomley wrote:

From: James Bottomley 

The Microsoft Simulator (mssim) is the reference emulation platform
for the TCG TPM 2.0 specification.

https://github.com/Microsoft/ms-tpm-20-ref.git

It exports a fairly simple network socket based protocol on two
sockets, one for command (default 2321) and one for control
(default
2322).  This patch adds a simple backend that can speak the mssim
protocol over the network.  It also allows the host, and two ports
to
be specified on the qemu command line.  The benefits are twofold:
firstly it gives us a backend that actually speaks a standard TPM
emulation protocol instead of the linux specific TPM driver format
of
the current emulated TPM backend and secondly, using the microsoft
protocol, the end point of the emulator can be anywhere on the
network, facilitating the cloud use case where a central TPM
service
can be used over a control network.

The implementation does basic control commands like power off/on,
but
doesn't implement cancellation or startup.  The former because
cancellation is pretty much useless on a fast operating TPM
emulator
and the latter because this emulator is designed to be used with
OVMF
which itself does TPM startup and I wanted to validate that.

To run this, simply download an emulator based on the MS
specification
(package ibmswtpm2 on openSUSE) and run it, then add these two
lines
to the qemu command and it will use the emulator.

  -tpmdev mssim,id=tpm0 \
  -device tpm-crb,tpmdev=tpm0 \

to use a remote emulator replace the first line with

  -tpmdev
"{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote'
,'port':'2321'}}"

tpm-tis also works as the backend.


Since this device does not properly support migration you have to
register a migration blocker.


Actually it seems to support migration just fine.  Currently the PCR's
get zero'd which is my fault for doing a TPM power off/on, but
switching that based on state should be an easy fix.


How do you handle virsh save  -> host reboot -> virsh restore?

You should also add a description to docs/specs/tpm.rst.

Stefan



James





Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread James Bottomley
On Thu, 2022-12-15 at 13:46 -0500, Stefan Berger wrote:
> 
> 
> On 12/15/22 13:01, James Bottomley wrote:
> > From: James Bottomley 
> > 
> > The Microsoft Simulator (mssim) is the reference emulation platform
> > for the TCG TPM 2.0 specification.
> > 
> > https://github.com/Microsoft/ms-tpm-20-ref.git
> > 
> > It exports a fairly simple network socket based protocol on two
> > sockets, one for command (default 2321) and one for control
> > (default
> > 2322).  This patch adds a simple backend that can speak the mssim
> > protocol over the network.  It also allows the host, and two ports
> > to
> > be specified on the qemu command line.  The benefits are twofold:
> > firstly it gives us a backend that actually speaks a standard TPM
> > emulation protocol instead of the linux specific TPM driver format
> > of
> > the current emulated TPM backend and secondly, using the microsoft
> > protocol, the end point of the emulator can be anywhere on the
> > network, facilitating the cloud use case where a central TPM
> > service
> > can be used over a control network.
> > 
> > The implementation does basic control commands like power off/on,
> > but
> > doesn't implement cancellation or startup.  The former because
> > cancellation is pretty much useless on a fast operating TPM
> > emulator
> > and the latter because this emulator is designed to be used with
> > OVMF
> > which itself does TPM startup and I wanted to validate that.
> > 
> > To run this, simply download an emulator based on the MS
> > specification
> > (package ibmswtpm2 on openSUSE) and run it, then add these two
> > lines
> > to the qemu command and it will use the emulator.
> > 
> >  -tpmdev mssim,id=tpm0 \
> >  -device tpm-crb,tpmdev=tpm0 \
> > 
> > to use a remote emulator replace the first line with
> > 
> >  -tpmdev
> > "{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote'
> > ,'port':'2321'}}"
> > 
> > tpm-tis also works as the backend.
> 
> Since this device does not properly support migration you have to
> register a migration blocker.

Actually it seems to support migration just fine.  Currently the PCR's
get zero'd which is my fault for doing a TPM power off/on, but
switching that based on state should be an easy fix.

James




Re: [PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread Stefan Berger




On 12/15/22 13:01, James Bottomley wrote:

From: James Bottomley 

The Microsoft Simulator (mssim) is the reference emulation platform
for the TCG TPM 2.0 specification.

https://github.com/Microsoft/ms-tpm-20-ref.git

It exports a fairly simple network socket based protocol on two
sockets, one for command (default 2321) and one for control (default
2322).  This patch adds a simple backend that can speak the mssim
protocol over the network.  It also allows the host, and two ports to
be specified on the qemu command line.  The benefits are twofold:
firstly it gives us a backend that actually speaks a standard TPM
emulation protocol instead of the linux specific TPM driver format of
the current emulated TPM backend and secondly, using the microsoft
protocol, the end point of the emulator can be anywhere on the
network, facilitating the cloud use case where a central TPM service
can be used over a control network.

The implementation does basic control commands like power off/on, but
doesn't implement cancellation or startup.  The former because
cancellation is pretty much useless on a fast operating TPM emulator
and the latter because this emulator is designed to be used with OVMF
which itself does TPM startup and I wanted to validate that.

To run this, simply download an emulator based on the MS specification
(package ibmswtpm2 on openSUSE) and run it, then add these two lines
to the qemu command and it will use the emulator.

 -tpmdev mssim,id=tpm0 \
 -device tpm-crb,tpmdev=tpm0 \

to use a remote emulator replace the first line with

 -tpmdev 
"{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote','port':'2321'}}"

tpm-tis also works as the backend.


Since this device does not properly support migration you have to register a 
migration blocker.

   Stefan



[PATCH 2/2] tpm: add backend for mssim

2022-12-15 Thread James Bottomley
From: James Bottomley 

The Microsoft Simulator (mssim) is the reference emulation platform
for the TCG TPM 2.0 specification.

https://github.com/Microsoft/ms-tpm-20-ref.git

It exports a fairly simple network socket based protocol on two
sockets, one for command (default 2321) and one for control (default
2322).  This patch adds a simple backend that can speak the mssim
protocol over the network.  It also allows the host, and two ports to
be specified on the qemu command line.  The benefits are twofold:
firstly it gives us a backend that actually speaks a standard TPM
emulation protocol instead of the linux specific TPM driver format of
the current emulated TPM backend and secondly, using the microsoft
protocol, the end point of the emulator can be anywhere on the
network, facilitating the cloud use case where a central TPM service
can be used over a control network.

The implementation does basic control commands like power off/on, but
doesn't implement cancellation or startup.  The former because
cancellation is pretty much useless on a fast operating TPM emulator
and the latter because this emulator is designed to be used with OVMF
which itself does TPM startup and I wanted to validate that.

To run this, simply download an emulator based on the MS specification
(package ibmswtpm2 on openSUSE) and run it, then add these two lines
to the qemu command and it will use the emulator.

-tpmdev mssim,id=tpm0 \
-device tpm-crb,tpmdev=tpm0 \

to use a remote emulator replace the first line with

-tpmdev 
"{'type':'mssim','id':'tpm0','command':{'type':inet,'host':'remote','port':'2321'}}"

tpm-tis also works as the backend.

Signed-off-by: James Bottomley 

---

v2: convert to SocketAddr json and use qio_channel_socket_connect_sync()
---
 MAINTAINERS  |   5 +
 backends/tpm/Kconfig |   5 +
 backends/tpm/meson.build |   1 +
 backends/tpm/tpm_mssim.c | 251 +++
 backends/tpm/tpm_mssim.h |  43 +++
 monitor/hmp-cmds.c   |   7 ++
 qapi/tpm.json|  25 +++-
 7 files changed, 334 insertions(+), 3 deletions(-)
 create mode 100644 backends/tpm/tpm_mssim.c
 create mode 100644 backends/tpm/tpm_mssim.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6966490c94..a4a3bf9ab4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3046,6 +3046,11 @@ F: backends/tpm/
 F: tests/qtest/*tpm*
 T: git https://github.com/stefanberger/qemu-tpm.git tpm-next
 
+MSSIM TPM Backend
+M: James Bottomley 
+S: Maintained
+F: backends/tpm/tpm_mssim.*
+
 Checkpatch
 S: Odd Fixes
 F: scripts/checkpatch.pl
diff --git a/backends/tpm/Kconfig b/backends/tpm/Kconfig
index 5d91eb89c2..d6d6fa53e9 100644
--- a/backends/tpm/Kconfig
+++ b/backends/tpm/Kconfig
@@ -12,3 +12,8 @@ config TPM_EMULATOR
 bool
 default y
 depends on TPM_BACKEND
+
+config TPM_MSSIM
+bool
+default y
+depends on TPM_BACKEND
diff --git a/backends/tpm/meson.build b/backends/tpm/meson.build
index 7f2503f84e..c7c3c79125 100644
--- a/backends/tpm/meson.build
+++ b/backends/tpm/meson.build
@@ -3,4 +3,5 @@ if have_tpm
   softmmu_ss.add(files('tpm_util.c'))
   softmmu_ss.add(when: 'CONFIG_TPM_PASSTHROUGH', if_true: 
files('tpm_passthrough.c'))
   softmmu_ss.add(when: 'CONFIG_TPM_EMULATOR', if_true: files('tpm_emulator.c'))
+  softmmu_ss.add(when: 'CONFIG_TPM_MSSIM', if_true: files('tpm_mssim.c'))
 endif
diff --git a/backends/tpm/tpm_mssim.c b/backends/tpm/tpm_mssim.c
new file mode 100644
index 00..7c10ce2944
--- /dev/null
+++ b/backends/tpm/tpm_mssim.c
@@ -0,0 +1,251 @@
+/*
+ * Emulator TPM driver which connects over the mssim protocol
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2022
+ * Author: James Bottomley 
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+#include "qemu/sockets.h"
+
+#include "qapi/clone-visitor.h"
+#include "qapi/qapi-visit-tpm.h"
+
+#include "io/channel-socket.h"
+
+#include "sysemu/tpm_backend.h"
+#include "sysemu/tpm_util.h"
+
+#include "qom/object.h"
+
+#include "tpm_int.h"
+#include "tpm_mssim.h"
+
+#define ERROR_PREFIX "TPM mssim Emulator: "
+
+#define TYPE_TPM_MSSIM "tpm-mssim"
+OBJECT_DECLARE_SIMPLE_TYPE(TPMmssim, TPM_MSSIM)
+
+struct TPMmssim {
+TPMBackend parent;
+
+TpmTypeOptions *opts;
+
+QIOChannelSocket *cmd_qc, *ctrl_qc;
+};
+
+static int tpm_send_ctrl(TPMmssim *t, uint32_t cmd, Error **errp)
+{
+int ret;
+
+cmd = htonl(cmd);
+ret = qio_channel_write_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, 
sizeof(cmd), errp);
+if (ret != 0)
+return ret;
+ret = qio_channel_read_all(QIO_CHANNEL(t->ctrl_qc), (char *)&cmd, 
sizeof(cmd), errp);
+if (ret != 0)
+return ret;
+if (cmd != 0) {
+error_setg(errp, ERROR_PREFIX "Incorrect ACK recieved on control 
channel 0x%x\n", cmd);
+return -1;
+}
+return 0;
+}
+
+static void tpm_mssim_instance_init(Object *obj)
+{
+}
+
+static void tpm_mssim_instance_finalize(Object *obj)
+{
+TPMmssim *t = TPM_MSSIM(obj);
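
For reference, the control-channel framing implemented by tpm_send_ctrl()
above is simple enough to exercise from a few lines of standalone C.  This is
only an illustrative sketch (short reads/writes and error reporting are
ignored); it assumes fd is a socket already connected to the simulator's
control port (default 2322) and deliberately does not hard-code any particular
command value:

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Send one 32-bit big-endian control command; a 32-bit ACK of 0 means OK. */
    static int mssim_ctrl(int fd, uint32_t cmd)
    {
        uint32_t be = htonl(cmd), ack;

        if (write(fd, &be, sizeof(be)) != sizeof(be) ||
            read(fd, &ack, sizeof(ack)) != sizeof(ack)) {
            return -1;
        }
        return ack == 0 ? 0 : -1;
    }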

[PATCH 1/2] tpm: convert tpmdev options processing to new visitor format

2022-12-15 Thread James Bottomley
From: James Bottomley 

Instead of processing the tpmdev options using the old qemu options,
convert to the new visitor format which also allows the passing of
json on the command line.

Signed-off-by: James Bottomley 
---
 backends/tpm/tpm_emulator.c| 35 ++
 backends/tpm/tpm_passthrough.c | 37 +--
 include/sysemu/tpm.h   |  2 +-
 include/sysemu/tpm_backend.h   |  2 +-
 monitor/hmp-cmds.c |  4 +-
 qapi/tpm.json  | 26 ++-
 softmmu/tpm.c  | 84 +++---
 softmmu/vl.c   |  4 +-
 8 files changed, 71 insertions(+), 123 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index 49cc3d749d..82988a2986 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -69,7 +69,7 @@ typedef struct TPMBlobBuffers {
 struct TPMEmulator {
 TPMBackend parent;
 
-TPMEmulatorOptions *options;
+TpmTypeOptions *options;
 CharBackend ctrl_chr;
 QIOChannel *data_ioc;
 TPMVersion tpm_version;
@@ -584,33 +584,28 @@ err_exit:
 return -1;
 }
 
-static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, QemuOpts 
*opts)
+static int tpm_emulator_handle_device_opts(TPMEmulator *tpm_emu, 
TpmTypeOptions *opts)
 {
-const char *value;
 Error *err = NULL;
 Chardev *dev;
 
-value = qemu_opt_get(opts, "chardev");
-if (!value) {
-error_report("tpm-emulator: parameter 'chardev' is missing");
-goto err;
-}
+tpm_emu->options = opts;
+tpm_emu->data_ioc = NULL;
 
-dev = qemu_chr_find(value);
+dev = qemu_chr_find(opts->u.emulator.chardev);
 if (!dev) {
-error_report("tpm-emulator: tpm chardev '%s' not found", value);
+error_report("tpm-emulator: tpm chardev '%s' not found",
+opts->u.emulator.chardev);
 goto err;
 }
 
 if (!qemu_chr_fe_init(&tpm_emu->ctrl_chr, dev, &err)) {
 error_prepend(&err, "tpm-emulator: No valid chardev found at '%s':",
-  value);
+  opts->u.emulator.chardev);
 error_report_err(err);
 goto err;
 }
 
-tpm_emu->options->chardev = g_strdup(value);
-
 if (tpm_emulator_prepare_data_fd(tpm_emu) < 0) {
 goto err;
 }
@@ -621,7 +616,7 @@ static int tpm_emulator_handle_device_opts(TPMEmulator 
*tpm_emu, QemuOpts *opts)
 if (tpm_util_test_tpmdev(QIO_CHANNEL_SOCKET(tpm_emu->data_ioc)->fd,
  &tpm_emu->tpm_version)) {
 error_report("'%s' is not emulating TPM device. Error: %s",
-  tpm_emu->options->chardev, strerror(errno));
+  tpm_emu->options->u.emulator.chardev, strerror(errno));
 goto err;
 }
 
@@ -649,7 +644,7 @@ err:
 return -1;
 }
 
-static TPMBackend *tpm_emulator_create(QemuOpts *opts)
+static TPMBackend *tpm_emulator_create(TpmTypeOptions *opts)
 {
 TPMBackend *tb = TPM_BACKEND(object_new(TYPE_TPM_EMULATOR));
 
@@ -664,10 +659,9 @@ static TPMBackend *tpm_emulator_create(QemuOpts *opts)
 static TpmTypeOptions *tpm_emulator_get_tpm_options(TPMBackend *tb)
 {
 TPMEmulator *tpm_emu = TPM_EMULATOR(tb);
-TpmTypeOptions *options = g_new0(TpmTypeOptions, 1);
+TpmTypeOptions *options;
 
-options->type = TPM_TYPE_EMULATOR;
-options->u.emulator.data = QAPI_CLONE(TPMEmulatorOptions, 
tpm_emu->options);
+options = QAPI_CLONE(TpmTypeOptions, tpm_emu->options);
 
 return options;
 }
@@ -972,7 +966,6 @@ static void tpm_emulator_inst_init(Object *obj)
 
 trace_tpm_emulator_inst_init();
 
-tpm_emu->options = g_new0(TPMEmulatorOptions, 1);
 tpm_emu->cur_locty_number = ~0;
 qemu_mutex_init(&tpm_emu->mutex);
 tpm_emu->vmstate =
@@ -990,7 +983,7 @@ static void tpm_emulator_shutdown(TPMEmulator *tpm_emu)
 {
 ptm_res res;
 
-if (!tpm_emu->options->chardev) {
+if (!tpm_emu->data_ioc) {
 /* was never properly initialized */
 return;
 }
@@ -1015,7 +1008,7 @@ static void tpm_emulator_inst_finalize(Object *obj)
 
 qemu_chr_fe_deinit(&tpm_emu->ctrl_chr, false);
 
-qapi_free_TPMEmulatorOptions(tpm_emu->options);
+qapi_free_TpmTypeOptions(tpm_emu->options);
 
 if (tpm_emu->migration_blocker) {
 migrate_del_blocker(tpm_emu->migration_blocker);
diff --git a/backends/tpm/tpm_passthrough.c b/backends/tpm/tpm_passthrough.c
index 5a2f74db1b..2ce39b2167 100644
--- a/backends/tpm/tpm_passthrough.c
+++ b/backends/tpm/tpm_passthrough.c
@@ -41,7 +41,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(TPMPassthruState, TPM_PASSTHROUGH)
 struct TPMPassthruState {
 TPMBackend parent;
 
-TPMPassthroughOptions *options;
+TpmTypeOptions *options;
 const char *tpm_dev;
 int tpm_fd;
 bool tpm_executing;
@@ -214,8 +214,8 @@ static int 
tpm_passthrough_open_sysfs_cancel(TPMPassthruState *tpm_pt)
 char *dev;
 char path[PATH_MAX];
 
-if (tpm_pt->options->cancel_path) {

[PATCH 0/2] tpm: add mssim backend

2022-12-15 Thread James Bottomley
From: James Bottomley 

The requested feedback was to convert the tpmdev handler to being json
based, which requires rethreading all the backends.  The good news is
this reduced quite a bit of code (especially as I converted it to
error_fatal handling as well, which removes the return status
threading).  The bad news is I can't test any of the conversions.
swtpm still isn't building on opensuse and, apparently, passthrough
doesn't like my native TPM because it doesn't allow cancellation.

James

---

James Bottomley (2):
  tpm: convert tpmdev options processing to new visitor format
  tpm: add backend for mssim

 MAINTAINERS|   5 +
 backends/tpm/Kconfig   |   5 +
 backends/tpm/meson.build   |   1 +
 backends/tpm/tpm_emulator.c|  35 ++---
 backends/tpm/tpm_mssim.c   | 251 +
 backends/tpm/tpm_mssim.h   |  43 ++
 backends/tpm/tpm_passthrough.c |  37 ++---
 include/sysemu/tpm.h   |   2 +-
 include/sysemu/tpm_backend.h   |   2 +-
 monitor/hmp-cmds.c |  11 +-
 qapi/tpm.json  |  37 ++---
 softmmu/tpm.c  |  84 +--
 softmmu/vl.c   |   4 +-
 13 files changed, 398 insertions(+), 119 deletions(-)
 create mode 100644 backends/tpm/tpm_mssim.c
 create mode 100644 backends/tpm/tpm_mssim.h

-- 
2.35.3




Re: [PATCH 6/5] include/hw/cxl: Break inclusion loop

2022-12-15 Thread Jonathan Cameron via
On Thu, 15 Dec 2022 08:34:10 +0100
Markus Armbruster  wrote:

> Jonathan Cameron  writes:
> 
> > On Sat, 10 Dec 2022 08:09:06 +0100
> > Markus Armbruster  wrote:
> >  
> >> Markus Armbruster  writes:
> >>   
> >> > hw/cxl/cxl_pci.h and hw/cxl/cxl_cdat.h include each other.  Neither
> >> > header actually needs the other one.  Drop both #include directives.
> >> >
> >> > Signed-off-by: Markus Armbruster 
> >> > ---
> >> >  include/hw/cxl/cxl_cdat.h | 1 -
> >> >  include/hw/cxl/cxl_pci.h  | 1 -
> >> >  2 files changed, 2 deletions(-)
> >> >
> >> > diff --git a/include/hw/cxl/cxl_cdat.h b/include/hw/cxl/cxl_cdat.h
> >> > index 7f67638685..e3fd737f9d 100644
> >> > --- a/include/hw/cxl/cxl_cdat.h
> >> > +++ b/include/hw/cxl/cxl_cdat.h
> >> > @@ -10,7 +10,6 @@
> >> >  #ifndef CXL_CDAT_H
> >> >  #define CXL_CDAT_H
> >> >  
> >> > -#include "hw/cxl/cxl_pci.h"
> >> >  #include "hw/pci/pcie_doe.h"  
> >
> > The include was to get to CXL_VENDOR_ID which is in hw/cxl/cxl_pci.h
> > Can move that elsewhere perhaps, though I don't think we need to
> > if we break the loop by dropping the other one.  
> 
> It's used only in a macro.  If you use the macro, you need to include
> cxl_pci.h.
> 
> Would you like me to keep this #include?

yes. That would be my preference.

> 
> >> >  /*
> >> > diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
> >> > index aca14845ab..01e15ed5b4 100644
> >> > --- a/include/hw/cxl/cxl_pci.h
> >> > +++ b/include/hw/cxl/cxl_pci.h
> >> > @@ -11,7 +11,6 @@
> >> >  #define CXL_PCI_H
> >> >  
> >> >  #include "qemu/compiler.h"
> >> > -#include "hw/cxl/cxl_cdat.h"  
> > Guess that's a left over of some earlier refactoring. Good to get rid
> > of this one.
> >  
> >> >  
> >> >  #define CXL_VENDOR_ID 0x1e98
> >> 
> >> Friday afternoon post with insufficient testing...  Everything still
> >> builds fine, but cxl_component.h is no longer self-contained.  I'll
> >> squash in the appended patch and revise the commit message.  
> >
> > By staring at the code rather than any automation I'm failing to spot
> > what it needs from cxl_pci.h.  Can you add that info to the commit message? 
> >  
> 
> It's CXL20_MAX_DVSEC.
ah. Make sense. Thanks.
> 
> >> diff --git a/include/hw/cxl/cxl_component.h 
> >> b/include/hw/cxl/cxl_component.h
> >> index 5dca21e95b..78f83ed742 100644
> >> --- a/include/hw/cxl/cxl_component.h
> >> +++ b/include/hw/cxl/cxl_component.h
> >> @@ -19,6 +19,7 @@
> >>  #include "qemu/range.h"
> >>  #include "qemu/typedefs.h"
> >>  #include "hw/cxl/cxl_cdat.h"
> >> +#include "hw/cxl/cxl_pci.h"
> >>  #include "hw/register.h"
> >>  #include "qapi/error.h"
> >>  
> >>   
> 




[PATCH 1/2] coroutine: annotate coroutine_fn for libclang

2022-12-15 Thread Paolo Bonzini
From: Alberto Faria 

Clang has a generic __annotate__ attribute that can be used by
static analyzers to understand properties of functions and
analyze the control flow.  Furthermore, unlike TSA annotations, the
__annotate__ attribute applies to function pointers as well.

As a first step towards static analysis of coroutine_fn markers,
attach the attribute to the marker when compiling with clang.

Signed-off-by: Alberto Faria 
Signed-off-by: Paolo Bonzini 
---
 include/qemu/coroutine.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 89650a2d7fab..b0c97f6fb7ad 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -42,7 +42,11 @@
  *   
  *   }
  */
+#ifdef __clang__
+#define coroutine_fn __attribute__((__annotate__("coroutine_fn")))
+#else
 #define coroutine_fn
+#endif
 
 typedef struct Coroutine Coroutine;
 
-- 
2.38.1
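
A small illustration of the function-pointer point made above (the typedef
name is invented for the example; with GCC the attribute simply expands to
nothing):

    /* The annotation rides along with the function type, so it is still
     * visible to the analyzer through a typedef and a pointer to it. */
    typedef void coroutine_fn CoEntry(void *opaque);

    static void coroutine_fn my_co(void *opaque)
    {
        /* ... may yield ... */
    }

    CoEntry *entry = my_co;   /* "coroutine_fn" is still attached here */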




[PATCH 0/2] Make coroutine annotations ready for static analysis

2022-12-15 Thread Paolo Bonzini
Clang has a generic __annotate__ attribute that can be used by
static analyzers to understand properties of functions and
analyze the control flow.

Unlike TSA annotations, the __annotate__ attribute applies to function
pointers as well, which is very fortunate because many BlockDriver
function pointers run in coroutines.

Paolo

Alberto Faria (2):
  coroutine: annotate coroutine_fn for libclang
  block: Add no_coroutine_fn and coroutine_mixed_fn marker

 include/block/block-common.h | 11 +++
 include/qemu/coroutine.h | 37 
 2 files changed, 44 insertions(+), 4 deletions(-)

-- 
2.38.1




[PATCH 2/2] block: Add no_coroutine_fn and coroutine_mixed_fn marker

2022-12-15 Thread Paolo Bonzini
From: Alberto Faria 

Add more annotations to functions, describing valid and invalid
calls from coroutine to non-coroutine context.

When applied to a function, no_coroutine_fn advertises that it should
not be called from coroutine_fn functions.  This can be because the
function blocks or, in the case of generated_co_wrapper, to enforce
that coroutine_fn functions directly call the coroutine_fn that backs
the generated_co_wrapper.

coroutine_mixed_fn instead is for functions that can be called in
both coroutine and non-coroutine context, but will suspend when
called in coroutine context.  Annotating them is a first step
towards enforcing that non-annotated functions are absolutely
not going to suspend.

These can be used for example with the vrc tool from
https://github.com/bonzini/vrc:

# find functions that *really* cannot be called from no_coroutine_fn
(vrc) load --loader clang 
libblock.fa.p/meson-generated_.._block_block-gen.c.o
# The comma is an "AND".  The "path" here consists of a single node
(vrc) paths [no_coroutine_fn,!coroutine_mixed_fn]
bdrv_remove_persistent_dirty_bitmap
bdrv_create
bdrv_can_store_new_dirty_bitmap

# find how coroutine_fns end up calling a mixed function
(vrc) load --loader clang --force libblock.fa.p/*.c.o
# regular expression search
(vrc) paths [coroutine_fn] [!no_coroutine_fn]* [coroutine_mixed_fn]
...
bdrv_pread <- vhdx_log_write <- vhdx_log_write_and_flush <- vhdx_co_writev
...

Signed-off-by: Alberto Faria 
[Rebase, add coroutine_mixed_fn. - Paolo]
Signed-off-by: Paolo Bonzini 
---
 include/block/block-common.h | 11 +++
 include/qemu/coroutine.h | 33 +
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index 4749c46a5e7e..cce79bd00135 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -50,11 +50,14 @@
  * - co_wrapper_mixed_bdrv_rdlock are co_wrapper_mixed functions but
  *   automatically take and release the graph rdlock when creating a new
  *   coroutine.
+ *
+ * These functions should not be called from a coroutine_fn; instead,
+ * call the wrapped function directly.
  */
-#define co_wrapper
-#define co_wrapper_mixed
-#define co_wrapper_bdrv_rdlock
-#define co_wrapper_mixed_bdrv_rdlock
+#define co_wrapper no_coroutine_fn
+#define co_wrapper_mixed   no_coroutine_fn coroutine_mixed_fn
+#define co_wrapper_bdrv_rdlock no_coroutine_fn
+#define co_wrapper_mixed_bdrv_rdlock   no_coroutine_fn coroutine_mixed_fn
 
 #include "block/dirty-bitmap.h"
 #include "block/blockjob.h"
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index b0c97f6fb7ad..5f5ab8136a3a 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -28,6 +28,27 @@
  * These functions are re-entrant and may be used outside the global mutex.
  */
 
+/**
+ * Mark a function that can suspend when executed in coroutine context,
+ * but can handle running in non-coroutine context too.
+ *
+ * Functions that execute in coroutine context cannot be called directly from
+ * normal functions.  In the future it would be nice to enable compiler or
+ * static checker support for catching such errors.  This annotation might make
+ * it possible and in the meantime it serves as documentation.
+ *
+ * For example:
+ *
+ *   static void coroutine_fn foo(void) {
+ *   
+ *   }
+ */
+#ifdef __clang__
+#define coroutine_mixed_fn __attribute__((__annotate__("coroutine_mixed_fn")))
+#else
+#define coroutine_mixed_fn
+#endif
+
 /**
  * Mark a function that executes in coroutine context
  *
@@ -48,6 +69,18 @@
 #define coroutine_fn
 #endif
 
+/**
+ * Mark a function that should never be called from a coroutine context
+ *
+ * This typically means that there is an analogous, coroutine_fn function that
+ * should be used instead.
+ */
+#ifdef __clang__
+#define no_coroutine_fn __attribute__((__annotate__("no_coroutine_fn")))
+#else
+#define no_coroutine_fn
+#endif
+
 typedef struct Coroutine Coroutine;
 
 /**
-- 
2.38.1
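
Side by side, the three markers read like this at declaration sites
(prototypes abbreviated; bdrv_create and bdrv_pread are the names quoted in
the commit message above, bdrv_co_pread is added just for contrast):

    int no_coroutine_fn    bdrv_create(...);    /* never call from a coroutine_fn */
    int coroutine_mixed_fn bdrv_pread(...);     /* fine in both contexts, may suspend */
    int coroutine_fn       bdrv_co_pread(...);  /* coroutine context only */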




[PULL v2 00/28] target-arm queue

2022-12-15 Thread Peter Maydell
drop the sysregs patch as the tcg sysregs test fails
(probably a bug in the test)

-- PMM

The following changes since commit ae2b87341b5ddb0dcb1b3f2d4f586ef18de75873:

  Merge tag 'pull-qapi-2022-12-14-v2' of https://repo.or.cz/qemu/armbru into 
staging (2022-12-14 22:42:14 +)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20221215-1

for you to fetch changes up to 9e406eea309bbe44c7fb17f6af112d2b756854ad:

  target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator (2022-12-15 
17:37:48 +)


target-arm queue:
 * hw/arm/virt: Add properties to allow more granular
   configuration of use of highmem space
 * target/arm: Add Cortex-A55 CPU
 * hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement
 * Implement FEAT_EVT
 * Some 3-phase-reset conversions for Arm GIC, SMMU
 * hw/arm/boot: set initrd with #address-cells type in fdt
 * hw/misc: Move some arm-related files from specific_ss into softmmu_ss
 * Restrict arm_cpu_exec_interrupt() to TCG accelerator


Gavin Shan (7):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address assignment
  hw/arm/virt: Add 'compact-highmem' property
  hw/arm/virt: Add properties to disable high memory regions

Luke Starrett (1):
  hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

Mihai Carabas (1):
  hw/arm/virt: build SMBIOS 19 table

Peter Maydell (15):
  target/arm: Allow relevant HCR bits to be written for FEAT_EVT
  target/arm: Implement HCR_EL2.TTLBIS traps
  target/arm: Implement HCR_EL2.TTLBOS traps
  target/arm: Implement HCR_EL2.TICAB,TOCU traps
  target/arm: Implement HCR_EL2.TID4 traps
  target/arm: Report FEAT_EVT for TCG '-cpu max'
  hw/arm: Convert TYPE_ARM_SMMU to 3-phase reset
  hw/arm: Convert TYPE_ARM_SMMUV3 to 3-phase reset
  hw/intc: Convert TYPE_ARM_GIC_COMMON to 3-phase reset
  hw/intc: Convert TYPE_ARM_GIC_KVM to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_COMMON to 3-phase reset
  hw/intc: Convert TYPE_KVM_ARM_GICV3 to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_ITS_COMMON to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_ITS to 3-phase reset
  hw/intc: Convert TYPE_KVM_ARM_ITS to 3-phase reset

Philippe Mathieu-Daudé (1):
  target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator

Schspa Shi (1):
  hw/arm/boot: set initrd with #address-cells type in fdt

Thomas Huth (1):
  hw/misc: Move some arm-related files from specific_ss into softmmu_ss

Timofey Kutergin (1):
  target/arm: Add Cortex-A55 CPU

 docs/system/arm/emulation.rst  |   1 +
 docs/system/arm/virt.rst   |  18 +++
 include/hw/arm/smmuv3.h|   2 +-
 include/hw/arm/virt.h  |   2 +
 include/hw/misc/xlnx-zynqmp-apu-ctrl.h |   2 +-
 target/arm/cpu.h   |  30 +
 target/arm/kvm-consts.h|   8 +-
 hw/arm/boot.c  |  10 +-
 hw/arm/smmu-common.c   |   7 +-
 hw/arm/smmuv3.c|  12 +-
 hw/arm/virt.c  | 202 +++--
 hw/intc/arm_gic_common.c   |   7 +-
 hw/intc/arm_gic_kvm.c  |  14 ++-
 hw/intc/arm_gicv3_common.c |   7 +-
 hw/intc/arm_gicv3_dist.c   |   4 +-
 hw/intc/arm_gicv3_its.c|  14 ++-
 hw/intc/arm_gicv3_its_common.c |   7 +-
 hw/intc/arm_gicv3_its_kvm.c|  14 ++-
 hw/intc/arm_gicv3_kvm.c|  14 ++-
 hw/misc/imx6_src.c |   2 +-
 hw/misc/iotkit-sysctl.c|   1 -
 target/arm/cpu.c   |   5 +-
 target/arm/cpu64.c |  70 
 target/arm/cpu_tcg.c   |   1 +
 target/arm/helper.c| 135 ++
 hw/misc/meson.build|  11 +-
 26 files changed, 459 insertions(+), 141 deletions(-)



Re: [RFC PATCH] includes: move tb_flush into its own header

2022-12-15 Thread Philippe Mathieu-Daudé

On 15/12/22 15:09, Alex Bennée wrote:

This aids subsystems (like gdbstub) that want to trigger a flush
without pulling target specific headers.

[AJB: RFC because this is part of a larger gdbstub series but I wanted
to post for feedback in case anyone wants to suggest better naming].

Signed-off-by: Alex Bennée 
---
  include/exec/exec-all.h | 1 -
  linux-user/user-internals.h | 1 +
  accel/tcg/tb-maint.c| 1 +
  accel/tcg/translate-all.c   | 1 +
  cpu.c   | 1 +
  gdbstub/gdbstub.c   | 1 +
  hw/ppc/spapr_hcall.c| 1 +
  plugins/core.c  | 1 +
  plugins/loader.c| 2 +-
  target/alpha/sys_helper.c   | 1 +
  target/riscv/csr.c  | 1 +
  11 files changed, 10 insertions(+), 2 deletions(-)


While playing there you might want to review a companion series:
https://lore.kernel.org/qemu-devel/20221209093649.43738-1-phi...@linaro.org/
"Restrict page_collection structure to system TB maintainance"



Re: [PATCH 1/5] include/hw/pci: Clean up superfluous inclusion of pci*/*.h cxl/*.h

2022-12-15 Thread Jonathan Cameron via
On Thu, 15 Dec 2022 08:14:52 +0100
Markus Armbruster  wrote:

> Jonathan Cameron  writes:
> 
> > On Fri,  9 Dec 2022 14:47:58 +0100
> > Markus Armbruster  wrote:
> >
> > Hi Markus,
> >
> > One comment on the CXL ones.  Others CXL related changes
> > all looks fine to me.
> >
> > Thanks for cleaning these up.
> >
> > Jonathan
> >
> >  
> >> diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
> >> index 38e0e271d5..5129557bee 100644
> >> --- a/include/hw/cxl/cxl.h
> >> +++ b/include/hw/cxl/cxl.h
> >> @@ -13,7 +13,6 @@
> >>  
> >>  #include "qapi/qapi-types-machine.h"
> >>  #include "qapi/qapi-visit-machine.h"
> >> -#include "hw/pci/pci_bridge.h"  
> >
> > If we drop this, we probably want a forwards def of
> > struct PXBDev  
> 
> Why?  Because it's used in the header?
> 
> > I should probably be using the typedef in here as well rather
> > than struct PXBDev * in CXLFixed Window so we'd need
> > to deal with making that visible too.  
> 
> We have two typedef struct PXBDev PXBDev, one in pci_bridge.h, and one in
> pci_expander_bridge.c.  Both include cxl.h.  Move it to cxl.h?

Sure.

> 
> >>  #include "hw/pci/pci_host.h"
> >>  #include "cxl_pci.h"
> >>  #include "cxl_component.h"  
> >  
> >>  #define CXL_VENDOR_ID 0x1e98  
> 




Re: [PATCH 1/5] io: Add support for MSG_PEEK for socket channel

2022-12-15 Thread Peter Xu
On Thu, Dec 15, 2022 at 09:40:41AM +, Daniel P. Berrangé wrote:
> On Wed, Dec 14, 2022 at 04:30:48PM -0500, Peter Xu wrote:
> > On Wed, Dec 14, 2022 at 09:14:09AM +, Daniel P. Berrangé wrote:
> > > On Tue, Dec 13, 2022 at 04:38:46PM -0500, Peter Xu wrote:
> > > > From: "manish.mishra" 
> > > > 
> > > > MSG_PEEK reads from the peek of channel, The data is treated as
> > > > unread and the next read shall still return this data. This
> > > > support is currently added only for socket class. Extra parameter
> > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > MSG_PEEK.
> > > > 
> > > > Reviewed-by: Daniel P. Berrang??  > > > Suggested-by: Daniel P. Berrang??  > > 
> > > The last letter of my name has been mangled - whatever tools used
> > > to pull in manish's patches seem to not be UTF-8 clean.
> > > 
> > > Also the email addr isn't terminated, but that was pre-existing
> > > in manish's previous posting.
> > 
> > I'll fix at least the latter in my next post, sorry.
> > 
> > For the 1st one - I am still looking at what went wrong.
> > 
> > Here from the web interfaces it all looks good (besides the wrong
> > ending..), e.g. on lore or patchew:
> > 
> > https://lore.kernel.org/all/20221213213850.1481858-2-pet...@redhat.com/
> > https://patchew.org/QEMU/20221213213850.1481858-1-pet...@redhat.com/20221213213850.1481858-2-pet...@redhat.com/
> > 
> > It also looks good with e.g. Gmail webclient.
> > 
> > Then I digged into the email headers and I found that comparing to Manish's
> > original message, the patches I posted has one more line of "Content-type":
> > 
> >   Content-Type: text/plain; charset="utf-8"
> >   Content-type: text/plain
> >   https://patchew.org/QEMU/20221213213850.1481858-2-pet...@redhat.com/mbox
> > 
> > While Manish's patch only has one line:
> > 
> >   Content-Type: text/plain; charset="utf-8"
> >   
> > https://patchew.org/QEMU/20221123172735.25181-2-manish.mis...@nutanix.com/mbox
> 
> Don't trust what is shown by patchew, as that's been through many
> hops.
> 
> The copy I receieved came directly to me via CC, so didn't hit mailman,
> nor patchew, and that *only* has  "Content-type: text/plain".  So the
> extra Content-type line with utf8 must have been added either by
> mailman or patchew.
> 
> So it probably looks like a config problem in the tool you use to send
> the patches originally.

Ouch... for mysterious reasons I had one line in .gitconfig:

177 [format]
178 headers = "Content-type: text/plain"

And that'll also affect git-publish too...  I have it dropped now.

Thanks,

-- 
Peter Xu




Re: regression: insmod module failed in VM with nvdimm on

2022-12-15 Thread Thorsten Leemhuis
Hi, this is your Linux kernel regression tracker. Top-posting for once,
to make this easily accessible to everyone.

Was there some progress to get this regression resolved? From here it
looks stalled, but maybe I missed something.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

On 02.12.22 14:44, Ard Biesheuvel wrote:
> On Fri, 2 Dec 2022 at 03:48, chenxiang (M)  wrote:
>>
>> Hi Ard,
>>
>>
>> On 2022/12/1 19:07, Ard Biesheuvel wrote:
>>> On Thu, 1 Dec 2022 at 09:07, Ard Biesheuvel  wrote:
 On Thu, 1 Dec 2022 at 08:15, chenxiang (M)  
 wrote:
> Hi Ard,
>
>
> On 2022/11/30 16:18, Ard Biesheuvel wrote:
>> On Wed, 30 Nov 2022 at 08:53, Marc Zyngier  wrote:
>>> On Wed, 30 Nov 2022 02:52:35 +,
>>> "chenxiang (M)"  wrote:
 Hi,

 We boot the VM using following commands (with nvdimm on)  (qemu
 version 6.1.50, kernel 6.0-r4):
>>> How relevant is the presence of the nvdimm? Do you observe the failure
>>> without this?
>>>
 qemu-system-aarch64 -machine
 virt,kernel_irqchip=on,gic-version=3,nvdimm=on  -kernel
 /home/kernel/Image -initrd /home/mini-rootfs/rootfs.cpio.gz -bios
 /root/QEMU_EFI.FD -cpu host -enable-kvm -net none -nographic -m
 2G,maxmem=64G,slots=3 -smp 4 -append 'rdinit=init console=ttyAMA0
 ealycon=pl0ll,0x9000 pcie_ports=native pciehp.pciehp_debug=1'
 -object memory-backend-ram,id=ram1,size=10G -device
 nvdimm,id=dimm1,memdev=ram1  -device ioh3420,id=root_port1,chassis=1
 -device vfio-pci,host=7d:01.0,id=net0,bus=root_port1

 Then in VM we insmod a module, vmalloc error occurs as follows (kernel
 5.19-rc4 is normal, and the issue is still on kernel 6.1-rc4):

 estuary:/$ insmod /lib/modules/$(uname -r)/hnae3.ko
 [8.186563] vmap allocation for size 20480 failed: use
 vmalloc= to increase size
>>> Have you tried increasing the vmalloc size to check that this is
>>> indeed the problem?
>>>
>>> [...]
>>>
 We git bisect the code, and find the patch c5a89f75d2a ("arm64: kaslr:
 defer initialization to initcall where permitted").
>>> I guess you mean commit fc5a89f75d2a instead, right?
>>>
 Do you have any idea about the issue?
>>> I sort of suspect that the nvdimm gets vmap-ed and consumes a large
>>> portion of the vmalloc space, but you give very little information
>>> that could help here...
>>>
>> Ouch. I suspect what's going on here: that patch defers the
>> randomization of the module region, so that we can decouple it from
>> the very early init code.
>>
>> Obviously, it is happening too late now, and the randomized module
>> region is overlapping with a vmalloc region that is in use by the time
>> the randomization occurs.
>>
>> Does the below fix the issue?
> The issue still occurs, but the change seems to decrease the probability:
> before, it occurred almost every time; after the change, I tried 2-3 times
> and it occurred again.
> But when I change "subsys_initcall" back to "core_initcall" and test more
> than 20 times, it is still ok.
>
 Thank you for confirming. I will send out a patch today.

>>> ...but before I do that, could you please check whether the change
>>> below fixes your issue as well?
>>>
>>> diff --git a/arch/arm64/kernel/kaslr.c b/arch/arm64/kernel/kaslr.c
>>> index 6ccc7ef600e7c1e1..c8c205b630da1951 100644
>>> --- a/arch/arm64/kernel/kaslr.c
>>> +++ b/arch/arm64/kernel/kaslr.c
>>> @@ -20,7 +20,11 @@
>>>   #include 
>>>   #include 
>>>
>>> -u64 __ro_after_init module_alloc_base;
>>> +/*
>>> + * Set a reasonable default for module_alloc_base in case
>>> + * we end up running with module randomization disabled.
>>> + */
>>> +u64 __ro_after_init module_alloc_base = (u64)_etext - MODULES_VSIZE;
>>>   u16 __initdata memstart_offset_seed;
>>>
>>>   struct arm64_ftr_override kaslr_feature_override __initdata;
>>> @@ -30,12 +34,6 @@ static int __init kaslr_init(void)
>>>  u64 module_range;
>>>  u32 seed;
>>>
>>> -   /*
>>> -* Set a reasonable default for module_alloc_base in case
>>> -* we end up running with module randomization disabled.
>>> -*/
>>> -   module_alloc_base = (u64)_etext - MODULES_VSIZE;
>>> -
>>>  if (kaslr_feature_override.val & kaslr_feature_override.mask & 
>>> 0xf) {
>>>  pr_info("KASLR disabled on command line\n");
>>>  return 0;
>>> .
>>
>> We have tested this change; the issue is still there, so it doesn't fix the issue.
>>
> 
> Thanks for the report.
> 
> _

Re: [PATCH v2] hw/cxl/device: Add Flex Bus Port DVSEC

2022-12-15 Thread Ira Weiny
On Thu, Dec 15, 2022 at 05:16:33PM +, Jonathan Cameron wrote:
> On Wed, 14 Dec 2022 12:54:11 -0800
> Ira Weiny  wrote:
> 
> > The Flex Bus Port DVSEC was missing on type 3 devices which was blocking
> > RAS checks.[1]
> > 
> > Add the Flex Bus Port DVSEC to type 3 devices as per CXL 3.0 8.2.1.3.
> > 
> > [1] 
> > https://lore.kernel.org/linux-cxl/167096738875.2861540.11815053323626849940.st...@djiang5-desk3.ch.intel.com/
> > 
> > Cc: Dave Jiang 
> > Cc: Jonathan Cameron 
> > Cc: Ben Widawsky 
> > Cc: qemu-devel@nongnu.org
> > Cc: linux-...@vger.kernel.org
> > Signed-off-by: Ira Weiny 
> Looks good to me.
> 
> Reviewed-by: Jonathan Cameron 
> 
> As Michael wasn't cc'd on the patch posting and so might not get this directly,
> I'll add it to the front of the series adding the RAS event emulation, on the
> basis that it's the first time we'll see a failure in Linux (I think?)

Ah thanks!

Sorry, I thought you were the 'maintainer' of the CXL stuff for qemu.

> 
> Michael, if you want to pick this up directly that's great too!

Should I send directly to Michael in future?

> 
> As a side note the WTF? is because we made up a hardware related time delay
> number having no idea whatsoever on what a realistic value was. Cut and paste
> from the instances of this structure in the root port and the switch ports.
> 

Yep I just followed that based off the other code.

Ira

> Jonathan
> 
> 
> 
> > ---
> > Changes in v2:
> > Jonathan
> > type 3 device does not support CACHE
> > Comment the 68B bit 
> > 
> > - Link to v1: 
> > https://lore.kernel.org/r/20221213-ira-flexbus-port-v1-1-86afd4f30...@intel.com
> > ---
> >  hw/mem/cxl_type3.c | 11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 0317bd96a6fb..e6beac143fc1 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -416,6 +416,17 @@ static void build_dvsecs(CXLType3Dev *ct3d)
> >  cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
> > GPF_DEVICE_DVSEC_LENGTH, GPF_DEVICE_DVSEC,
> > GPF_DEVICE_DVSEC_REVID, dvsec);
> > +
> > +dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
> > +.cap = 0x26, /* 68B, IO, Mem, non-MLD */
> > +.ctrl= 0x02, /* IO always enabled */
> > +.status  = 0x26, /* same as capabilities */
> > +.rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */
> > +};
> > +cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
> > +   PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
> > +   PCIE_FLEXBUS_PORT_DVSEC,
> > +   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
> >  }
> >  
> >  static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
> > 
> > ---
> > base-commit: e11b57108b0cb746bb9f3887054f34a2f818ed79
> > change-id: 20221213-ira-flexbus-port-ce526de8111d
> > 
> > Best regards,
> 



Re: [PATCH v2] hw/cxl/device: Add Flex Bus Port DVSEC

2022-12-15 Thread Jonathan Cameron via
On Wed, 14 Dec 2022 12:54:11 -0800
Ira Weiny  wrote:

> The Flex Bus Port DVSEC was missing on type 3 devices which was blocking
> RAS checks.[1]
> 
> Add the Flex Bus Port DVSEC to type 3 devices as per CXL 3.0 8.2.1.3.
> 
> [1] 
> https://lore.kernel.org/linux-cxl/167096738875.2861540.11815053323626849940.st...@djiang5-desk3.ch.intel.com/
> 
> Cc: Dave Jiang 
> Cc: Jonathan Cameron 
> Cc: Ben Widawsky 
> Cc: qemu-devel@nongnu.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Ira Weiny 
Looks good to me.

Reviewed-by: Jonathan Cameron 

As Michael wasn't cc'd on the patch posting and so might not get this directly,
I'll add it to the front of the series adding the RAS event emulation, on the
basis that it's the first time we'll see a failure in Linux (I think?)

Michael, if you want to pick this up directly that's great too!

As a side note the WTF? is because we made up a hardware related time delay
number having no idea whatsoever on what a realistic value was. Cut and paste
from the instances of this structure in the root port and the switch ports.

Jonathan



> ---
> Changes in v2:
> Jonathan
> type 3 device does not support CACHE
> Comment the 68B bit 
> 
> - Link to v1: 
> https://lore.kernel.org/r/20221213-ira-flexbus-port-v1-1-86afd4f30...@intel.com
> ---
>  hw/mem/cxl_type3.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 0317bd96a6fb..e6beac143fc1 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -416,6 +416,17 @@ static void build_dvsecs(CXLType3Dev *ct3d)
>  cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
> GPF_DEVICE_DVSEC_LENGTH, GPF_DEVICE_DVSEC,
> GPF_DEVICE_DVSEC_REVID, dvsec);
> +
> +dvsec = (uint8_t *)&(CXLDVSECPortFlexBus){
> +.cap = 0x26, /* 68B, IO, Mem, non-MLD */
> +.ctrl= 0x02, /* IO always enabled */
> +.status  = 0x26, /* same as capabilities */
> +.rcvd_mod_ts_data_phase1 = 0xef, /* WTF? */
> +};
> +cxl_component_create_dvsec(cxl_cstate, CXL2_TYPE3_DEVICE,
> +   PCIE_FLEXBUS_PORT_DVSEC_LENGTH_2_0,
> +   PCIE_FLEXBUS_PORT_DVSEC,
> +   PCIE_FLEXBUS_PORT_DVSEC_REVID_2_0, dvsec);
>  }
>  
>  static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
> 
> ---
> base-commit: e11b57108b0cb746bb9f3887054f34a2f818ed79
> change-id: 20221213-ira-flexbus-port-ce526de8111d
> 
> Best regards,




Re: [RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-15 Thread Peter Maydell
On Tue, 13 Dec 2022 at 13:36, Chuang Xu  wrote:
>
> Before using any flatview, sanity check we're not during a memory
> region transaction or the map can be invalid.
>
> Signed-off-by: Chuang Xu 
> ---
>  include/exec/memory.h | 9 +
>  softmmu/memory.c  | 1 -
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 91f8a2395a..b43cd46084 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1069,8 +1069,17 @@ struct FlatView {
>  MemoryRegion *root;
>  };
>
> +static unsigned memory_region_transaction_depth;

This looks odd. If you define a static variable in a
header file then each .c file which directly or indirectly
includes the header will get its own private copy of the
variable. This probably isn't what you want...
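
A minimal illustration of the pitfall (not taken from the patch; the names
are made up): with a static definition in the header, each translation unit
gets an independent counter, so an update made in one file is invisible to
the others.

    /* depth.h -- every includer gets its own private copy */
    static unsigned transaction_depth;

    /* a.c */
    #include "depth.h"
    void a_begin(void) { transaction_depth++; }

    /* b.c -- still reads 0 even after a_begin() has run */
    #include "depth.h"
    unsigned b_depth(void) { return transaction_depth; }

The usual pattern is the other way round: declare "extern unsigned
transaction_depth;" in the header and define it exactly once in a single
.c file.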

thanks
-- PMM



Re: [RFC PATCH] includes: move tb_flush into its own header

2022-12-15 Thread Richard Henderson

On 12/15/22 08:46, Alex Bennée wrote:

I'll rename and include when I send the gdbstub stuff. I don't know how
far you want to go to eliminate target specific handling from the rest
of TB maintenance - indeed I'm not sure anything else is possible?


I can't think that anything else is possible.


r~




Re: [RFC PATCH] includes: move tb_flush into its own header

2022-12-15 Thread Alex Bennée


Richard Henderson  writes:

> On 12/15/22 06:09, Alex Bennée wrote:
>> This aids subsystems (like gdbstub) that want to trigger a flush
>> without pulling target specific headers.
>> [AJB: RFC because this is part of a larger gdbstub series but I
>> wanted
>> to post for feedback in case anyone wants to suggest better naming].
>> Signed-off-by: Alex Bennée 
>> ---
>>   include/exec/exec-all.h | 1 -
>>   linux-user/user-internals.h | 1 +
>>   accel/tcg/tb-maint.c| 1 +
>>   accel/tcg/translate-all.c   | 1 +
>>   cpu.c   | 1 +
>>   gdbstub/gdbstub.c   | 1 +
>>   hw/ppc/spapr_hcall.c| 1 +
>>   plugins/core.c  | 1 +
>>   plugins/loader.c| 2 +-
>>   target/alpha/sys_helper.c   | 1 +
>>   target/riscv/csr.c  | 1 +
>>   11 files changed, 10 insertions(+), 2 deletions(-)
>
> It appears as if you forgot to add tb-common.h.
> That said, if this is intended to have exactly one thing, tb-flush.h
> might be better.

I'll rename and include when I send the gdbstub stuff. I don't know how
far you want to go to eliminate target specific handling from the rest
of TB maintenance - indeed I'm not sure anything else is possible? 

>
>
> r~


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PULL 00/19] Next 8.0 patches

2022-12-15 Thread Peter Maydell
On Thu, 15 Dec 2022 at 09:39, Juan Quintela  wrote:
>
> The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:
>
>   mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)
>
> are available in the Git repository at:
>
>   https://gitlab.com/juan.quintela/qemu.git tags/next-8.0-pull-request
>
> for you to fetch changes up to 7f401b80445e8746202a6d643410ba1b9eeb3cb1:
>
>   migration: Drop rs->f (2022-12-15 10:30:37 +0100)
>
> 
> Migration patches for 8.0
>
> Hi
>
> These are the patches that I had to drop from the last PULL request because
> they weren't fixes:
> - AVX2 is dropped, intel posted a fix, I have to redo it
> - Fix for out of order channels is out
>   Daniel nacked it and I need to redo it
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM



Re: [RFC PATCH] includes: move tb_flush into its own header

2022-12-15 Thread Richard Henderson

On 12/15/22 06:09, Alex Bennée wrote:

This aids subsystems (like gdbstub) that want to trigger a flush
without pulling target specific headers.

[AJB: RFC because this is part of a larger gdbstub series but I wanted
to post for feedback in case anyone wants to suggest better naming].

Signed-off-by: Alex Bennée 
---
  include/exec/exec-all.h | 1 -
  linux-user/user-internals.h | 1 +
  accel/tcg/tb-maint.c| 1 +
  accel/tcg/translate-all.c   | 1 +
  cpu.c   | 1 +
  gdbstub/gdbstub.c   | 1 +
  hw/ppc/spapr_hcall.c| 1 +
  plugins/core.c  | 1 +
  plugins/loader.c| 2 +-
  target/alpha/sys_helper.c   | 1 +
  target/riscv/csr.c  | 1 +
  11 files changed, 10 insertions(+), 2 deletions(-)


It appears as if you forgot to add tb-common.h.
That said, if this is intended to have exactly one thing, tb-flush.h might be 
better.
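
For reference, a sketch of what the single-declaration header could look
like (illustrative only -- the new file isn't in the posted diff, so the
name and contents here are assumptions):

    /* accel/tcg/tb-flush.h (sketch) */
    #ifndef ACCEL_TCG_TB_FLUSH_H
    #define ACCEL_TCG_TB_FLUSH_H

    /* CPUState is typedef'd in qemu/typedefs.h */
    void tb_flush(CPUState *cpu);

    #endif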


r~



Re: [RFC v3 1/3] memory: add depth assert in address_space_to_flatview

2022-12-15 Thread Peter Xu
On Wed, Dec 14, 2022 at 04:38:52PM -0500, Peter Xu wrote:
> On Wed, Dec 14, 2022 at 08:03:38AM -0800, Chuang Xu wrote:
> > On 2022/12/13 9:35 PM, Chuang Xu wrote:
> > 
> > Before using any flatview, sanity check we're not during a memory
> > region transaction or the map can be invalid.
> > 
> > Signed-off-by: Chuang Xu 
> > 
> > ---
> >  include/exec/memory.h | 9 +
> >  softmmu/memory.c  | 1 -
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 91f8a2395a..b43cd46084 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -1069,8 +1069,17 @@ struct FlatView {
> >  MemoryRegion *root;
> >  };
> > 
> > +static unsigned memory_region_transaction_depth;
> > +
> >  static inline FlatView *address_space_to_flatview(AddressSpace *as)
> >  {
> > +/*
> > + * Before using any flatview, sanity check we're not during a memory
> > + * region transaction or the map can be invalid.  Note that this can
> > + * also be called during commit phase of memory transaction, but that
> > + * should also only happen when the depth decreases to 0 first.
> > + */
> > +assert(memory_region_transaction_depth == 0);
> >  return qatomic_rcu_read(&as->current_map);
> >  }
> > 
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index bc0be3f62c..f177c40cd8 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -37,7 +37,6 @@
> > 
> >  //#define DEBUG_UNASSIGNED
> > 
> > -static unsigned memory_region_transaction_depth;
> >  static bool memory_region_update_pending;
> >  static bool ioeventfd_update_pending;
> >  unsigned int global_dirty_tracking;
> > 
> > Here are some new situations to be synchronized.
> > 
> > I found that there is a probability to trigger assert in the QEMU startup 
> > phase.
> > 
> > Here is the coredump backtrace:
> > 
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x7f7825e33535 in __GI_abort () at abort.c:79
> > #2  0x7f7825e3340f in __assert_fail_base (fmt=0x7f7825f94ef0
> > "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5653c643add8
> > "memory_region_transaction_depth == 0",
> > file=0x5653c63dad78
> > "/data00/migration/qemu-open/include/exec/memory.h", line=1082,
> > function=) at assert.c:92
> > #3  0x7f7825e411a2 in __GI___assert_fail
> > (assertion=assertion@entry=0x5653c643add8
> > "memory_region_transaction_depth == 0",
> > file=file@entry=0x5653c63dad78
> > "/data00/migration/qemu-open/include/exec/memory.h",
> > line=line@entry=1082,
> > function=function@entry=0x5653c643bd00 <__PRETTY_FUNCTION__.18101>
> > "address_space_to_flatview") at assert.c:101
> > #4  0x5653c60f0383 in address_space_to_flatview (as=0x5653c6af2340
> > ) at
> > /data00/migration/qemu-open/include/exec/memory.h:1082
> > #5  address_space_to_flatview (as=0x5653c6af2340
> > ) at
> > /data00/migration/qemu-open/include/exec/memory.h:1074
> > #6  address_space_get_flatview (as=0x5653c6af2340
> > ) at ../softmmu/memory.c:809
> > #7  0x5653c60fef04 in address_space_cache_init
> > (cache=cache@entry=0x7f781fff8420, as=,
> > addr=63310635776, len=48, is_write=is_write@entry=false)
> > at ../softmmu/physmem.c:3352
> > #8  0x5653c60c08c5 in virtqueue_split_pop (vq=0x7f781c576270,
> > sz=264) at ../hw/virtio/virtio.c:1959
> > #9  0x5653c60c0b7d in virtqueue_pop (vq=vq@entry=0x7f781c576270,
> > sz=) at ../hw/virtio/virtio.c:2177
> > #10 0x5653c609f14f in virtio_scsi_pop_req
> > (s=s@entry=0x5653c9034300, vq=vq@entry=0x7f781c576270) at
> > ../hw/scsi/virtio-scsi.c:219
> > #11 0x5653c60a04a3 in virtio_scsi_handle_cmd_vq
> > (vq=0x7f781c576270, s=0x5653c9034300) at ../hw/scsi/virtio-scsi.c:735
> > #12 virtio_scsi_handle_cmd (vdev=0x5653c9034300, vq=0x7f781c576270) at
> > ../hw/scsi/virtio-scsi.c:776
> > #13 0x5653c60ba72f in virtio_queue_notify_vq (vq=0x7f781c576270)
> > at ../hw/virtio/virtio.c:2847
> > #14 0x5653c62d9706 in aio_dispatch_handler
> > (ctx=ctx@entry=0x5653c84909e0, node=0x7f68e4007840) at
> > ../util/aio-posix.c:369
> > #15 0x5653c62da254 in aio_dispatch_ready_handlers
> > (ready_list=0x7f781fffe6a8, ctx=0x5653c84909e0) at
> > ../util/aio-posix.c:399
> > #16 aio_poll (ctx=0x5653c84909e0, blocking=blocking@entry=true) at
> > ../util/aio-posix.c:713
> > #17 0x5653c61b2296 in iothread_run
> > (opaque=opaque@entry=0x5653c822c390) at ../iothread.c:67
> > #18 0x5653c62dcd8a in qemu_thread_start (args=) at
> > ../util/qemu-thread-posix.c:505
> > #19 0x7f7825fd8fa3 in start_thread (arg=) at
> > pthread_create.c:486
> > #20 0x7f7825f0a06f in clone () at
> > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> This does look like a bug to me.
> 
> Paolo/Michael?

Hmm, I found that virtqueue_split_pop() took the rcu lock.. then I think
it's fine.

Chuang, I think what you can try next is add a helper to detect holding of
rcu lock, then assert with "de
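
A minimal sketch of such a helper (assuming the "depth" counter kept in the
thread-local rcu_reader state from include/qemu/rcu.h; the accessor and
helper names here are illustrative):

    #include "qemu/rcu.h"

    /* True if the current thread is inside an rcu_read_lock() section. */
    static inline bool rcu_read_locked(void)
    {
        return get_ptr_rcu_reader()->depth > 0;
    }

    /*
     * ...which would let the new check become, roughly:
     *     assert(memory_region_transaction_depth == 0 || rcu_read_locked());
     */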

Re: [PATCH] linux-user: Add translation for argument of msync()

2022-12-15 Thread Richard Henderson

On 12/14/22 23:58, Philippe Mathieu-Daudé wrote:

--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -3,6 +3,10 @@

  #define TARGET_MADV_DONTNEED 6

+#define TARGET_MS_ASYNC 1
+#define TARGET_MS_SYNC 2
+#define TARGET_MS_INVALIDATE 4
+
  #include "../generic/target_mman.h"

  #endif
diff --git a/linux-user/generic/target_mman.h b/linux-user/generic/target_mman.h
index 1436a3c543..32bf1a52d0 100644
--- a/linux-user/generic/target_mman.h
+++ b/linux-user/generic/target_mman.h
@@ -89,4 +89,17 @@
  #define TARGET_MADV_DONTNEED_LOCKED 24
  #endif

+
+#ifndef TARGET_MS_ASYNC
+#define TARGET_MS_ASYNC 1


Hmm don't we want to keep the host flag instead?

    #define TARGET_MS_ASYNC MS_ASYNC


No.  What if the host has an odd value, like Alpha.
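
For context: the generic Linux values are MS_ASYNC=1, MS_INVALIDATE=2,
MS_SYNC=4, while Alpha uses MS_ASYNC=1, MS_SYNC=2, MS_INVALIDATE=4 (as in
the alpha target_mman.h hunk quoted above), so passing the guest's flag word
straight to the host msync() would be wrong whenever host and target
disagree. A sketch of the kind of translation the patch implies (the helper
name and placement are illustrative, not the actual linux-user code):

    static int target_to_host_msync_flags(int target_flags)
    {
        int host_flags = 0;

        if (target_flags & TARGET_MS_ASYNC) {
            host_flags |= MS_ASYNC;
        }
        if (target_flags & TARGET_MS_SYNC) {
            host_flags |= MS_SYNC;
        }
        if (target_flags & TARGET_MS_INVALIDATE) {
            host_flags |= MS_INVALIDATE;
        }
        return host_flags;
    }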


r~



Re: [PATCH 3/8] tcg/loongarch64: Update tcg-insn-defs.c.inc

2022-12-15 Thread Richard Henderson

On 12/14/22 23:50, Philippe Mathieu-Daudé wrote:

On 6/12/22 05:40, Richard Henderson wrote:

Regenerate with ADDU16I included.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-insn-defs.c.inc | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tcg/loongarch64/tcg-insn-defs.c.inc 
b/tcg/loongarch64/tcg-insn-defs.c.inc
index d162571856..c3c8669b4b 100644
--- a/tcg/loongarch64/tcg-insn-defs.c.inc
+++ b/tcg/loongarch64/tcg-insn-defs.c.inc
@@ -4,7 +4,7 @@
   *
   * This file is auto-generated by genqemutcgdefs from
   * https://github.com/loongson-community/loongarch-opcodes,
- * from commit 961f0c60f5b63e574d785995600c71ad5413fdc4.


Odd, addu16i.d is present since 3d057a6, so was already in 961f0c6.


It wasn't marked "qemu", so the generator didn't emit ...


@@ -74,6 +74,7 @@ typedef enum {
  OPC_ANDI = 0x0340,
  OPC_ORI = 0x0380,
  OPC_XORI = 0x03c0,
+    OPC_ADDU16I_D = 0x1000,
  OPC_LU12I_W = 0x1400,
  OPC_CU32I_D = 0x1600,
  OPC_PCADDU2I = 0x1800,
@@ -710,6 +711,13 @@ tcg_out_opc_xori(TCGContext *s, TCGReg d, TCGReg j, 
uint32_t uk12)
  tcg_out32(s, encode_djuk12_insn(OPC_XORI, d, j, uk12));
  }
+/* Emits the `addu16i.d d, j, sk16` instruction.  */
+static void __attribute__((unused))
+tcg_out_opc_addu16i_d(TCGContext *s, TCGReg d, TCGReg j, int32_t sk16)
+{
+    tcg_out32(s, encode_djsk16_insn(OPC_ADDU16I_D, d, j, sk16));
+}


... all this.

r~




[PATCH v2] tests/qtest/qom-test: Do not print tested properties by default

2022-12-15 Thread Thomas Huth
We're still running into the problem that some logs are cut in the
gitlab-CI since they got too big. The biggest part of the log is
still the output of the qom-test. Let's stop printing the properties
by default to get to a saner size here. The full output can still
be enabled by setting V=2 (or higher) in the environment.

Signed-off-by: Thomas Huth 
---
 v2: Use atoi() to do proper checking of the verbosity level

 tests/qtest/qom-test.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/qom-test.c b/tests/qtest/qom-test.c
index 13510bc349..d380261f8f 100644
--- a/tests/qtest/qom-test.c
+++ b/tests/qtest/qom-test.c
@@ -14,6 +14,8 @@
 #include "qemu/cutils.h"
 #include "libqtest.h"
 
+static bool verbose;
+
 static void test_properties(QTestState *qts, const char *path, bool recurse)
 {
 char *child_path;
@@ -49,7 +51,9 @@ static void test_properties(QTestState *qts, const char 
*path, bool recurse)
 }
 } else {
 const char *prop = qdict_get_str(tuple, "name");
-g_test_message("-> %s", prop);
+if (verbose) {
+g_test_message("-> %s", prop);
+}
 tmp = qtest_qmp(qts,
 "{ 'execute': 'qom-get',"
 "  'arguments': { 'path': %s, 'property': %s } }",
@@ -103,6 +107,12 @@ static void add_machine_test_case(const char *mname)
 
 int main(int argc, char **argv)
 {
+char *v_env = getenv("V");
+
+if (v_env && atoi(v_env) >= 2) {
+verbose = true;
+}
+
 g_test_init(&argc, &argv, NULL);
 
 qtest_cb_for_every_machine(add_machine_test_case, g_test_quick());
-- 
2.31.1




[PULL v3 00/50] Block layer patches

2022-12-15 Thread Kevin Wolf
The following changes since commit 48804eebd4a327e4b11f902ba80a00876ee53a43:

  Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into 
staging (2022-12-15 10:13:46 +)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 1b3ff9feb942c2ad0b01ac931e99ad451ab0ef39:

  block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15 
16:08:23 +0100)

v3:
- Dropped "configure: Enable -Wthread-safety if present" because FreeBSD
  has TSA annotations in its pthread locking functions, so we would have
  to annotate the use of every lock in QEMU first before we can enable
  it.

v2:
- Changed TSA capability name to "mutex" to work with older clang
  versions. The tsan-build CI job succeeds now.


Block layer patches

- Code cleanups around block graph modification
- Simplify drain
- coroutine_fn correctness fixes, including splitting generated
  coroutine wrappers into co_wrapper (to be called only from
  non-coroutine context) and co_wrapper_mixed (both coroutine and
  non-coroutine context)
- Introduce a block graph rwlock


Emanuele Giuseppe Esposito (21):
  block-io: introduce coroutine_fn duplicates for 
bdrv_common_block_status_above callers
  block-copy: add coroutine_fn annotations
  nbd/server.c: add coroutine_fn annotations
  block-backend: replace bdrv_*_above with blk_*_above
  block/vmdk: add coroutine_fn annotations
  block: avoid duplicating filename string in bdrv_create
  block: distinguish between bdrv_create running in coroutine and not
  block: bdrv_create_file is a coroutine_fn
  block: rename generated_co_wrapper in co_wrapper_mixed
  block-coroutine-wrapper.py: introduce co_wrapper
  block-coroutine-wrapper.py: support functions without bs arg
  block-coroutine-wrapper.py: support also basic return types
  block: convert bdrv_create to co_wrapper
  block/dirty-bitmap: convert coroutine-only functions to co_wrapper
  graph-lock: Implement guard macros
  async: Register/unregister aiocontext in graph lock list
  block: wrlock in bdrv_replace_child_noperm
  block: remove unnecessary assert_bdrv_graph_writable()
  block: assert that graph read and writes are performed correctly
  block-coroutine-wrapper.py: introduce annotations that take the graph 
rdlock
  block: use co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock

Kevin Wolf (24):
  qed: Don't yield in bdrv_qed_co_drain_begin()
  test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end()
  block: Revert .bdrv_drained_begin/end to non-coroutine_fn
  block: Remove drained_end_counter
  block: Inline bdrv_drain_invoke()
  block: Fix locking for bdrv_reopen_queue_child()
  block: Drain individual nodes during reopen
  block: Don't use subtree drains in bdrv_drop_intermediate()
  stream: Replace subtree drain with a single node drain
  block: Remove subtree drains
  block: Call drain callbacks only once
  block: Remove ignore_bds_parents parameter from drain_begin/end.
  block: Drop out of coroutine in bdrv_do_drained_begin_quiesce()
  block: Don't poll in bdrv_replace_child_noperm()
  block: Remove poll parameter from bdrv_parent_drained_begin_single()
  block: Factor out bdrv_drain_all_begin_nopoll()
  Import clang-tsa.h
  clang-tsa: Add TSA_ASSERT() macro
  clang-tsa: Add macros for shared locks
  test-bdrv-drain: Fix incorrrect drain assumptions
  block: Fix locking in external_snapshot_prepare()
  graph-lock: TSA annotations for lock/unlock functions
  Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK
  block: GRAPH_RDLOCK for functions only called by co_wrappers

Paolo Bonzini (1):
  graph-lock: Introduce a lock to protect block graph operations

Vladimir Sementsov-Ogievskiy (4):
  block: Inline bdrv_detach_child()
  block: drop bdrv_remove_filter_or_cow_child
  block: bdrv_refresh_perms(): allow external tran
  block: refactor bdrv_list_refresh_perms to allow any list of nodes

 docs/devel/block-coroutine-wrapper.rst |   6 +-
 block/block-gen.h  |  11 +-
 block/coroutines.h |  21 +-
 include/block/aio.h|   9 +
 include/block/block-common.h   |  27 ++-
 include/block/block-copy.h |   5 +-
 include/block/block-global-state.h |  15 +-
 include/block/block-io.h   | 136 +--
 include/block/block_int-common.h   |  49 ++--
 include/block/block_int-global-state.h |  17 --
 include/block/block_int-io.h   |  12 -
 include/block/block_int.h  |   1 +
 include/block/dirty-bitmap.h   |  10 +-
 include/block/graph-lock.h | 280 +++
 include/qemu/c

Re: [PATCH v2 0/3] Fix the "-nic help" option

2022-12-15 Thread Thomas Huth

On 10/11/2022 13.52, Thomas Huth wrote:

Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier
versions, but since QEMU 6.0 it just complains that "help" is not
a valid value here. This patch series fixes this problem and also
extends the help output here to list the available NIC models, too.

v2:
  - Add function comment in the first patch
  - Add Reviewed-by in the third patch

Thomas Huth (3):
   net: Move the code to collect available NIC models to a separate
 function
   net: Restore printing of the help text with "-nic help"
   net: Replace "Supported NIC models" with "Available NIC models"

  include/net/net.h | 14 +
  hw/pci/pci.c  | 29 +--
  net/net.c | 50 ---
  3 files changed, 62 insertions(+), 31 deletions(-)



Friendly ping!

 Thomas




Re: [PULL v2 00/51] Block layer patches

2022-12-15 Thread Kevin Wolf
Am 15.12.2022 um 15:44 hat Peter Maydell geschrieben:
> On Thu, 15 Dec 2022 at 11:59, Kevin Wolf  wrote:
> >
> > The following changes since commit 5204b499a6cae4dfd9fe762d5e6e82224892383b:
> >
> >   mailmap: Fix Stefan Weil author email (2022-12-13 15:56:57 -0500)
> >
> > are available in the Git repository at:
> >
> >   https://repo.or.cz/qemu/kevin.git tags/for-upstream
> >
> > for you to fetch changes up to 347fe9e156a3e00c40ae1802978276a1f7d5545f:
> >
> >   block: GRAPH_RDLOCK for functions only called by co_wrappers (2022-12-15 
> > 10:11:45 +0100)
> >
> > v2:
> > - Changed TSA capability name to "mutex" to work with older clang
> >   versions. The tsan-build CI job succeeds now.
> >
> > 
> > Block layer patches
> >
> > - Code cleanups around block graph modification
> > - Simplify drain
> > - coroutine_fn correctness fixes, including splitting generated
> >   coroutine wrappers into co_wrapper (to be called only from
> >   non-coroutine context) and co_wrapper_mixed (both coroutine and
> >   non-coroutine context)
> > - Introduce a block graph rwlock
> 
> This fails to compile on the FreeBSD 12 and 13 jobs:
> https://gitlab.com/qemu-project/qemu/-/jobs/3479763741
> https://gitlab.com/qemu-project/qemu/-/jobs/3479763746
> 
> The compiler is producing -Wthread-safety-analysis
> warnings on code in qemu-thread-posix.c, which are a
> compile failure because of -Werror.

Hmm... FreeBSD actually annotates its pthread locking functions for TSA,
so all callers need to be annotated as well. I guess it's nice in
theory, but hard to enable for a huge codebase like QEMU...

I'll just drop "configure: Enable -Wthread-safety if present" for now.

Maybe we can have a configure check later to enable it by default on
glibc at least. Or we really need to go through all locks in QEMU and
annotate them properly. This might be a bit too painful, though, so we
may end up leaving FreeBSD unchecked even if that seems to be the OS to
care most about it...
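
For anyone unfamiliar with what the analysis expects, a minimal illustration
of clang -Wthread-safety annotations (generic attributes for illustration;
QEMU's clang-tsa.h wraps them in TSA_* macros): once the lock type is marked
as a capability and the lock/unlock functions are annotated, every access to
guarded state is checked, which is why FreeBSD's annotated pthread headers
immediately flag unannotated callers.

    typedef struct __attribute__((capability("mutex"))) ExampleLock {
        int placeholder;
    } ExampleLock;

    extern ExampleLock big_lock;
    extern int shared_state __attribute__((guarded_by(big_lock)));

    void example_lock(ExampleLock *l) __attribute__((acquire_capability(*l)));
    void example_unlock(ExampleLock *l) __attribute__((release_capability(*l)));

    void touch_state(void)
    {
        example_lock(&big_lock);
        shared_state++;          /* ok: the analysis knows big_lock is held */
        example_unlock(&big_lock);
        shared_state++;          /* triggers a -Wthread-safety warning */
    }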

Kevin



