Hi,

On 11/22/22 10:43, Michael S. Tsirkin wrote:
> On Tue, Nov 22, 2022 at 11:01:11AM +0800, Jason Wang wrote:
>> When vIOMMU is enabled, the vq->used_phys is actually the IOVA not
>> GPA. So we need to translate it to GPA before the syncing otherwise we
>> may hit the following crash since IOVA could be out of the scope of
>> the GPA log size. This could be noted when using virtio-IOMMU with
>> vhost using 1G memory.
> Noted how exactly? What does "using 1G memory" mean?
We hit the following assertion:

qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.

The last time vhost_get_log_size() is called, it takes two regions into account when computing the log_size:

qemu-system-x86_64: vhost_get_log_size region 0 last=0x9ffff updated log_size=0x3
qemu-system-x86_64: vhost_get_log_size region 1 last=0x3fffffff updated log_size=0x1000

so in vhost_migration_log(), vhost_get_log_size(dev) returns 0x1000.

In the test case, memory_region_sync_dirty_bitmap() gets called for mem-machine_mem, vga.vram (several times) and eventually for pc.bios. The latter is responsible for the assertion (the arithmetic is spelled out in the P.S. at the end of this mail):

qemu-system-x86_64: vhost_log_sync calls sync_dirty_map on pc.bios for the full range
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on region 0
qemu-system-x86_64: vhost_dev_sync_region end=0x9ffff < start=0xfffc0000
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on region 1
qemu-system-x86_64: vhost_dev_sync_region end=0x3fffffff < start=0xfffc0000
qemu-system-x86_64: vhost_sync_dirty_bitmap calls vhost_dev_sync_region on vq 0 <-----
qemu-system-x86_64: vhost_dev_sync_region pc.bios mfirst=0xfffc0000 mlast=0xffffffff rfirst=0xfffff240 rlast=0xfffffa45
qemu-system-x86_64: vhost_dev_sync_region pc.bios end=0xfffffa45 VHOST_LOG_CHUNK=0x40000 end/VHOST_LOG_CHUNK=0x3fff dev->log_size=0x1000
qemu-system-x86_64: ../hw/virtio/vhost.c:85: vhost_dev_sync_region: Assertion `end / VHOST_LOG_CHUNK < dev->log_size' failed.

"using 1G memory": we hit the issue with a guest started with 1GB of initial RAM.

>
>> Fixes: c471ad0e9bd46 ("vhost_net: device IOTLB support")
>> Cc: qemu-sta...@nongnu.org
>> Reported-by: Yalan Zhang <yalzh...@redhat.com>
>> Tested-by: Eric Auger <eric.au...@redhat.com>
>> Tested-by: Lei Yang <leiy...@redhat.com>
>> Signed-off-by: Jason Wang <jasow...@redhat.com>
>> ---
>>  hw/virtio/vhost.c | 65 ++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 45 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>> index d1c4c20b8c..26b319f34e 100644
>> --- a/hw/virtio/vhost.c
>> +++ b/hw/virtio/vhost.c
>> @@ -106,11 +106,30 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
>>      }
>>  }
>>
>> +static bool vhost_dev_has_iommu(struct vhost_dev *dev)
>> +{
>> +    VirtIODevice *vdev = dev->vdev;
>> +
>> +    /*
>> +     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
>> +     * incremental memory mapping API via IOTLB API. For platform that
>> +     * does not have IOMMU, there's no need to enable this feature
>> +     * which may cause unnecessary IOTLB miss/update transactions.
>> +     */
>> +    if (vdev) {
>> +        return virtio_bus_device_iommu_enabled(vdev) &&
>> +            virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
>> +    } else {
>> +        return false;
>> +    }
>> +}
>> +
>>  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
>>                                     MemoryRegionSection *section,
>>                                     hwaddr first,
>>                                     hwaddr last)
>>  {
>> +    IOMMUTLBEntry iotlb;
> why don't we move this inside the scope where it's used?
>
>>      int i;
>>      hwaddr start_addr;
>>      hwaddr end_addr;
>> @@ -132,13 +151,37 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
>>      }
>>      for (i = 0; i < dev->nvqs; ++i) {
>>          struct vhost_virtqueue *vq = dev->vqs + i;
>> +        hwaddr used_phys = vq->used_phys, used_size = vq->used_size;
>> +        hwaddr phys, s;
> these two, too.
>
>>
>>          if (!vq->used_phys && !vq->used_size) {
>>              continue;
>>          }
>>
>> -        vhost_dev_sync_region(dev, section, start_addr, end_addr, vq->used_phys,
>> -                              range_get_last(vq->used_phys, vq->used_size));
>> +        if (vhost_dev_has_iommu(dev)) {
>> +            while (used_size) {
>> +                rcu_read_lock();
>> +                iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
>> +                                                      used_phys,
>> +                                                      true, MEMTXATTRS_UNSPECIFIED);
>> +                rcu_read_unlock();
>> +
>> +                if (iotlb.target_as == NULL) {
>> +                    return -EINVAL;
> I am not sure how this can trigger. I don't like == NULL:
> !iotlb.target_as is more succint. But a bigger question is how to
> handle this. callers ignore the return value so maybe
> log guest error? iommu seems misconfigured ...
>
>
>> +                }
>> +
>> +                phys = iotlb.translated_addr;
>> +                s = MIN(iotlb.addr_mask + 1, used_size);
> Note, that iotlb.translated_addr here is an aligned address and
> iotlb.addr_mask + 1 is size from there.
>
> So I think phys that you want is actually
> phys = iotlb.translated_addr + (used_phys & iotlb.addr_mask);
>
>
> accordingly, the size would be from there until end of mask:
> s = MIN(iotlb.addr_mask + 1 - (used_phys & iotlb.addr_mask), used_size);
>
>
> Also, it bothers me that the math here will give you 0 if addr_mask is
> all ones. Then MIN will give 0 too and we loop forever. I think this
> can not trigger, but I'd rather we play it safe and add outside of MIN
> after it's capped to a reasonable value. So we end up with:
>
> /* Distance from start of used ring until last byte of IOMMU page */
> s = iotlb.addr_mask - (used_phys & iotlb.addr_mask);
> /* size of used ring, or of the part of it until end of IOMMU page */
> s = MIN(s, used_size - 1) + 1;
>
>
>> +
>> +                vhost_dev_sync_region(dev, section, start_addr, end_addr, phys,
>> +                                      range_get_last(phys, used_size));
> why are you syncing used_size here? Shouldn't it be s?
>
>
>> +                used_size -= s;
>> +                used_phys += s;
>> +            }
>> +        } else {
>> +            vhost_dev_sync_region(dev, section, start_addr, end_addr, used_phys,
>> +                                  range_get_last(used_phys, used_size));
>> +        }
>>      }
>>      return 0;
>>  }
>> @@ -306,24 +349,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
>>      dev->log_size = size;
>>  }
>>
>> -static bool vhost_dev_has_iommu(struct vhost_dev *dev)
>> -{
>> -    VirtIODevice *vdev = dev->vdev;
>> -
>> -    /*
>> -     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
>> -     * incremental memory mapping API via IOTLB API. For platform that
>> -     * does not have IOMMU, there's no need to enable this feature
>> -     * which may cause unnecessary IOTLB miss/update transactions.
>> -     */
>> -    if (vdev) {
>> -        return virtio_bus_device_iommu_enabled(vdev) &&
>> -            virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
>> -    } else {
>> -        return false;
>> -    }
>> -}
>> -
>>  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
>>                                hwaddr *plen, bool is_write)
>>  {
>> --
>> 2.25.1

Thanks

Eric
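
P.S.: to spell out the arithmetic behind the trace above: dev->log_size = 0x1000 chunks only covers GPAs up to

    0x1000 * VHOST_LOG_CHUNK = 0x1000 * 0x40000 = 0x40000000 (1 GiB, i.e. region 1's last GPA 0x3fffffff),

while the used ring address handed to vhost_dev_sync_region() is still an untranslated IOVA, so

    end / VHOST_LOG_CHUNK = 0xfffffa45 / 0x40000 = 0x3fff >= 0x1000 = dev->log_size,

which is exactly what fires the assertion.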
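
P.P.S.: for the record, here is how I read the vq loop once Michael's corrections are folded in. This is only a sketch to check my understanding, not a tested patch; the qemu_log_mask() call is just one possible way to implement the "log guest error" suggestion and its message text is made up by me:

    if (vhost_dev_has_iommu(dev)) {
        IOMMUTLBEntry iotlb;

        while (used_size) {
            hwaddr phys, s;

            /* Translate the current IOVA of the used ring into a GPA. */
            rcu_read_lock();
            iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as,
                                                  used_phys, true,
                                                  MEMTXATTRS_UNSPECIFIED);
            rcu_read_unlock();

            if (!iotlb.target_as) {
                /* illustrative message only, following the "log guest error" idea */
                qemu_log_mask(LOG_GUEST_ERROR,
                              "vhost: no IOMMU translation for used ring "
                              "at IOVA 0x%" HWADDR_PRIx "\n", used_phys);
                return -EINVAL;
            }

            /* translated_addr is page aligned: add the offset within the page */
            phys = iotlb.translated_addr + (used_phys & iotlb.addr_mask);

            /* distance from used_phys until the last byte of the IOMMU page... */
            s = iotlb.addr_mask - (used_phys & iotlb.addr_mask);
            /* ...capped to what is left of the ring; the +1 outside MIN avoids
             * looping forever when addr_mask is all ones */
            s = MIN(s, used_size - 1) + 1;

            /* sync only this chunk of s bytes, not the whole used_size */
            vhost_dev_sync_region(dev, section, start_addr, end_addr,
                                  phys, range_get_last(phys, s));

            used_size -= s;
            used_phys += s;
        }
    } else {
        vhost_dev_sync_region(dev, section, start_addr, end_addr, used_phys,
                              range_get_last(used_phys, used_size));
    }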