On Tue, 6 Oct 2020 06:49:28 -0400 "Michael S. Tsirkin" <m...@redhat.com> wrote:
> On Tue, Oct 06, 2020 at 11:58:50AM +0200, Greg Kurz wrote:
> > On Mon, 5 Oct 2020 10:18:03 -0400
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > 
> > > On Mon, Sep 28, 2020 at 09:37:18AM +0200, Greg Kurz wrote:
> > > > On Mon, 28 Sep 2020 16:23:43 +1000
> > > > David Gibson <da...@gibson.dropbear.id.au> wrote:
> > > > 
> > > > > On Fri, Sep 25, 2020 at 07:29:43PM +0200, Greg Kurz wrote:
> > > > > > When a vIOMMU is present, any address coming from the guest is an IO
> > > > > > virtual address, including those of the vrings. The backend's accesses
> > > > > > to the vrings happen through vIOMMU translation: the backend hence
> > > > > > only logs the final guest physical address, not the IO virtual one.
> > > > > > It thus doesn't make sense to make room for the vring addresses in the
> > > > > > dirty log in this case.
> > > > > > 
> > > > > > This fixes a crash of the source when migrating a guest using in-kernel
> > > > > > vhost-net and iommu_platform=on on POWER, because DMA regions are put
> > > > > > at very high addresses and the resulting log size is likely to cause
> > > > > > g_malloc0() to abort.
> > > > > > 
> > > > > > BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349
> > > > > > Signed-off-by: Greg Kurz <gr...@kaod.org>
> > > > > 
> > > > > I'm a little confused as to what's going on here. Obviously
> > > > > allocating dirty bitmaps in IOVA space doesn't make much sense.
> > > > > But.. in all cases isn't the ring ending up in guest memory, whether
> > > > > translated or not. So why do specific addresses of the ring make a
> > > > > difference in *any* case.
> > > > > 
> > > > I admit I'm a bit surprised as well... I can't think of a scenario
> > > > where the address of the used ring would be higher than the guest
> > > > memory... Maybe MST can shed some light here ?
> > > 
> > > So the original idea was that vring itself is specified in
> > > terms of HVA as opposed to rest of stuff which is specified
> > > in terms of GPA.
> > 
> > The vring itself is indeed described in terms of HVA (vq->used) but
> > when it comes to the dirty log, QEMU passes the GPA of the used
> > structure to the vhost backend:
> > 
> > From vhost_virtqueue_set_addr():
> > 
> >     addr.log_guest_addr = vq->used_phys;
> >                           ^^   GPA   ^^
> >     addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
> >     r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);
> 
> Right. The point being we either use the log bitmap or the ring aliasing

Not sure what the "ring aliasing trick" is referring to...

> trick to track memory changes, not both. If we used the ring aliasing
> trick then presumably VHOST_VRING_F_LOG would be clear and then
> log_guest_addr is unused.
> 

Ok, so IIUC this means that vhost_get_log_size() never needs to special
case vq->used_phys (i.e. log_guest_addr) since it is a GPA (i.e. <= guest
RAM size and thus already covered by the first loop), right ?
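
FWIW, a rough back-of-the-envelope of why sizing the log for a GIOVA is
fatal on POWER. This is only an illustration: the 1ULL << 59 base is an
assumed IOVA (roughly where the pseries 64-bit DMA window lives), the
names below are local to the example, and the chunk arithmetic just
mirrors the VHOST_LOG_PAGE / vhost_log_chunk_t definitions from
hw/virtio/vhost.c:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LOG_PAGE   0x1000ULL               /* VHOST_LOG_PAGE */
    #define LOG_BITS   (8 * sizeof(uint64_t))  /* bits per vhost_log_chunk_t */
    #define LOG_CHUNK  (LOG_PAGE * LOG_BITS)   /* guest space covered per chunk */

    /* Same formula as vhost_get_log_size(): number of chunks needed to cover
     * 'last', converted to the byte count g_malloc0() would have to allocate. */
    static uint64_t log_bytes_for(uint64_t last)
    {
        return (last / LOG_CHUNK + 1) * sizeof(uint64_t);
    }

    int main(void)
    {
        uint64_t gpa_last   = (16ULL << 30) - 1; /* used ring at the top of a 16 GiB guest */
        uint64_t giova_last = 1ULL << 59;        /* assumed GIOVA in the 64-bit DMA window */

        printf("GPA:   %" PRIu64 " KiB\n", log_bytes_for(gpa_last) >> 10);   /* 512 KiB */
        printf("GIOVA: %" PRIu64 " GiB\n", log_bytes_for(giova_last) >> 30); /* 16384 GiB */
        return 0;
    }

So ~512 KiB of log for a regular GPA vs ~16 TiB once vq->used_phys is a
GIOVA, which matches the g_malloc0() abort from the BZ.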

> > And the sizing of the bitmap computed in vhost_get_log_size() is
> > also based on this GPA:
> > 
> >     for (i = 0; i < dev->nvqs; ++i) {
> >         struct vhost_virtqueue *vq = dev->vqs + i;
> > 
> >         if (!vq->used_phys && !vq->used_size) {
> >             continue;
> >         }
> > 
> >         uint64_t last = vq->used_phys + vq->used_size - 1;
> >                         ^^   GPA   ^^
> >         log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
> >     }
> > 
> > With platform_iommu=off, I couldn't find a case where this second
> > loop in vhost_get_log_size() increases the log size actually... and
> > when platform_iommu=on, vq->used_phys is a GIOVA that my other patches
> > you've already merged in the kernel explicitly ignore when it comes to
> > the dirty log... So it really seems that this loop is useless for the
> > general case. If it is there to address some corner case, I guess it
> > deserves a comment at the very least.
> > 
> > Cheers,
> > 
> > --
> > Greg
> > 
> > > This way we wanted to support e.g. migration by vhost writing into
> > > qemu address space, qemu copying data to guest memory.
> > > 
> > > > 
> > > > > 
> > > > > > ---
> > > > > >  hw/virtio/vhost.c | 38 ++++++++++++++++++++++++--------------
> > > > > >  1 file changed, 24 insertions(+), 14 deletions(-)
> > > > > > 
> > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > > > > index 1a1384e7a642..0b83d6b8e65e 100644
> > > > > > --- a/hw/virtio/vhost.c
> > > > > > +++ b/hw/virtio/vhost.c
> > > > > > @@ -106,6 +106,20 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
> > > > > >      }
> > > > > >  }
> > > > > >  
> > > > > > +static int vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > > +{
> > > > > > +    VirtIODevice *vdev = dev->vdev;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > > +     * incremental memory mapping API via IOTLB API. For platform that
> > > > > > +     * does not have IOMMU, there's no need to enable this feature
> > > > > > +     * which may cause unnecessary IOTLB miss/update trnasactions.
> > > > > > +     */
> > > > > > +    return vdev->dma_as != &address_space_memory &&
> > > > > > +           virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > > +}
> > > > > > +
> > > > > >  static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > > >                                     MemoryRegionSection *section,
> > > > > >                                     hwaddr first,
> > > > > > @@ -130,6 +144,11 @@ static int vhost_sync_dirty_bitmap(struct vhost_dev *dev,
> > > > > >                            range_get_last(reg->guest_phys_addr,
> > > > > >                                           reg->memory_size));
> > > > > >      }
> > > > > > +
> > > > > > +    if (vhost_dev_has_iommu(dev)) {
> > > > > > +        return 0;
> > > > > > +    }
> > > > > > +
> > > > > >      for (i = 0; i < dev->nvqs; ++i) {
> > > > > >          struct vhost_virtqueue *vq = dev->vqs + i;
> > > > > >  
> > > > > > @@ -172,6 +191,11 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev)
> > > > > >                                         reg->memory_size);
> > > > > >          log_size = MAX(log_size, last / VHOST_LOG_CHUNK + 1);
> > > > > >      }
> > > > > > +
> > > > > > +    if (vhost_dev_has_iommu(dev)) {
> > > > > > +        return log_size;
> > > > > > +    }
> > > > > > +
> > > > > >      for (i = 0; i < dev->nvqs; ++i) {
> > > > > >          struct vhost_virtqueue *vq = dev->vqs + i;
> > > > > >  
> > > > > > @@ -287,20 +311,6 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
> > > > > >      dev->log_size = size;
> > > > > >  }
> > > > > >  
> > > > > > -static int vhost_dev_has_iommu(struct vhost_dev *dev)
> > > > > > -{
> > > > > > -    VirtIODevice *vdev = dev->vdev;
> > > > > > -
> > > > > > -    /*
> > > > > > -     * For vhost, VIRTIO_F_IOMMU_PLATFORM means the backend support
> > > > > > -     * incremental memory mapping API via IOTLB API. For platform that
> > > > > > -     * does not have IOMMU, there's no need to enable this feature
> > > > > > -     * which may cause unnecessary IOTLB miss/update trnasactions.
> > > > > > -     */
> > > > > > -    return vdev->dma_as != &address_space_memory &&
> > > > > > -           virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
> > > > > > -}
> > > > > > -
> > > > > >  static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
> > > > > >                                hwaddr *plen, bool is_write)
> > > > > >  {
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 