Re: [Intel-gfx] [PATCH v4 1/2] vfio: Replace the DMA unmapping notifier with a callback
On Wed, 2022-07-20 at 17:04 -0600, Alex Williamson wrote: > On Wed, 20 Jul 2022 17:08:29 -0300 > Jason Gunthorpe wrote: > > > On Wed, Jul 20, 2022 at 01:41:13PM -0600, Alex Williamson wrote: > > > > > ie. we don't need the gfn, we only need the iova. > > > > Right, that makes sense > > > > > However then I start to wonder why we're passing in 1 for the > > > number of > > > pages because this previously notifier, now callback is called > > > for the > > > entire vfio_dma range when we find any pinned pages. > > > > Well, it is doing this because it only ever pins one page. > > Of course that page is not necessarily the page it unpins given the > contract misunderstanding below. > > > The drivers are confused about what the contract is. vfio is > > calling > > the notifier with the entire IOVA range that is being unmapped and > > the > > drivers are expecting to receive notifications only for the IOVA > > they > > have actually pinned. > > > > > Should ap and ccw implementations of .dma_unmap just be replaced > > > with a > > > BUG_ON(1)? > > > > The point of these callbacks is to halt concurrent DMA, and ccw > > does > > that today. > > ccw essentially only checks whether the starting iova of the unmap is > currently mapped. If not it does nothing, if it is it tries to reset > the device and unpin everything. Chances are the first iova is not > the > one pinned, so we don't end up removing the pinned page and type1 > will > eventually BUG_ON after a few tries. > > > It looks like AP is missing a call to ap_aqic(), so it is > > probably double wrong. > > Thankfully the type1 unpinning path can't be tricked into unpinning > something that wasn't pinned, so chances are the unpin call does > nothing, with a small risk that it unpins another driver's pinned > page, > which might not yet have been notified and could still be using the > page. In the end, if ap did have a page pinned in the range, we'll > hit > the same BUG_ON as above. > > > What I'd suggest is adding a WARN_ON that the dma->pfn_list is not > > empty and leave these functions alone. > > The BUG_ON still exists in type1. > > Eric, Matt, Tony, Halil, JasonH, any quick fixes here? ccw looks > like > it would be pretty straightforward to test against a range rather > than > a single iova. Agreed, ccw looks pretty easy. Should I send something to go before this series to make stable easier? (It's a trivial change in either direction, so either is fine to me.) Eric > > > Most likely AP should be fixed to call vfio_ap_irq_disable() and to > > check the q->saved_pfn against the IOVA. > > Right, the q->saved_iova, perhaps calling vfio_ap_irq_disable() on > finding a matching queue. > > > But I'm inclined to leave this as-is for this series given we are > > at > > rc7. > > On the grounds that it's no worse, maybe, but given the changes > around this code hopefully we can submit fixes patches to stable if > the > backport isn't obvious and the BUG_ON in type1 is reachable. Thanks, > > Alex >
Re: [Intel-gfx] [PATCH v4 1/2] vfio: Replace the DMA unmapping notifier with a callback
On Thu, 21 Jul 2022 12:01:47 -0400 Eric Farman wrote: > On Wed, 2022-07-20 at 17:04 -0600, Alex Williamson wrote: > > On Wed, 20 Jul 2022 17:08:29 -0300 > > Jason Gunthorpe wrote: > > > > > On Wed, Jul 20, 2022 at 01:41:13PM -0600, Alex Williamson wrote: > > > > > > > ie. we don't need the gfn, we only need the iova. > > > > > > Right, that makes sense > > > > > > > However then I start to wonder why we're passing in 1 for the > > > > number of > > > > pages because this previously notifier, now callback is called > > > > for the > > > > entire vfio_dma range when we find any pinned pages. > > > > > > Well, it is doing this because it only ever pins one page. > > > > Of course that page is not necessarily the page it unpins given the > > contract misunderstanding below. > > > > > The drivers are confused about what the contract is. vfio is > > > calling > > > the notifier with the entire IOVA range that is being unmapped and > > > the > > > drivers are expecting to receive notifications only for the IOVA > > > they > > > have actually pinned. > > > > > > > Should ap and ccw implementations of .dma_unmap just be replaced > > > > with a > > > > BUG_ON(1)? > > > > > > The point of these callbacks is to halt concurrent DMA, and ccw > > > does > > > that today. > > > > ccw essentially only checks whether the starting iova of the unmap is > > currently mapped. If not it does nothing, if it is it tries to reset > > the device and unpin everything. Chances are the first iova is not > > the > > one pinned, so we don't end up removing the pinned page and type1 > > will > > eventually BUG_ON after a few tries. > > > > > It looks like AP is missing a call to ap_aqic(), so it is > > > probably double wrong. > > > > Thankfully the type1 unpinning path can't be tricked into unpinning > > something that wasn't pinned, so chances are the unpin call does > > nothing, with a small risk that it unpins another driver's pinned > > page, > > which might not yet have been notified and could still be using the > > page. In the end, if ap did have a page pinned in the range, we'll > > hit > > the same BUG_ON as above. > > > > > What I'd suggest is adding a WARN_ON that the dma->pfn_list is not > > > empty and leave these functions alone. > > > > The BUG_ON still exists in type1. > > > > Eric, Matt, Tony, Halil, JasonH, any quick fixes here? ccw looks > > like > > it would be pretty straightforward to test against a range rather > > than > > a single iova. > > Agreed, ccw looks pretty easy. Should I send something to go before > this series to make stable easier? (It's a trivial change in either > direction, so either is fine to me.) It looks like we're expecting an rc8 for this development cycle, so the merge window will be pushed out a week (which works better for some upcoming PTO on my end), but if it's trivial either way let's plan for the fix to follow Nicolin's and Jason's series and we can always post a backport to the stable list if there's any trouble. Thanks, Alex
Re: [Intel-gfx] [PATCH v4 1/2] vfio: Replace the DMA unmapping notifier with a callback
On Wed, 20 Jul 2022 17:08:29 -0300 Jason Gunthorpe wrote: > On Wed, Jul 20, 2022 at 01:41:13PM -0600, Alex Williamson wrote: > > > ie. we don't need the gfn, we only need the iova. > > Right, that makes sense > > > However then I start to wonder why we're passing in 1 for the number of > > pages because this previously notifier, now callback is called for the > > entire vfio_dma range when we find any pinned pages. > > Well, it is doing this because it only ever pins one page. Of course that page is not necessarily the page it unpins given the contract misunderstanding below. > The drivers are confused about what the contract is. vfio is calling > the notifier with the entire IOVA range that is being unmapped and the > drivers are expecting to receive notifications only for the IOVA they > have actually pinned. > > > Should ap and ccw implementations of .dma_unmap just be replaced with a > > BUG_ON(1)? > > The point of these callbacks is to halt concurrent DMA, and ccw does > that today. ccw essentially only checks whether the starting iova of the unmap is currently mapped. If not it does nothing, if it is it tries to reset the device and unpin everything. Chances are the first iova is not the one pinned, so we don't end up removing the pinned page and type1 will eventually BUG_ON after a few tries. > It looks like AP is missing a call to ap_aqic(), so it is > probably double wrong. Thankfully the type1 unpinning path can't be tricked into unpinning something that wasn't pinned, so chances are the unpin call does nothing, with a small risk that it unpins another driver's pinned page, which might not yet have been notified and could still be using the page. In the end, if ap did have a page pinned in the range, we'll hit the same BUG_ON as above. > What I'd suggest is adding a WARN_ON that the dma->pfn_list is not > empty and leave these functions alone. The BUG_ON still exists in type1. Eric, Matt, Tony, Halil, JasonH, any quick fixes here? ccw looks like it would be pretty straightforward to test against a range rather than a single iova. > Most likely AP should be fixed to call vfio_ap_irq_disable() and to > check the q->saved_pfn against the IOVA. Right, the q->saved_iova, perhaps calling vfio_ap_irq_disable() on finding a matching queue. > But I'm inclined to leave this as-is for this series given we are at > rc7. On the grounds that it's no worse, maybe, but given the changes around this code hopefully we can submit fixes patches to stable if the backport isn't obvious and the BUG_ON in type1 is reachable. Thanks, Alex
Re: [Intel-gfx] [PATCH v4 1/2] vfio: Replace the DMA unmapping notifier with a callback
On Tue, 19 Jul 2022 21:02:48 -0300 Jason Gunthorpe wrote: > diff --git a/drivers/s390/crypto/vfio_ap_ops.c > b/drivers/s390/crypto/vfio_ap_ops.c > index a7d2a95796d360..bb1a1677c5c230 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -1226,34 +1226,14 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev > *matrix_mdev, > return 0; > } > > -/** > - * vfio_ap_mdev_iommu_notifier - IOMMU notifier callback > - * > - * @nb: The notifier block > - * @action: Action to be taken > - * @data: data associated with the request > - * > - * For an UNMAP request, unpin the guest IOVA (the NIB guest address we > - * pinned before). Other requests are ignored. > - * > - * Return: for an UNMAP request, NOFITY_OK; otherwise NOTIFY_DONE. > - */ > -static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb, > -unsigned long action, void *data) > +static void vfio_ap_mdev_dma_unmap(struct vfio_device *vdev, u64 iova, > +u64 length) > { > - struct ap_matrix_mdev *matrix_mdev; > - > - matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier); > - > - if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) { > - struct vfio_iommu_type1_dma_unmap *unmap = data; > - unsigned long g_pfn = unmap->iova >> PAGE_SHIFT; > - > - vfio_unpin_pages(_mdev->vdev, _pfn, 1); > - return NOTIFY_OK; > - } > + struct ap_matrix_mdev *matrix_mdev = > + container_of(vdev, struct ap_matrix_mdev, vdev); > + unsigned long g_pfn = iova >> PAGE_SHIFT; > > - return NOTIFY_DONE; > + vfio_unpin_pages(_mdev->vdev, _pfn, 1); > } > > /** I tried to apply this on top of Nicolin's series which results in a conflict that can be resolved as below: diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e8856a7e151c..d7c38c82f694 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -1219,33 +1219,13 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev, return 0; } -/** - * vfio_ap_mdev_iommu_notifier - IOMMU notifier callback - * - * @nb: The notifier block - * @action: Action to be taken - * @data: data associated with the request - * - * For an UNMAP request, unpin the guest IOVA (the NIB guest address we - * pinned before). Other requests are ignored. - * - * Return: for an UNMAP request, NOFITY_OK; otherwise NOTIFY_DONE. - */ -static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb, - unsigned long action, void *data) +static void vfio_ap_mdev_dma_unmap(struct vfio_device *vdev, u64 iova, + u64 length) { - struct ap_matrix_mdev *matrix_mdev; - - matrix_mdev = container_of(nb, struct ap_matrix_mdev, iommu_notifier); - - if (action == VFIO_IOMMU_NOTIFY_DMA_UNMAP) { - struct vfio_iommu_type1_dma_unmap *unmap = data; - - vfio_unpin_pages(_mdev->vdev, unmap->iova, 1); - return NOTIFY_OK; - } + struct ap_matrix_mdev *matrix_mdev = + container_of(vdev, struct ap_matrix_mdev, vdev); - return NOTIFY_DONE; + vfio_unpin_pages(_mdev->vdev, iova, 1); } /** ie. we don't need the gfn, we only need the iova. However then I start to wonder why we're passing in 1 for the number of pages because this previously notifier, now callback is called for the entire vfio_dma range when we find any pinned pages. It makes no sense for a driver to assume that the first iova is pinned and is the only pinned page. ccw has the same issue: static void vfio_ccw_dma_unmap(struct vfio_device *vdev, u64 iova, u64 length) { struct vfio_ccw_private *private = container_of(vdev, struct vfio_ccw_private, vdev); /* Drivers MUST unpin pages in response to an invalidation. */ if (!cp_iova_pinned(>cp, iova)) return; vfio_ccw_mdev_reset(private); } Entirely ignoring the length arg. It seems only GVT-g has this correct to actually look through the extent of the range being unmapped: static void intel_vgpu_dma_unmap(struct vfio_device *vfio_dev, u64 iova, u64 length) { struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev); struct gvt_dma *entry; u64 iov_pfn = iova >> PAGE_SHIFT; u64 end_iov_pfn = iov_pfn + length / PAGE_SIZE; mutex_lock(>cache_lock); for (; iov_pfn < end_iov_pfn; iov_pfn++) { entry = __gvt_cache_find_gfn(vgpu, iov_pfn); if (!entry) continue; gvt_dma_unmap_page(vgpu, entry->gfn, entry->dma_addr, entry->size); __gvt_cache_remove_entry(vgpu, entry); } mutex_unlock(>cache_lock); } Should ap and