Re: [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-24 Thread Daniel Vetter
On Mon, Jun 20, 2022 at 08:18:04AM -0700, Rob Clark wrote:
> On Mon, Jun 20, 2022 at 7:09 AM Dmitry Osipenko
>  wrote:
> >
> > On 6/19/22 20:53, Rob Clark wrote:
> > ...
> > >> +static unsigned long
> > >> +drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker,
> > >> +                                     struct shrink_control *sc)
> > >> +{
> > >> +   struct drm_gem_shmem_shrinker *gem_shrinker = to_drm_shrinker(shrinker);
> > >> +   struct drm_gem_shmem_object *shmem;
> > >> +   unsigned long count = 0;
> > >> +
> > >> +   if (!mutex_trylock(&gem_shrinker->lock))
> > >> +           return 0;
> > >> +
> > >> +   list_for_each_entry(shmem, &gem_shrinker->lru_evictable, madv_list) {
> > >> +           count += shmem->base.size;
> > >> +
> > >> +           if (count >= SHRINK_EMPTY)
> > >> +                   break;
> > >> +   }
> > >> +
> > >> +   mutex_unlock(&gem_shrinker->lock);
> > >
> > > As I mentioned on other thread, count_objects, being approximate but
> > > lockless and fast is the important thing.  Otherwise when you start
> > > hitting the shrinker on many threads, you end up serializing them all,
> > > even if you have no pages to return to the system at that point.
> >
> > Daniel's point for dropping the lockless variant was that we're already
> > in trouble if we're hitting shrinker too often and extra optimizations
> > won't bring much benefits to us.
> 
> At least with zram swap (which I highly recommend using even if you
> are not using a physical swap file/partition), swapin/out is actually
> quite fast.  And if you are leaning on zram swap to fit 8GB of chrome
> browser on a 4GB device, the shrinker gets hit quite a lot.  Lower
> spec (4GB RAM) chromebooks can be under constant memory pressure and
> can quite easily get into a situation where you are hitting the
> shrinker on many threads simultaneously.  So it is pretty important
> for all shrinkers in the system (not just drm driver) to be as
> concurrent as possible.  As long as you avoid serializing reclaim on
> all the threads, performance can still be quite good, but if you don't
> performance will fall off a cliff.
> 
> jfwiw, we are seeing pretty good results (iirc 40-70% increase in open
> tab counts) with the combination of eviction + multigen LRU[1] +
> sizing zram swap to be 2x physical RAM
> 
> [1] https://lwn.net/Articles/856931/
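The approximate-but-lockless count path Rob describes can be sketched in userspace C: instead of walking the LRU under a mutex, keep a running byte total updated wherever objects enter or leave the evictable list, so the count side reduces to a single relaxed atomic read. The names and helper split here are illustrative, not the actual DRM shmem helpers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Running total of evictable bytes; updated wherever objects enter or
 * leave the evictable LRU, read locklessly by count_objects(). */
static atomic_ulong evictable_bytes;

static void lru_add(size_t obj_size)
{
	atomic_fetch_add_explicit(&evictable_bytes, obj_size,
				  memory_order_relaxed);
}

static void lru_remove(size_t obj_size)
{
	atomic_fetch_sub_explicit(&evictable_bytes, obj_size,
				  memory_order_relaxed);
}

/* Shrinker count path: no lock, no list walk. Staleness is fine
 * because the core only needs an estimate to size the scan. */
static unsigned long count_objects(void)
{
	return atomic_load_explicit(&evictable_bytes,
				    memory_order_relaxed);
}
```

A stale read is harmless here since do_shrink_slab() only uses the count to decide how much to ask scan_objects() for.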
> 
> > Alright, I'll add back the lockless variant (or will use yours
> > drm_gem_lru) in the next revision. The code difference is very small
> > after all.
> >
> > ...
> > >> +   /* prevent racing with the dma-buf importing/exporting */
> > >> +   if (!mutex_trylock(&gem_shrinker->dev->object_name_lock)) {
> > >> +           *lock_contention |= true;
> > >> +           goto resv_unlock;
> > >> +   }
> > >
> > > I'm not sure this is a good idea to serialize on object_name_lock.
> > > Purgeable buffers should never be shared (imported or exported).  So
> > > at best you are avoiding evicting and immediately swapping back in, in
> > > a rare case, at the cost of serializing multiple threads trying to
> > > reclaim pages in parallel.
> >
> > The object_name_lock shouldn't cause contention in practice. But objects
> > are also pinned on attachment, hence maybe this lock is indeed
> > unnecessary.. I'll re-check it.
> 
> I'm not worried about contention with export/import/etc, but
> contention between multiple threads hitting the shrinker in parallel.
> I guess since you are using trylock, it won't *block* the other
> threads hitting shrinker, but they'll just end up looping in
> do_shrink_slab() because they are hitting contention.
> 
> I'd have to do some experiments to see how it works out in practice,
> but my gut feel is that it isn't a good idea

Yeah trylock on anything else than the object lock is No Good in the
shrinker. And it really shouldn't be needed, since import/export should
pin stuff as needed. Which should be protected by the dma_resv object
lock. If not, we need to fix that.
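Daniel's rule — the only trylock in the scan path should be the per-object lock — can be modeled in userspace as follows. An `atomic_flag` stands in for `dma_resv_trylock()`; all names are illustrative, not the actual DRM API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a GEM object with its per-object "resv" lock. */
struct gem_obj {
	atomic_flag resv_locked;	/* set = someone holds the lock */
	bool purgeable;
	bool evicted;
};

static bool obj_trylock(struct gem_obj *obj)
{
	/* test_and_set returns the previous value: false = we got it */
	return !atomic_flag_test_and_set(&obj->resv_locked);
}

static void obj_unlock(struct gem_obj *obj)
{
	atomic_flag_clear(&obj->resv_locked);
}

/* Scan path: the only trylock is per-object; contended objects are
 * skipped, so concurrent shrinker threads never serialize on a
 * device-wide lock. Returns the number of objects evicted. */
static size_t scan_objects(struct gem_obj *objs, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (!obj_trylock(&objs[i]))
			continue;	/* contended: move on */
		if (objs[i].purgeable && !objs[i].evicted) {
			objs[i].evicted = true;
			freed++;
		}
		obj_unlock(&objs[i]);
	}
	return freed;
}
```

The design point is that a contended object costs one failed atomic operation rather than blocking the whole scan.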

Picking a random drm-internal lock like this is definitely no good design.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v6 17/22] drm/shmem-helper: Add generic memory shrinker

2022-06-24 Thread Daniel Vetter
On Sun, Jun 19, 2022 at 10:53:03AM -0700, Rob Clark wrote:
> On Thu, May 26, 2022 at 4:55 PM Dmitry Osipenko
>  wrote:
> > +   mutex_unlock(&gem_shrinker->lock);
> 
> As I mentioned on other thread, count_objects, being approximate but
> lockless and fast is the important thing.  Otherwise when you start
> hitting the shrinker on many threads, you end up serializing them all,
> even if you have no pages to return to the system at that point.

Yeah agreed, seems like I was wrong here :-) An atomic counter or something
would also be in line with the lru_list stuff.

It would be good to record this in the kerneldoc for the shrinker structure
though, to make sure this is all understood.

> > +   /* prevent racing with the dma-buf importing/exporting */
> > +   if (!mutex_trylock(&gem_shrinker->dev->object_name_lock)) {
> > +           *lock_contention |= true;
> > +           goto resv_unlock;
> > +   }
> 
> I'm not sure this is a good idea to serialize on object_name_lock.
> Purgeable buffers should never be shared (imported or exported).  So
> at best you are avoiding evicting and immediately swapping back in, in
> a rare case, at the cost of serializing multiple threads trying to
> reclaim pages in parallel.

Yeah this sounds really bad. Plus this is a per-device lock, and doing
those with trylock means the shrinker will fail to find shrinkable memory
way too often. We need to engineer this out somehow.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-24 Thread Robin Murphy

On 2022-06-24 14:16, Jason Gunthorpe wrote:

On Wed, Jun 22, 2022 at 08:54:45AM +0100, Robin Murphy wrote:

On 2022-06-16 23:23, Nicolin Chen wrote:

On Thu, Jun 16, 2022 at 06:40:14AM +, Tian, Kevin wrote:


The domain->ops validation was added, as a precaution, for mixed-driver
systems. However, at this moment only one iommu driver is possible. So
remove it.


It's true on a physical platform. But I'm not sure whether a virtual platform
is allowed to include multiple e.g. one virtio-iommu alongside a virtual VT-d
or a virtual smmu. It might be clearer to claim that (as Robin pointed out)
there are plenty of more significant problems than this to solve, instead of simply
saying that only one iommu driver is possible if we don't have explicit code
to reject such configuration. 


Will edit this part. Thanks!


Oh, physical platforms with mixed IOMMUs definitely exist already. The main
point is that while bus_set_iommu still exists, the core code effectively
*does* prevent multiple drivers from registering - even in emulated cases
like the example above, virtio-iommu and VT-d would both try to
bus_set_iommu(&pci_bus_type), and one of them will lose. The aspect which
might warrant clarification is that there's no combination of supported
drivers which claim non-overlapping buses *and* could appear in the same
system - even if you tried to contrive something by emulating, say, VT-d
(PCI) alongside rockchip-iommu (platform), you could still only describe one
or the other due to ACPI vs. Devicetree.


Right, and that is still something we need to protect against with
this ops check. VFIO is not checking that the buses are the same
before attempting to re-use a domain.

So it is actually functional and does protect against systems with
multiple iommu drivers on different busses.


But as above, which systems *are* those? Everything that's on my radar 
would have drivers all competing for the platform bus - Intel and s390 
are somewhat the odd ones out in that respect, but are also non-issues 
as above. FWIW my iommu/bus dev branch has got as far as the final bus 
ops removal and allowing multiple driver registrations, and before it 
allows that, it does now have the common attach check that I sketched 
out in the previous discussion of this.


It's probably also noteworthy that domain->ops is no longer the same 
domain->ops that this code was written to check, and may now be 
different between domains from the same driver.
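The attach-time compatibility check discussed above can be reduced to comparing which driver's ops back the domain versus the device. This is only a conceptual userspace sketch of the idea, not the actual core-code helper, and all struct layouts are invented for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct iommu_ops { const char *driver; };

struct iommu_domain { const struct iommu_ops *owner; };
struct device      { const struct iommu_ops *iommu_ops; };

/* Conceptual attach-time check: a domain allocated by one IOMMU
 * driver must not be attached to a device managed by another.
 * Comparing ops pointers identifies the driver regardless of which
 * bus the device sits on. */
static int attach_check(struct iommu_domain *dom, struct device *dev)
{
	if (!dev->iommu_ops)
		return -ENODEV;		/* device has no IOMMU at all */
	if (dom->owner != dev->iommu_ops)
		return -EINVAL;		/* cross-driver attach: reject */
	return 0;
}
```

With such a check in the core, callers like VFIO no longer need to compare bus types before reusing a domain.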


Thanks,
Robin.

Re: virtio_balloon regression in 5.19-rc3

2022-06-24 Thread Michael S. Tsirkin
On Tue, Jun 21, 2022 at 06:10:00PM +0200, Ben Hutchings wrote:
> On Tue, 2022-06-21 at 17:34 +0800, Jason Wang wrote:
> > On Tue, Jun 21, 2022 at 5:24 PM David Hildenbrand  wrote:
> > > 
> > > On 20.06.22 20:49, Ben Hutchings wrote:
> > > > I've tested a 5.19-rc3 kernel on top of QEMU/KVM with machine type
> > > > pc-q35-5.2.  It has a virtio balloon device defined in libvirt as:
> > > > 
> > > > 
> > > >> > > function="0x0"/>
> > > > 
> > > > 
> > > > but the virtio_balloon driver fails to bind to it:
> > > > 
> > > > virtio_balloon virtio4: init_vqs: add stat_vq failed
> > > > virtio_balloon: probe of virtio4 failed with error -5
> > > > 
> > > 
> > > Hmm, I don't see any recent changes to drivers/virtio/virtio_balloon.c
> > > 
> > > virtqueue_add_outbuf() fails with -EIO if I'm not wrong. That's the
> > > first call of virtqueue_add_outbuf() when virtio_balloon initializes.
> > > 
> > > 
> > > Maybe something in generic virtio code changed?
> > 
> > Yes, we introduced the IRQ hardening. That could be the root cause, and
> > we've received lots of reports, so we decided to disable it by default.
> > 
> > Ben, could you please try this patch: (and make sure
> > CONFIG_VIRTIO_HARDEN_NOTIFICATION is not set)
> > 
> > https://lore.kernel.org/lkml/20220620024158.2505-1-jasow...@redhat.com/T/
> 
> Yes, that patch fixes the regression for me.
> 
> Ben.


Jason, are you going to fix the balloon driver to call device_ready before
registering the device with Linux?
> -- 
> Ben Hutchings
> Any smoothly functioning technology is indistinguishable
> from a rigged demo.




Re: [PATCH V3] virtio: disable notification hardening by default

2022-06-24 Thread Michael S. Tsirkin
On Fri, Jun 24, 2022 at 04:41:55PM +0800, Xuan Zhuo wrote:
> On Wed, 22 Jun 2022 09:29:40 +0800, Jason Wang  wrote:
> > We try to harden virtio device notifications in 8b4ec69d7e09 ("virtio:
> > harden vring IRQ"). It works with the assumption that the driver or
> > core can properly call virtio_device_ready() at the right
> > place. Unfortunately, this seems to be not true and uncover various
> > bugs of the existing drivers, mainly the issue of using
> > virtio_device_ready() incorrectly.
> >
> > So let's have a Kconfig option and disable it by default. It gives
> > us a breather to fix the drivers and then we can consider enabling it
> > by default.
> 
> 
> hi, I have a question.
> 
> What should we do when the queue is reset and 
> CONFIG_VIRTIO_HARDEN_NOTIFICATION
> is off?
> 
> Since disable_irq() cannot be used, we should trust the device not to send 
> irqs
> again. So we don't have to do anything?

yes

> Thanks.
> 
> >
> > Signed-off-by: Jason Wang 
> > ---
> > Changes since V2:
> > - Tweak the Kconfig help
> > - Add comment for the read_lock() pairing in virtio_ccw
> > ---
> >  drivers/s390/virtio/virtio_ccw.c |  9 -
> >  drivers/virtio/Kconfig   | 13 +
> >  drivers/virtio/virtio.c  |  2 ++
> >  drivers/virtio/virtio_ring.c | 12 
> >  include/linux/virtio_config.h|  2 ++
> >  5 files changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > b/drivers/s390/virtio/virtio_ccw.c
> > index 97e51c34e6cf..1f6a358f65f0 100644
> > --- a/drivers/s390/virtio/virtio_ccw.c
> > +++ b/drivers/s390/virtio/virtio_ccw.c
> > @@ -1136,8 +1136,13 @@ static void virtio_ccw_int_handler(struct ccw_device 
> > *cdev,
> > vcdev->err = -EIO;
> > }
> > virtio_ccw_check_activity(vcdev, activity);
> > -   /* Interrupts are disabled here */
> > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > +   /*
> > +    * Paired with virtio_ccw_synchronize_cbs() and interrupts are
> > +    * disabled here.
> > +    */
> >     read_lock(&vcdev->irq_lock);
> > +#endif
> > for_each_set_bit(i, indicators(vcdev),
> >  sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
> > /* The bit clear must happen before the vring kick. */
> > @@ -1146,7 +1151,9 @@ static void virtio_ccw_int_handler(struct ccw_device 
> > *cdev,
> > vq = virtio_ccw_vq_by_ind(vcdev, i);
> > vring_interrupt(0, vq);
> > }
> > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> >     read_unlock(&vcdev->irq_lock);
> > +#endif
> >     if (test_bit(0, indicators2(vcdev))) {
> >             virtio_config_changed(&vcdev->vdev);
> > clear_bit(0, indicators2(vcdev));
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index b5adf6abd241..c04f370a1e5c 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -35,6 +35,19 @@ menuconfig VIRTIO_MENU
> >
> >  if VIRTIO_MENU
> >
> > +config VIRTIO_HARDEN_NOTIFICATION
> > +bool "Harden virtio notification"
> > +help
> > +  Enable this to harden the device notifications and suppress
> > +  those that happen at a time where notifications are illegal.
> > +
> > +  Experimental: Note that several drivers still have bugs that
> > +  may cause crashes or hangs when correct handling of
> > +  notifications is enforced; depending on the subset of
> > +  drivers and devices you use, this may or may not work.
> > +
> > +  If unsure, say N.
> > +
> >  config VIRTIO_PCI
> > tristate "PCI driver for virtio devices"
> > depends on PCI
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index ef04a96942bf..21dc08d2f32d 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -220,6 +220,7 @@ static int virtio_features_ok(struct virtio_device *dev)
> >   * */
> >  void virtio_reset_device(struct virtio_device *dev)
> >  {
> > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > /*
> >  * The below virtio_synchronize_cbs() guarantees that any
> >  * interrupt for this line arriving after
> > @@ -228,6 +229,7 @@ void virtio_reset_device(struct virtio_device *dev)
> >  */
> > virtio_break_device(dev);
> > virtio_synchronize_cbs(dev);
> > +#endif
> >
> > dev->config->reset(dev);
> >  }
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 13a7348cedff..d9d3b6e201fb 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1688,7 +1688,11 @@ static struct virtqueue 
> > *vring_create_virtqueue_packed(
> > vq->we_own_ring = true;
> > vq->notify = notify;
> > vq->weak_barriers = weak_barriers;
> > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > vq->broken = true;
> > +#else
> > +   vq->broken = false;
> > +#endif
> > vq->last_used_idx = 0;
> > vq->event_triggered = false;
> > vq->num_added = 0;
> > @@ -2135,9 +2139,13 @@ 

Re: [PATCH V3] virtio: disable notification hardening by default

2022-06-24 Thread Xuan Zhuo
On Wed, 22 Jun 2022 09:29:40 +0800, Jason Wang  wrote:
> We try to harden virtio device notifications in 8b4ec69d7e09 ("virtio:
> harden vring IRQ"). It works with the assumption that the driver or
> core can properly call virtio_device_ready() at the right
> place. Unfortunately, this seems to be not true and uncovers various
> bugs in the existing drivers, mainly the issue of using
> virtio_device_ready() incorrectly.
>
> So let's have a Kconfig option and disable it by default. It gives
> us a breather to fix the drivers and then we can consider enabling it
> by default.


hi, I have a question.

What should we do when the queue is reset and CONFIG_VIRTIO_HARDEN_NOTIFICATION
is off?

Since disable_irq() cannot be used, we should trust the device not to send irqs
again. So we don't have to do anything?

Thanks.
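For context, the hardening being discussed boils down to a per-virtqueue `broken` flag: with hardening on, the ring starts broken, the interrupt path drops events while it is set, and virtio_device_ready() clears it; with the option off, the ring starts un-broken and the driver simply trusts the device not to notify at illegal times. A userspace model with simplified names (not the real virtio_ring internals):

```c
#include <assert.h>
#include <stdbool.h>

#define IRQ_NONE     0
#define IRQ_HANDLED  1

struct vq {
	bool broken;		/* set until the driver is ready */
	unsigned long events;	/* callbacks actually delivered */
};

/* With hardening on, the ring starts broken so that a notification
 * arriving before DRIVER_OK is suppressed instead of running driver
 * callbacks on a half-initialized device. */
static void vq_init(struct vq *vq, bool harden)
{
	vq->broken = harden;
	vq->events = 0;
}

static int vring_interrupt(struct vq *vq)
{
	if (vq->broken)
		return IRQ_NONE;	/* illegal notification: drop it */
	vq->events++;
	return IRQ_HANDLED;
}

/* virtio_device_ready() counterpart: notifications become legal. */
static void device_ready(struct vq *vq)
{
	vq->broken = false;
}
```

This also shows why drivers that call device_ready too late (or not at all) hang under hardening: their early notifications are silently discarded.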

>
> Signed-off-by: Jason Wang 
> ---
> Changes since V2:
> - Tweak the Kconfig help
> - Add comment for the read_lock() pairing in virtio_ccw
> ---
>  drivers/s390/virtio/virtio_ccw.c |  9 -
>  drivers/virtio/Kconfig   | 13 +
>  drivers/virtio/virtio.c  |  2 ++
>  drivers/virtio/virtio_ring.c | 12 
>  include/linux/virtio_config.h|  2 ++
>  5 files changed, 37 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/s390/virtio/virtio_ccw.c 
> b/drivers/s390/virtio/virtio_ccw.c
> index 97e51c34e6cf..1f6a358f65f0 100644
> --- a/drivers/s390/virtio/virtio_ccw.c
> +++ b/drivers/s390/virtio/virtio_ccw.c
> @@ -1136,8 +1136,13 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vcdev->err = -EIO;
>   }
>   virtio_ccw_check_activity(vcdev, activity);
> -     /* Interrupts are disabled here */
> +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> +     /*
> +      * Paired with virtio_ccw_synchronize_cbs() and interrupts are
> +      * disabled here.
> +      */
>       read_lock(&vcdev->irq_lock);
> +#endif
>   for_each_set_bit(i, indicators(vcdev),
>sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
>   /* The bit clear must happen before the vring kick. */
> @@ -1146,7 +1151,9 @@ static void virtio_ccw_int_handler(struct ccw_device 
> *cdev,
>   vq = virtio_ccw_vq_by_ind(vcdev, i);
>   vring_interrupt(0, vq);
>   }
> +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
>       read_unlock(&vcdev->irq_lock);
> +#endif
>       if (test_bit(0, indicators2(vcdev))) {
>               virtio_config_changed(&vcdev->vdev);
>   clear_bit(0, indicators2(vcdev));
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index b5adf6abd241..c04f370a1e5c 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -35,6 +35,19 @@ menuconfig VIRTIO_MENU
>
>  if VIRTIO_MENU
>
> +config VIRTIO_HARDEN_NOTIFICATION
> +bool "Harden virtio notification"
> +help
> +  Enable this to harden the device notifications and suppress
> +  those that happen at a time where notifications are illegal.
> +
> +  Experimental: Note that several drivers still have bugs that
> +  may cause crashes or hangs when correct handling of
> +  notifications is enforced; depending on the subset of
> +  drivers and devices you use, this may or may not work.
> +
> +  If unsure, say N.
> +
>  config VIRTIO_PCI
>   tristate "PCI driver for virtio devices"
>   depends on PCI
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index ef04a96942bf..21dc08d2f32d 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -220,6 +220,7 @@ static int virtio_features_ok(struct virtio_device *dev)
>   * */
>  void virtio_reset_device(struct virtio_device *dev)
>  {
> +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
>   /*
>* The below virtio_synchronize_cbs() guarantees that any
>* interrupt for this line arriving after
> @@ -228,6 +229,7 @@ void virtio_reset_device(struct virtio_device *dev)
>*/
>   virtio_break_device(dev);
>   virtio_synchronize_cbs(dev);
> +#endif
>
>   dev->config->reset(dev);
>  }
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..d9d3b6e201fb 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1688,7 +1688,11 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   vq->we_own_ring = true;
>   vq->notify = notify;
>   vq->weak_barriers = weak_barriers;
> +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
>   vq->broken = true;
> +#else
> + vq->broken = false;
> +#endif
>   vq->last_used_idx = 0;
>   vq->event_triggered = false;
>   vq->num_added = 0;
> @@ -2135,9 +2139,13 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   }
>
>   if (unlikely(vq->broken)) {
> +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> >             dev_warn_once(&vq->vq.vdev->dev,
> "virtio vring IRQ raised before DRIVER_OK");
> 

[PATCH] vringh: iterate on iotlb_translate to handle large translations

2022-06-24 Thread Stefano Garzarella
iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough
to contain all the ranges for translation.
This can happen for example if the VMM maps a large bounce buffer,
without using hugepages, that requires more than 16 ranges to translate
the addresses.

To handle this case, let's extend iotlb_translate() to also return the
number of bytes successfully translated.
In copy_from_iotlb()/copy_to_iotlb(), loop by calling iotlb_translate()
several times until the translation is complete.
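The loop structure this patch introduces can be modeled in userspace: treat -ENOBUFS from the translator as "consume what fit and retry for the remainder". Identity translation and fixed 4 KiB ranges are assumptions of this sketch, standing in for the real IOTLB lookup:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NRANGES  16
#define RANGE_SZ 4096	/* pretend every IOTLB map covers 4 KiB */

/* Fills up to 'nranges' fixed-size ranges; returns -ENOBUFS when the
 * request needs more ranges than fit, reporting via *translated how
 * many bytes the filled ranges cover. */
static int translate(size_t len, size_t nranges, size_t *translated)
{
	size_t capacity = nranges * RANGE_SZ;

	if (len <= capacity) {
		*translated = len;
		return 0;
	}
	*translated = capacity;
	return -ENOBUFS;
}

/* copy_from_iotlb()-style driver loop: -ENOBUFS just means "copy the
 * bytes that were translated and come back for the rest". */
static long copy_chunked(size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t translated;
		int ret = translate(len - done, NRANGES, &translated);

		if (ret < 0 && ret != -ENOBUFS)
			return ret;	/* a real translation failure */
		done += translated;	/* "copy" this chunk */
	}
	return done;
}
```

The key point is that the fixed 16-entry bio_vec array stops being a hard limit on request size and becomes merely the per-iteration batch size.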

Signed-off-by: Stefano Garzarella 
---
 drivers/vhost/vringh.c | 78 ++
 1 file changed, 56 insertions(+), 22 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index eab55accf381..11f59dd06a74 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1095,7 +1095,8 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
 #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
 
 static int iotlb_translate(const struct vringh *vrh,
-  u64 addr, u64 len, struct bio_vec iov[],
+  u64 addr, u64 len, u64 *translated,
+  struct bio_vec iov[],
   int iov_size, u32 perm)
 {
struct vhost_iotlb_map *map;
@@ -1136,43 +1137,76 @@ static int iotlb_translate(const struct vringh *vrh,
 
spin_unlock(vrh->iotlb_lock);
 
+   if (translated)
+           *translated = min(len, s);
+
return ret;
 }
 
 static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
  void *src, size_t len)
 {
-   struct iov_iter iter;
-   struct bio_vec iov[16];
-   int ret;
+   u64 total_translated = 0;
 
-   ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
- len, iov, 16, VHOST_MAP_RO);
-   if (ret < 0)
-   return ret;
+   while (total_translated < len) {
+   struct bio_vec iov[16];
+   struct iov_iter iter;
+   u64 translated;
+   int ret;
 
-   iov_iter_bvec(&iter, READ, iov, ret, len);
+   ret = iotlb_translate(vrh, (u64)(uintptr_t)src,
+                         len - total_translated, &translated,
+                         iov, ARRAY_SIZE(iov), VHOST_MAP_RO);
+   if (ret == -ENOBUFS)
+           ret = ARRAY_SIZE(iov);
+   else if (ret < 0)
+           return ret;

-   ret = copy_from_iter(dst, len, &iter);
+   iov_iter_bvec(&iter, READ, iov, ret, translated);

-   return ret;
+   ret = copy_from_iter(dst, translated, &iter);
+   if (ret < 0)
+   return ret;
+
+   src += translated;
+   dst += translated;
+   total_translated += translated;
+   }
+
+   return total_translated;
 }
 
 static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
void *src, size_t len)
 {
-   struct iov_iter iter;
-   struct bio_vec iov[16];
-   int ret;
+   u64 total_translated = 0;
 
-   ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
- len, iov, 16, VHOST_MAP_WO);
-   if (ret < 0)
-   return ret;
+   while (total_translated < len) {
+   struct bio_vec iov[16];
+   struct iov_iter iter;
+   u64 translated;
+   int ret;
+
+   ret = iotlb_translate(vrh, (u64)(uintptr_t)dst,
+                         len - total_translated, &translated,
+                         iov, ARRAY_SIZE(iov), VHOST_MAP_WO);
+   if (ret == -ENOBUFS)
+   ret = ARRAY_SIZE(iov);
+   else if (ret < 0)
+   return ret;
 
-   iov_iter_bvec(&iter, WRITE, iov, ret, len);
+   iov_iter_bvec(&iter, WRITE, iov, ret, translated);
+
+   ret = copy_to_iter(src, translated, &iter);
+   if (ret < 0)
+   return ret;
+
+   src += translated;
+   dst += translated;
+   total_translated += translated;
+   }
 
-   return copy_to_iter(src, len, &iter);
+   return total_translated;
 }
 
 static inline int getu16_iotlb(const struct vringh *vrh,
@@ -1183,7 +1217,7 @@ static inline int getu16_iotlb(const struct vringh *vrh,
int ret;
 
/* Atomic read is needed for getu16 */
-   ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+   ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
  &iov, 1, VHOST_MAP_RO);
if (ret < 0)
return ret;
@@ -1204,7 +1238,7 @@ static inline int putu16_iotlb(const struct vringh *vrh,
int ret;
 
/* Atomic write is needed for putu16 */
-   ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p),
+   ret = iotlb_translate(vrh, (u64)(uintptr_t)p, sizeof(*p), NULL,
 

Re: [PATCH v10 00/41] virtio pci support VIRTIO_F_RING_RESET

2022-06-24 Thread Xuan Zhuo
On Fri, 24 Jun 2022 03:00:12 -0400, "Michael S. Tsirkin"  
wrote:
> On Fri, Jun 24, 2022 at 10:55:40AM +0800, Xuan Zhuo wrote:
> > The virtio spec already supports the virtio queue reset function. This 
> > patch set
> > is to add this function to the kernel. The relevant virtio spec information 
> > is
> > here:
> >
> > https://github.com/oasis-tcs/virtio-spec/issues/124
> > https://github.com/oasis-tcs/virtio-spec/issues/139
> >
> > Also regarding MMIO support for queue reset, I plan to support it after this
> > patch is passed.
> >
> > This patch set implements the refactoring of vring. Finally, the
> > virtuque_resize() interface is provided based on the reset function of the
> > transport layer.
> >
> > Test environment:
> > Host: 4.19.91
> > Qemu: QEMU emulator version 6.2.50 (with vq reset support)
> > Test Cmd:  ethtool -G eth1 rx $1 tx $2; ethtool -g eth1
> >
> > The default is split mode, modify Qemu virtio-net to add PACKED feature 
> > to test
> > packed mode.
> >
> > Qemu code:
> > 
> > https://github.com/fengidri/qemu/compare/89f3bfa3265554d1d591ee4d7f1197b6e3397e84...master
>
>
> Pls rebase on top of my latest tree, there are some conflicts.

OK, I'll pull your latest version before committing the next version.

Thanks.

>
> > In order to simplify the review of this patch set, the function of reusing
> > the old buffers after resize will be introduced in subsequent patch sets.
> >
> > Please review. Thanks.
> >
> > v10:
> >   1. on top of the harden vring IRQ
> >   2. factor out split and packed from struct vring_virtqueue
> >   3. some suggest from @Jason Wang
> >
> > v9:
> >   1. Provide a virtqueue_resize() interface directly
> >   2. A patch set including vring resize, virtio pci reset, virtio-net resize
> >   3. No more separate structs
> >
> > v8:
> >   1. Provide a virtqueue_reset() interface directly
> >   2. Split the two patch sets, this is the first part
> >   3. Add independent allocation helper for allocating state, extra
> >
> > v7:
> >   1. fix #6 subject typo
> >   2. fix #6 ring_size_in_bytes is uninitialized
> >   3. check by: make W=12
> >
> > v6:
> >   1. virtio_pci: use synchronize_irq(irq) to sync the irq callbacks
> >   2. Introduce virtqueue_reset_vring() to implement the reset of vring 
> > during
> >  the reset process. May use the old vring if num of the vq not change.
> >   3. find_vqs() support sizes to special the max size of each vq
> >
> > v5:
> >   1. add virtio-net support set_ringparam
> >
> > v4:
> >   1. just the code of virtio, without virtio-net
> >   2. Performing reset on a queue is divided into these steps:
> > 1. reset_vq: reset one vq
> > 2. recycle the buffer from vq by virtqueue_detach_unused_buf()
> > 3. release the ring of the vq by vring_release_virtqueue()
> > 4. enable_reset_vq: re-enable the reset queue
> >   3. Simplify the parameters of enable_reset_vq()
> >   4. add container structures for virtio_pci_common_cfg
> >
> > v3:
> >   1. keep vq, irq unreleased
> >
> > *** BLURB HERE ***
> >
> > Xuan Zhuo (41):
> >   remoteproc: rename len of rpoc_vring to num
> >   virtio: add helper virtqueue_get_vring_max_size()
> >   virtio: struct virtio_config_ops add callbacks for queue_reset
> >   virtio_ring: update the document of the virtqueue_detach_unused_buf
> > for queue reset
> >   virtio_ring: remove the arg vq of vring_alloc_desc_extra()
> >   virtio_ring: extract the logic of freeing vring
> >   virtio_ring: split vring_virtqueue
> >   virtio_ring: introduce virtqueue_init()
> >   virtio_ring: split: introduce vring_free_split()
> >   virtio_ring: split: extract the logic of alloc queue
> >   virtio_ring: split: extract the logic of alloc state and extra
> >   virtio_ring: split: extract the logic of attach vring
> >   virtio_ring: split: extract the logic of vring init
> >   virtio_ring: split: introduce virtqueue_reinit_split()
> >   virtio_ring: split: reserve vring_align, may_reduce_num
> >   virtio_ring: split: introduce virtqueue_resize_split()
> >   virtio_ring: packed: introduce vring_free_packed
> >   virtio_ring: packed: extract the logic of alloc queue
> >   virtio_ring: packed: extract the logic of alloc state and extra
> >   virtio_ring: packed: extract the logic of attach vring
> >   virtio_ring: packed: extract the logic of vring init
> >   virtio_ring: packed: introduce virtqueue_reinit_packed()
> >   virtio_ring: packed: introduce virtqueue_resize_packed()
> >   virtio_ring: introduce virtqueue_resize()
> >   virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
> >   virtio: queue_reset: add VIRTIO_F_RING_RESET
> >   virtio: allow to unbreak/break virtqueue individually
> >   virtio_pci: update struct virtio_pci_common_cfg
> >   virtio_pci: introduce helper to get/set queue reset
> >   virtio_pci: extract the logic of active vq for modern pci
> >   virtio_pci: support VIRTIO_F_RING_RESET
> >   virtio: find_vqs() add arg sizes
> >   virtio_pci: support the 

Re: [PATCH v10 25/41] virtio_pci: struct virtio_pci_common_cfg add queue_notify_data

2022-06-24 Thread Xuan Zhuo
On Fri, 24 Jun 2022 02:59:39 -0400, "Michael S. Tsirkin"  
wrote:
> On Fri, Jun 24, 2022 at 10:56:05AM +0800, Xuan Zhuo wrote:
> > Add queue_notify_data in struct virtio_pci_common_cfg, which comes from
> > here https://github.com/oasis-tcs/virtio-spec/issues/89
> >
> > To avoid breaking uABI, add a new struct virtio_pci_common_cfg_notify.
>
> What exactly is meant by not breaking uABI?
> Users are supposed to be prepared for struct size to change ... no?

This was a previous discussion with Jason Wang, who was concerned about
affecting some existing programs.

https://lore.kernel.org/all/CACGkMEshTp8vSP9=pkj82y8+ddqfu9tfak1eghmzlvxue-o...@mail.gmail.com/

Thanks.

>
>
> > Since I want to add queue_reset after queue_notify_data, I submitted
> > this patch first.
> >
> > Signed-off-by: Xuan Zhuo 
> > Acked-by: Jason Wang 
> > ---
> >  include/uapi/linux/virtio_pci.h | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/include/uapi/linux/virtio_pci.h 
> > b/include/uapi/linux/virtio_pci.h
> > index 3a86f36d7e3d..22bec9bd0dfc 100644
> > --- a/include/uapi/linux/virtio_pci.h
> > +++ b/include/uapi/linux/virtio_pci.h
> > @@ -166,6 +166,13 @@ struct virtio_pci_common_cfg {
> > __le32 queue_used_hi;   /* read-write */
> >  };
> >
> > +struct virtio_pci_common_cfg_notify {
> > +   struct virtio_pci_common_cfg cfg;
> > +
> > +   __le16 queue_notify_data;   /* read-write */
> > +   __le16 padding;
> > +};
> > +
> >  /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
> >  struct virtio_pci_cfg_cap {
> > struct virtio_pci_cap cap;
> > --
> > 2.31.0
>


Re: [PATCH v10 00/41] virtio pci support VIRTIO_F_RING_RESET

2022-06-24 Thread Michael S. Tsirkin
On Fri, Jun 24, 2022 at 10:55:40AM +0800, Xuan Zhuo wrote:
> The virtio spec already supports the virtio queue reset function. This patch 
> set
> is to add this function to the kernel. The relevant virtio spec information is
> here:
> 
> https://github.com/oasis-tcs/virtio-spec/issues/124
> https://github.com/oasis-tcs/virtio-spec/issues/139
> 
> Also regarding MMIO support for queue reset, I plan to support it after this
> patch is passed.
> 
> This patch set implements the refactoring of vring. Finally, the
> virtuque_resize() interface is provided based on the reset function of the
> transport layer.
> 
> Test environment:
> Host: 4.19.91
> Qemu: QEMU emulator version 6.2.50 (with vq reset support)
> Test Cmd:  ethtool -G eth1 rx $1 tx $2; ethtool -g eth1
> 
> The default is split mode, modify Qemu virtio-net to add PACKED feature 
> to test
> packed mode.
> 
> Qemu code:
> 
> https://github.com/fengidri/qemu/compare/89f3bfa3265554d1d591ee4d7f1197b6e3397e84...master


Pls rebase on top of my latest tree, there are some conflicts.

> In order to simplify the review of this patch set, the function of reusing
> the old buffers after resize will be introduced in subsequent patch sets.
> 
> Please review. Thanks.
> 
> v10:
>   1. on top of the harden vring IRQ
>   2. factor out split and packed from struct vring_virtqueue
>   3. some suggestions from @Jason Wang
> 
> v9:
>   1. Provide a virtqueue_resize() interface directly
>   2. A patch set including vring resize, virtio pci reset, virtio-net resize
>   3. No more separate structs
> 
> v8:
>   1. Provide a virtqueue_reset() interface directly
>   2. Split the two patch sets, this is the first part
>   3. Add independent allocation helper for allocating state, extra
> 
> v7:
>   1. fix #6 subject typo
>   2. fix #6 ring_size_in_bytes is uninitialized
>   3. check by: make W=12
> 
> v6:
>   1. virtio_pci: use synchronize_irq(irq) to sync the irq callbacks
>   2. Introduce virtqueue_reset_vring() to implement the reset of vring during
>  the reset process. May reuse the old vring if the num of the vq does not change.
>   3. find_vqs() supports a sizes argument to specify the max size of each vq
> 
> v5:
>   1. add virtio-net support set_ringparam
> 
> v4:
>   1. just the code of virtio, without virtio-net
>   2. Performing reset on a queue is divided into these steps:
> 1. reset_vq: reset one vq
> 2. recycle the buffer from vq by virtqueue_detach_unused_buf()
> 3. release the ring of the vq by vring_release_virtqueue()
> 4. enable_reset_vq: re-enable the reset queue
>   3. Simplify the parameters of enable_reset_vq()
>   4. add container structures for virtio_pci_common_cfg
> 
> v3:
>   1. keep vq, irq unreleased
> 
> *** BLURB HERE ***
> 
> Xuan Zhuo (41):
>   remoteproc: rename len of rpoc_vring to num
>   virtio: add helper virtqueue_get_vring_max_size()
>   virtio: struct virtio_config_ops add callbacks for queue_reset
>   virtio_ring: update the document of the virtqueue_detach_unused_buf
> for queue reset
>   virtio_ring: remove the arg vq of vring_alloc_desc_extra()
>   virtio_ring: extract the logic of freeing vring
>   virtio_ring: split vring_virtqueue
>   virtio_ring: introduce virtqueue_init()
>   virtio_ring: split: introduce vring_free_split()
>   virtio_ring: split: extract the logic of alloc queue
>   virtio_ring: split: extract the logic of alloc state and extra
>   virtio_ring: split: extract the logic of attach vring
>   virtio_ring: split: extract the logic of vring init
>   virtio_ring: split: introduce virtqueue_reinit_split()
>   virtio_ring: split: reserve vring_align, may_reduce_num
>   virtio_ring: split: introduce virtqueue_resize_split()
>   virtio_ring: packed: introduce vring_free_packed
>   virtio_ring: packed: extract the logic of alloc queue
>   virtio_ring: packed: extract the logic of alloc state and extra
>   virtio_ring: packed: extract the logic of attach vring
>   virtio_ring: packed: extract the logic of vring init
>   virtio_ring: packed: introduce virtqueue_reinit_packed()
>   virtio_ring: packed: introduce virtqueue_resize_packed()
>   virtio_ring: introduce virtqueue_resize()
>   virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
>   virtio: queue_reset: add VIRTIO_F_RING_RESET
>   virtio: allow to unbreak/break virtqueue individually
>   virtio_pci: update struct virtio_pci_common_cfg
>   virtio_pci: introduce helper to get/set queue reset
>   virtio_pci: extract the logic of active vq for modern pci
>   virtio_pci: support VIRTIO_F_RING_RESET
>   virtio: find_vqs() add arg sizes
>   virtio_pci: support the arg sizes of find_vqs()
>   virtio_mmio: support the arg sizes of find_vqs()
>   virtio: add helper virtio_find_vqs_ctx_size()
>   virtio_net: set the default max ring size by find_vqs()
>   virtio_net: get ringparam by virtqueue_get_vring_max_size()
>   virtio_net: split free_unused_bufs()
>   virtio_net: support rx queue resize
>   virtio_net: support tx queue resize

Re: [PATCH v10 25/41] virtio_pci: struct virtio_pci_common_cfg add queue_notify_data

2022-06-24 Thread Michael S. Tsirkin
On Fri, Jun 24, 2022 at 10:56:05AM +0800, Xuan Zhuo wrote:
> Add queue_notify_data in struct virtio_pci_common_cfg, which comes from
> here https://github.com/oasis-tcs/virtio-spec/issues/89
> 
> To avoid breaking uABI, add a new struct virtio_pci_common_cfg_notify.

What exactly is meant by not breaking uABI?
Users are supposed to be prepared for struct size to change ... no?


> Since I want to add queue_reset after queue_notify_data, I submitted
> this patch first.
> 
> Signed-off-by: Xuan Zhuo 
> Acked-by: Jason Wang 
> ---
>  include/uapi/linux/virtio_pci.h | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
> index 3a86f36d7e3d..22bec9bd0dfc 100644
> --- a/include/uapi/linux/virtio_pci.h
> +++ b/include/uapi/linux/virtio_pci.h
> @@ -166,6 +166,13 @@ struct virtio_pci_common_cfg {
>   __le32 queue_used_hi;   /* read-write */
>  };
>  
> +struct virtio_pci_common_cfg_notify {
> + struct virtio_pci_common_cfg cfg;
> +
> + __le16 queue_notify_data;   /* read-write */
> + __le16 padding;
> +};
> +
>  /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
>  struct virtio_pci_cfg_cap {
>   struct virtio_pci_cap cap;
> -- 
> 2.31.0



Re: [PATCH V3] virtio: disable notification hardening by default

2022-06-24 Thread Michael S. Tsirkin
On Wed, Jun 22, 2022 at 03:09:31PM +0800, Jason Wang wrote:
> On Wed, Jun 22, 2022 at 3:03 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 22, 2022 at 09:29:40AM +0800, Jason Wang wrote:
> > > We try to harden virtio device notifications in 8b4ec69d7e09 ("virtio:
> > > harden vring IRQ"). It works with the assumption that the driver or
> > > core can properly call virtio_device_ready() at the right
> > > place. Unfortunately, this seems not to be true, and it uncovers
> > > various bugs in the existing drivers, mainly incorrect use of
> > > virtio_device_ready().
> > >
> > > So let's have a Kconfig option and disable it by default. It gives
> > > us breathing room to fix the drivers; then we can consider enabling it
> > > by default.
> > >
> > > Signed-off-by: Jason Wang 
> >
> >
> > OK I will queue, but I think the problem is fundamental.
> 
> If I understand correctly, you want some core IRQ work?

Yes.

> As discussed
> before, it doesn't solve all the problems; we still need to do a
> per-driver audit.
> 
> Thanks

Maybe, but we don't need to tie things to device_ready then.
We can do

- disable irqs
- device ready
- setup everything
- enable irqs


and this works for most things; the only issue is that
this deadlocks if "setup everything" waits for interrupts.


With the current approach there's really no good time:

1. - setup everything
   - device ready

   can cause kicks before the device is ready.

2. - device ready
   - setup everything

   can cause callbacks before setup.

So I prefer 1., and fixing the hardening in the core.


> >
> >
> > > ---
> > > Changes since V2:
> > > - Tweak the Kconfig help
> > > - Add comment for the read_lock() pairing in virtio_ccw
> > > ---
> > >  drivers/s390/virtio/virtio_ccw.c |  9 -
> > >  drivers/virtio/Kconfig   | 13 +
> > >  drivers/virtio/virtio.c  |  2 ++
> > >  drivers/virtio/virtio_ring.c | 12 
> > >  include/linux/virtio_config.h|  2 ++
> > >  5 files changed, 37 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/s390/virtio/virtio_ccw.c 
> > > b/drivers/s390/virtio/virtio_ccw.c
> > > index 97e51c34e6cf..1f6a358f65f0 100644
> > > --- a/drivers/s390/virtio/virtio_ccw.c
> > > +++ b/drivers/s390/virtio/virtio_ccw.c
> > > @@ -1136,8 +1136,13 @@ static void virtio_ccw_int_handler(struct 
> > > ccw_device *cdev,
> > >   vcdev->err = -EIO;
> > >   }
> > >   virtio_ccw_check_activity(vcdev, activity);
> > > - /* Interrupts are disabled here */
> > > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > > + /*
> > > +  * Paired with virtio_ccw_synchronize_cbs() and interrupts are
> > > +  * disabled here.
> > > +  */
> > >   read_lock(>irq_lock);
> > > +#endif
> > >   for_each_set_bit(i, indicators(vcdev),
> > >sizeof(*indicators(vcdev)) * BITS_PER_BYTE) {
> > >   /* The bit clear must happen before the vring kick. */
> > > @@ -1146,7 +1151,9 @@ static void virtio_ccw_int_handler(struct 
> > > ccw_device *cdev,
> > >   vq = virtio_ccw_vq_by_ind(vcdev, i);
> > >   vring_interrupt(0, vq);
> > >   }
> > > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > >   read_unlock(>irq_lock);
> > > +#endif
> > >   if (test_bit(0, indicators2(vcdev))) {
> > >   virtio_config_changed(>vdev);
> > >   clear_bit(0, indicators2(vcdev));
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index b5adf6abd241..c04f370a1e5c 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -35,6 +35,19 @@ menuconfig VIRTIO_MENU
> > >
> > >  if VIRTIO_MENU
> > >
> > > +config VIRTIO_HARDEN_NOTIFICATION
> > > +bool "Harden virtio notification"
> > > +help
> > > +  Enable this to harden the device notifications and suppress
> > > +  those that happen at a time where notifications are illegal.
> > > +
> > > +  Experimental: Note that several drivers still have bugs that
> > > +  may cause crashes or hangs when correct handling of
> > > +  notifications is enforced; depending on the subset of
> > > +  drivers and devices you use, this may or may not work.
> > > +
> > > +  If unsure, say N.
> > > +
> > >  config VIRTIO_PCI
> > >   tristate "PCI driver for virtio devices"
> > >   depends on PCI
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index ef04a96942bf..21dc08d2f32d 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -220,6 +220,7 @@ static int virtio_features_ok(struct virtio_device 
> > > *dev)
> > >   * */
> > >  void virtio_reset_device(struct virtio_device *dev)
> > >  {
> > > +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
> > >   /*
> > >* The below virtio_synchronize_cbs() guarantees that any
> > >* interrupt for this line arriving after
> > > @@ -228,6 +229,7 @@ void 

Re: [PATCH v4] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-24 Thread Michael S. Tsirkin
On Thu, Jun 23, 2022 at 09:30:47AM +0800, Jason Wang wrote:
> On Wed, Jun 22, 2022 at 8:16 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 22, 2022 at 04:51:22PM +0800, Jason Wang wrote:
> > > On Fri, Jun 17, 2022 at 10:04 AM Albert Huang
> > >  wrote:
> > > >
> > > > From: "huangjie.albert" 
> > > >
> > > > the used_wrap_counter and the vq->last_used_idx may get
> > > > out of sync if they are assigned separately, and the interrupt
> > > > handler might use an incorrect value to check for the used index.
> > > >
> > > > For example, an OOB access:
> > > > ksoftirqd may consume the packet and it will call:
> > > > virtnet_poll
> > > > -->virtnet_receive
> > > > -->virtqueue_get_buf_ctx
> > > > -->virtqueue_get_buf_ctx_packed
> > > > and in virtqueue_get_buf_ctx_packed:
> > > >
> > > > vq->last_used_idx += vq->packed.desc_state[id].num;
> > > > if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
> > > >  vq->last_used_idx -= vq->packed.vring.num;
> > > >  vq->packed.used_wrap_counter ^= 1;
> > > > }
> > > >
> > > > if at the same time, there comes a vring interrupt,in vring_interrupt:
> > > > we will call:
> > > > vring_interrupt
> > > > -->more_used
> > > > -->more_used_packed
> > > > -->is_used_desc_packed
> > > > in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> > > > so this could cause an out-of-bounds memory access bug.
> > > >
> > > > this patch keeps the used_wrap_counter in vq->last_used_idx
> > > > so we can get the correct value to check the used index in the interrupt handler.
> > > >
> > > > v3->v4:
> > > > - use READ_ONCE/WRITE_ONCE to get/set vq->last_used_idx
> > > >
> > > > v2->v3:
> > > > - add inline function to get used_wrap_counter and last_used
> > > > - when use vq->last_used_idx, only read once
> > > >   if vq->last_used_idx is read twice, the values can be inconsistent.
> > > > - use last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))
> > > >   to get the all bits below VRING_PACKED_EVENT_F_WRAP_CTR
> > > >
> > > > v1->v2:
> > > > - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> > > > - Remove parameter judgment in is_used_desc_packed,
> > > > because it can't be illegal
> > > >
> > > > Signed-off-by: huangjie.albert 
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 75 ++--
> > > >  1 file changed, 47 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 13a7348cedff..719fbbe716d6 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -111,7 +111,12 @@ struct vring_virtqueue {
> > > > /* Number we've added since last sync. */
> > > > unsigned int num_added;
> > > >
> > > > -   /* Last used index we've seen. */
> > > > +   /* Last used index  we've seen.
> > > > +* for split ring, it just contains last used index
> > > > +* for packed ring:
> > > > +* bits up to VRING_PACKED_EVENT_F_WRAP_CTR include the last 
> > > > used index.
> > > > +* bits from VRING_PACKED_EVENT_F_WRAP_CTR include the used 
> > > > wrap counter.
> > > > +*/
> > > > u16 last_used_idx;
> > > >
> > > > /* Hint for event idx: already triggered no need to disable. */
> > > > @@ -154,9 +159,6 @@ struct vring_virtqueue {
> > > > /* Driver ring wrap counter. */
> > > > bool avail_wrap_counter;
> > > >
> > > > -   /* Device ring wrap counter. */
> > > > -   bool used_wrap_counter;
> > > > -
> > > > /* Avail used flags. */
> > > > u16 avail_used_flags;
> > > >
> > > > @@ -973,6 +975,15 @@ static struct virtqueue 
> > > > *vring_create_virtqueue_split(
> > > >  /*
> > > >   * Packed ring specific functions - *_packed().
> > > >   */
> > > > +static inline bool packed_used_wrap_counter(u16 last_used_idx)
> > > > +{
> > > > +   return !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> > > > +}
> > > > +
> > > > +static inline u16 packed_last_used(u16 last_used_idx)
> > > > +{
> > > > +   return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> > > > +}
> > >
> > > Any reason we need a minus after the shift?
> >
> > The point is to say "all bits above VRING_PACKED_EVENT_F_WRAP_CTR".
> > Has no effect currently but will if last_used_idx is extended to 32 bit.
> 
> Ok, but we don't do this for other uses for VRING_PACKED_EVENT_F_WRAP_CTR.
> 
> I wonder how much value we do it only here.
> 
> Thanks

I don't care much either way. Feel free to go ahead and play with
different versions to see which works better.

> >
> >
> > > Others look good.
> > >
> > > Thanks
> > >
> > > >
> > > >  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > > >  struct 

RE: [PATCH v3 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-24 Thread Tian, Kevin
> From: Yong Wu
> Sent: Friday, June 24, 2022 1:39 PM
> 
> On Thu, 2022-06-23 at 19:44 -0700, Nicolin Chen wrote:
> > On Fri, Jun 24, 2022 at 09:35:49AM +0800, Baolu Lu wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On 2022/6/24 04:00, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/mtk_iommu_v1.c
> > > > b/drivers/iommu/mtk_iommu_v1.c
> > > > index e1cb51b9866c..5386d889429d 100644
> > > > --- a/drivers/iommu/mtk_iommu_v1.c
> > > > +++ b/drivers/iommu/mtk_iommu_v1.c
> > > > @@ -304,7 +304,7 @@ static int mtk_iommu_v1_attach_device(struct
> > > > iommu_domain *domain, struct device
> > > >   /* Only allow the domain created internally. */
> > > >   mtk_mapping = data->mapping;
> > > >   if (mtk_mapping->domain != domain)
> > > > - return 0;
> > > > + return -EMEDIUMTYPE;
> > > >
> > > >   if (!data->m4u_dom) {
> > > >   data->m4u_dom = dom;
> > >
> > > This change looks odd. It turns the return value from success to
> > > failure. Is it a bug? If so, it should go through a separated fix
> > > patch.
> 
> Thanks for the review:)
> 
> >
> > Makes sense.
> >
> > I read the commit log of the original change:
> >
> https://lore.kernel.org/r/1589530123-30240-1-git-send-email-
> yong...@mediatek.com
> >
> > It doesn't seem to allow devices to get attached to different
> > domains other than the shared mapping->domain, created in
> > mtk_iommu_probe_device(). So it looks like returning 0
> > is intentional. Though I am still very confused by this return
> > value here, I doubt it has ever been used in a VFIO context.
> 
> It's not used in a VFIO context. "return 0" just satisfies the iommu
> framework so it goes ahead. And yes, here we only allow the shared
> "mapping->domain" (all the devices share a domain created internally).
> 
> Thus I think we should still keep "return 0" here.
> 

What prevents this driver from being used in a VFIO context?

And why would we want to go ahead when an obvious error occurs,
i.e. when a device is attached to an unexpected domain?