Re: [PATCH V6 8/9] virtio: harden vring IRQ

2022-06-16 Thread Michael S. Tsirkin
On Fri, Jun 17, 2022 at 09:24:57AM +0800, Jason Wang wrote:
> On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason 
> > > > > > > > > > > > > > > Wang wrote:
> > > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening 
> > > > > > > > > > > > > > > > that is done for
> > > > > > > > > > > > > > > > virtio-pci where several drawbacks were found 
> > > > > > > > > > > > > > > > and were reverted:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not 
> > > > > > > > > > > > > > > > friendly to affinity managed IRQ
> > > > > > > > > > > > > > > >that is used by some device such as 
> > > > > > > > > > > > > > > > virtio-blk
> > > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The vq->broken is re-used in this patch for 
> > > > > > > > > > > > > > > > implementing the IRQ
> > > > > > > > > > > > > > > > hardening. The vq->broken is set to true during 
> > > > > > > > > > > > > > > > both initialization
> > > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() 
> > > > > > > > > > > > > > > > can check and return
> > > > > > > > > > > > > > > > when vq->broken is true. And in this case,
> > > > > > > > > > > > > > > > switch to returning IRQ_NONE
> > > > > > > > > > > > > > > > to let the interrupt core be aware of such an
> > > > > > > > > > > > > > > > invalid interrupt and prevent an IRQ storm.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The reason for using a per-queue variable
> > > > > > > > > > > > > > > > instead of a per-device one
> > > > > > > > > > > > > > > > is that we may need it for per-queue reset
> > > > > > > > > > > > > > > > hardening in the future.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Note that the hardening is only done for vring 
> > > > > > > > > > > > > > > > interrupt since the
> > > > > > > > > > > > > > > > config interrupt hardening is already done in 
> > > > > > > > > > > > > > > > commit 22b7050a024d7
> > > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). 
> > > > > > > > > > > > > > > > But the method that is
> > > > > > > > > > > > > > > > used by config interrupt can't be reused by the 
> > > > > > > > > > > > > > > > vring interrupt
> > > > > > > > > > > > > > > > handler because it uses spinlock to do the 
> > > > > > > > > > > > > > > > synchronization which is
> > > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > > > > Cc: Vineeth Vijayan 
> > > > > > > > > > > > > > > > Cc: Peter Oberparleiter 
> > > > > > > > > > > > > > > > Cc: linux-s...@vger.kernel.org
> > > > > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > > I propose adding a flag to suppress the hardening 
> > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > this will be a debugging aid and a workaround for
> > > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > > >
> 

Re: [PATCH V6 8/9] virtio: harden vring IRQ

2022-06-16 Thread Jason Wang
On Thu, Jun 16, 2022 at 2:24 AM Cristian Marussi
 wrote:
>
> On Wed, Jun 15, 2022 at 09:41:18AM +0800, Jason Wang wrote:
> > On Wed, Jun 15, 2022 at 12:46 AM Cristian Marussi
> >  wrote:
>
> Hi Jason,
>
> > >
> > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > >
>
> [snip]
>
> > > >
> > > > >  arm_scmi
> > > >
> > > > It looks to me the singleton device could be used by SCMI immediately 
> > > > after
> > > >
> > > > /* Ensure initialized scmi_vdev is visible */
> > > > smp_store_mb(scmi_vdev, vdev);
> > > >
> > > > So we probably need to do virtio_device_ready() before that. It has an
> > > > optional rx queue but the filling is done after the above assignment,
> > > > so it's safe. And the callback looks safe if a callback is triggered
> > > > after virtio_device_ready() but before the above assignment.
> > > >
> > >
> > > I wanted to give this series a go, testing it in the context of
> > > SCMI, but it does not apply
> > >
> > > - not on a v5.18:
> > >
> > > 17:33 $ git rebase -i v5.18
> > > 17:33 $ git am 
> > > ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > > Applying: virtio: use virtio_reset_device() when possible
> > > Applying: virtio: introduce config op to synchronize vring callbacks
> > > Applying: virtio-pci: implement synchronize_cbs()
> > > Applying: virtio-mmio: implement synchronize_cbs()
> > > error: patch failed: drivers/virtio/virtio_mmio.c:345
> > > error: drivers/virtio/virtio_mmio.c: patch does not apply
> > > Patch failed at 0005 virtio-mmio: implement synchronize_cbs()
> > >
> > > - neither on a v5.19-rc2:
> > >
> > > 17:33 $ git rebase -i v5.19-rc2
> > > 17:35 $ git am 
> > > ./v6_20220527_jasowang_rework_on_the_irq_hardening_of_virtio.mbx
> > > Applying: virtio: use virtio_device_ready() in virtio_device_restore()
> > > error: patch failed: drivers/virtio/virtio.c:526
> > > error: drivers/virtio/virtio.c: patch does not apply
> > > Patch failed at 0001 virtio: use virtio_device_ready() in
> > > virtio_device_restore()
> > > hint: Use 'git am --show-current-patch=diff' to see the failed patch
> > > When you have resolved this problem, run "git am --continue".
> > >
> > > ... what should I take as a base?
> >
> > It should have already been included in rc2, so there's no need to
> > apply the patch manually.
> >
>
> I tested this series as included in v5.19-rc2 (WITHOUT adding a
> virtio_device_ready in SCMI virtio as you mentioned above ... if I got it
> right) and I have NOT seen any issue around SCMI virtio using my usual
> test setup (using both SCMI vqueues).
>
> No anomalies even when using SCMI virtio in atomic/polling mode.
>
> Adding a virtio_device_ready() at the end of the SCMI virtio probe()
> works fine too; it does not make any difference in my setup.
> (both using QEMU and kvmtool, with the latter NOT supporting
>  virtio_V1... not sure if it makes a difference but I thought it was worth
>  mentioning)

Thanks a lot for the testing.

We want to prevent malicious hypervisors from attacking us. So more questions:

Assuming we do:

virtio_device_ready();
/* Ensure initialized scmi_vdev is visible */
smp_store_mb(scmi_vdev, vdev);

This means we allow the callbacks (scmi_vio_complete) to be called
before smp_store_mb(). We need to make sure the callbacks are robust.
And this looks fine, since we have the scmi_vio_channel_acquire()
check: if the notification is called before smp_store_mb(), the
acquire will fail.

If we put virtio_device_ready() after smp_store_mb() like:

/* Ensure initialized scmi_vdev is visible */
smp_store_mb(scmi_vdev, vdev);
virtio_device_ready();

If I understand correctly, there will be a race since SCMI may try
to use the device before virtio_device_ready(), which somewhat
violates the virtio spec.
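
In other words, the first ordering is the safe one, provided the
completion callback tolerates running before the device is published.
A minimal sketch of that pattern (hypothetical names apart from
virtio_device_ready()/smp_store_mb(); the guard stands in for the
real scmi_vio_channel_acquire() check):

	/* probe path: allow callbacks first, then publish the device */
	virtio_device_ready(vdev);
	/* Ensure initialized scmi_vdev is visible */
	smp_store_mb(scmi_vdev, vdev);

	/* callback path: may legitimately run before the store above */
	static void scmi_vio_complete(struct virtqueue *vq)
	{
		struct scmi_vio_channel *ch = vq->priv;

		if (!scmi_vio_channel_acquire(ch))
			return; /* channel not published yet: ignore safely */
		/* ... normal completion handling, then release ... */
	}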

Thanks

>
> Thanks,
> Cristian
>



RE: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Friday, June 17, 2022 6:41 AM
> 
> > ...
> > > - if (resv_msi) {
> > > + if (resv_msi && !domain->msi_cookie) {
> > >   ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
> > >   if (ret && ret != -ENODEV)
> > >   goto out_detach;
> > > + domain->msi_cookie = true;
> > >   }
> >
> > why not moving to alloc_attach_domain() then no need for the new
> > domain field? It's required only when a new domain is allocated.
> 
> When reusing an existing domain that doesn't have an msi_cookie,
> we can do iommu_get_msi_cookie() if resv_msi is found. So it is
> not limited to a new domain.

It looks like the msi_cookie requirement is per platform (currently only
for SMMU; see arm_smmu_get_resv_regions()). If there is
no mixed case then the above check is not required.

But let's hear whether Robin has a different thought here.

> 
> > ...
> > > - if (list_empty(&domain->group_list)) {
> > > - if (list_is_singular(&iommu->domain_list)) {
> > > - if (list_empty(&iommu->emulated_iommu_groups)) {
> > > - WARN_ON(iommu->notifier.head);
> > > -   vfio_iommu_unmap_unpin_all(iommu);
> > > - } else {
> > > -   vfio_iommu_unmap_unpin_reaccount(iommu);
> > > - }
> > > - }
> > > - iommu_domain_free(domain->domain);
> > > - list_del(&domain->next);
> > > - kfree(domain);
> > > - vfio_iommu_aper_expand(iommu, &iova_copy);
> >
> > Previously the aperture was adjusted when a domain was freed...
> >
> > > - vfio_update_pgsize_bitmap(iommu);
> > > - }
> > > - /*
> > > -  * Removal of a group without dirty tracking may allow
> > > -  * the iommu scope to be promoted.
> > > -  */
> > > - if (!group->pinned_page_dirty_scope) {
> > > - iommu->num_non_pinned_groups--;
> > > - if (iommu->dirty_page_tracking)
> > > - vfio_iommu_populate_bitmap_full(iommu);
> > > - }
> > > + vfio_iommu_detach_destroy_domain(domain, iommu, group);
> > >   kfree(group);
> > >   break;
> > >   }
> > >
> > > + vfio_iommu_aper_expand(iommu, &iova_copy);
> >
> > but now it's done for every group detach. The aperture is decided
> > by domain geometry which is not affected by attached groups.
> 
> Yea, I've noticed this part. Actually Jason did this change for
> simplicity, and I think it'd be safe to do so?

Perhaps detach_destroy() can return a Boolean to indicate whether
a domain is destroyed.
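
Something along these lines, as a hypothetical sketch:

	/* returns true iff the last group left and the domain was freed */
	if (vfio_iommu_detach_destroy_domain(domain, iommu, group)) {
		vfio_iommu_aper_expand(iommu, &iova_copy);
		vfio_update_pgsize_bitmap(iommu);
	}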


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-16 Thread Parav Pandit via Virtualization


> From: Jason Wang 
> Sent: Thursday, June 16, 2022 9:15 PM
> 
> On Fri, Jun 17, 2022 at 3:36 AM Parav Pandit  wrote:
> >
> >
> > > From: Jason Wang 
> > > Sent: Tuesday, June 14, 2022 9:29 PM
> > >
> > > Well, it's an example of how vDPA is implemented. I think we agree
> > > that for vDPA, vendors have the flexibility to implement their preferred
> > > datapath.
> > >
> > Yes for the vdpa level and for the virtio level.
> >
> > > >
> > > > I remember a few months back, you acked in the weekly meeting that
> > > > TC has approved the AQ direction.
> > > > And we are still in this circle of debating the AQ.
> > >
> > > I think not. Just to make sure we are on the same page, the proposal
> > > here is for vDPA, and we hope it can provide forward compatibility to
> > > virtio. So in the context of vDPA, admin virtqueue is not a must.
> > In the context of vDPA over virtio, an efficient transport interface is needed.
> > If AQ is not a must, any other interface such as hundreds to thousands of
> > registers is not a must either.
> >
> > AQ is one interface proposed with multiple benefits.
> > I haven't seen any other alternative that delivers all the benefits.
> > The only one I have seen is synchronous config registers.
> >
> > If you let vendors progress, a handful of sensible interfaces can exist, each
> > with different characteristics.
> > How would we proceed from here?
> 
> I'm pretty fine with having admin virtqueue in the virtio spec. If you
> remember, I've even submitted a proposal to use admin virtqueue as a
> transport last year.
> 
> Let's just proceed in the virtio-dev list.

OK, thanks. I am aligned with your thoughts now.

Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.

2022-06-16 Thread Jason Wang
On Thu, Jun 16, 2022 at 7:59 PM Andrew Melnichenko  wrote:
>
> Hi, Jason
> Apparently, your patch should work.
> For now, I have an issue where segmentation between two guests on one
> host still occurs.
> Neither previous "hack" nor your patch helps.
> Now I'm looking into what the issue may be.
> If you have some suggestions on where I may look, that would be helpful, thanks!

I think maybe it's worth tracking which function did the segmentation.

Thanks

>
> On Thu, May 26, 2022 at 3:18 PM Andrew Melnichenko  wrote:
> >
> > I'll check it, thank you!
> >
> > On Thu, May 26, 2022 at 9:56 AM Jason Wang  wrote:
> > >
> > > On Tue, May 24, 2022 at 7:07 PM Andrew Melnichenko  
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > The issue is that host segments packets between guests on the same host.
> > > > > Tests show that it happens because of the SKB_GSO_DODGY skb offload in
> > > > > virtio_net_hdr_from_skb().
> > > > > To do segmentation you need to remove SKB_GSO_DODGY or add
> > > > > SKB_GSO_PARTIAL.
> > > > > The solution with DODGY/PARTIAL offload looks like a dirty hack, so
> > > > > for now, I've left it as it is for further investigation.
> > >
> > > Ok, I managed to find the previous discussion. It looks to me the
> > > reason is that __udp_gso_segment will segment dodgy packets
> > > unconditionally.
> > >
> > > I wonder if the attached patch works? (compile test only).
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > On Tue, May 17, 2022 at 9:32 AM Jason Wang  wrote:
> > > > >
> > > > > On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko 
> > > > >  wrote:
> > > > > >
> > > > > > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> > > > > > Technically they enable NETIF_F_GSO_UDP_L4
> > > > > > (and only if USO4 & USO6 are set simultaneously).
> > > > > > It allows transmission of large UDP packets.
> > > > > >
> > > > > > Different features USO4 and USO6 are required for qemu where 
> > > > > > Windows guests can
> > > > > > enable/disable USO receives for IPv4 and IPv6 separately.
> > > > > > On the other side, Linux can't really differentiate USO4 and USO6, 
> > > > > > for now.
> > > > > > For now, to enable USO for TUN it requires enabling USO4 and USO6 
> > > > > > together.
> > > > > > In the future, there would be a mechanism to control UDP_L4 GSO 
> > > > > > separately.
> > > > > >
> > > > > > Test it with the WIP Qemu: https://github.com/daynix/qemu/tree/Dev_USOv2
> > > > > >
> > > > > > New types for VirtioNet are already on the mailing list:
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
> > > > > >
> > > > > > Also, there is a known issue with transmitting packets between two
> > > > > > guests.
> > > > >
> > > > > Could you explain this more? It looks like a bug. (Or any pointer to
> > > > > the discussion)
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Without hacks with skb's GSO - packets are still segmented on the
> > > > > > host's postrouting.
> > > > > >
> > > > > > Andrew (5):
> > > > > >   uapi/linux/if_tun.h: Added new offload types for USO4/6.
> > > > > >   driver/net/tun: Added features for USO.
> > > > > >   uapi/linux/virtio_net.h: Added USO types.
> > > > > >   linux/virtio_net.h: Support USO offload in vnet header.
> > > > > >   drivers/net/virtio_net.c: Added USO support.
> > > > > >
> > > > > >  drivers/net/tap.c   | 10 --
> > > > > >  drivers/net/tun.c   |  8 +++-
> > > > > >  drivers/net/virtio_net.c| 19 +++
> > > > > >  include/linux/virtio_net.h  |  9 +
> > > > > >  include/uapi/linux/if_tun.h |  2 ++
> > > > > >  include/uapi/linux/virtio_net.h |  4 
> > > > > >  6 files changed, 45 insertions(+), 7 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.35.1
> > > > > >
> > > > >
> > > >
>



Re: [PATCH V6 8/9] virtio: harden vring IRQ

2022-06-16 Thread Jason Wang
On Fri, Jun 17, 2022 at 1:11 AM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> > On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang 
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason 
> > > > > > > > > > > > > > Wang wrote:
> > > > > > > > > > > > > > > This is a rework on the previous IRQ hardening 
> > > > > > > > > > > > > > > that is done for
> > > > > > > > > > > > > > > virtio-pci where several drawbacks were found and 
> > > > > > > > > > > > > > > were reverted:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not 
> > > > > > > > > > > > > > > friendly to affinity managed IRQ
> > > > > > > > > > > > > > >that is used by some device such as virtio-blk
> > > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The vq->broken is re-used in this patch for 
> > > > > > > > > > > > > > > implementing the IRQ
> > > > > > > > > > > > > > > hardening. The vq->broken is set to true during 
> > > > > > > > > > > > > > > both initialization
> > > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can 
> > > > > > > > > > > > > > > check and return
> > > > > > > > > > > > > > > when vq->broken is true. And in this case, switch
> > > > > > > > > > > > > > > to returning IRQ_NONE
> > > > > > > > > > > > > > > to let the interrupt core be aware of such an invalid
> > > > > > > > > > > > > > > interrupt and prevent an IRQ storm.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The reason for using a per-queue variable instead
> > > > > > > > > > > > > > > of a per-device one
> > > > > > > > > > > > > > > is that we may need it for per-queue reset
> > > > > > > > > > > > > > > hardening in the future.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Note that the hardening is only done for vring 
> > > > > > > > > > > > > > > interrupt since the
> > > > > > > > > > > > > > > config interrupt hardening is already done in 
> > > > > > > > > > > > > > > commit 22b7050a024d7
> > > > > > > > > > > > > > > ("virtio: defer config changed notifications"). 
> > > > > > > > > > > > > > > But the method that is
> > > > > > > > > > > > > > > used by config interrupt can't be reused by the 
> > > > > > > > > > > > > > > vring interrupt
> > > > > > > > > > > > > > > handler because it uses spinlock to do the 
> > > > > > > > > > > > > > > synchronization which is
> > > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > > > Cc: Vineeth Vijayan 
> > > > > > > > > > > > > > > Cc: Peter Oberparleiter 
> > > > > > > > > > > > > > > Cc: linux-s...@vger.kernel.org
> > > > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > > this will be a debugging aid and a workaround for
> > > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by 
> > > > > > > > > > > > > default, it
> > > > > > > > > > > > > won't be used by the users so there's no way for us 
> > > > > > > > > > > > > to receive the bug
> > 

Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-16 Thread Jason Wang
On Fri, Jun 17, 2022 at 3:36 AM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Tuesday, June 14, 2022 9:29 PM
> >
> > Well, it's an example of how vDPA is implemented. I think we agree that for
> > vDPA, vendors have the flexibility to implement their preferred datapath.
> >
> Yes for the vdpa level and for the virtio level.
>
> > >
> > > I remember a few months back, you acked in the weekly meeting that TC has
> > approved the AQ direction.
> > > And we are still in this circle of debating the AQ.
> >
> > I think not. Just to make sure we are on the same page, the proposal here is
> > for vDPA, and we hope it can provide forward compatibility to virtio. So in the
> > context of vDPA, admin virtqueue is not a must.
> In the context of vDPA over virtio, an efficient transport interface is needed.
> If AQ is not a must, any other interface such as hundreds to thousands of
> registers is not a must either.
>
> AQ is one interface proposed with multiple benefits.
> I haven't seen any other alternative that delivers all the benefits.
> The only one I have seen is synchronous config registers.
>
> If you let vendors progress, a handful of sensible interfaces can exist, each
> with different characteristics.
> How would we proceed from here?

I'm pretty fine with having admin virtqueue in the virtio spec. If you
remember, I've even submitted a proposal to use admin virtqueue as a
transport last year.

Let's just proceed in the virtio-dev list.

Thanks


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-16 Thread Parav Pandit via Virtualization

> From: Jason Wang 
> Sent: Tuesday, June 14, 2022 9:29 PM
> 
> Well, it's an example of how vDPA is implemented. I think we agree that for
> vDPA, vendors have the flexibility to implement their preferred datapath.
>
Yes for the vdpa level and for the virtio level.

> >
> > I remember a few months back, you acked in the weekly meeting that TC has
> > approved the AQ direction.
> > And we are still in this circle of debating the AQ.
> 
> I think not. Just to make sure we are on the same page, the proposal here is
> for vDPA, and we hope it can provide forward compatibility to virtio. So in the
> context of vDPA, admin virtqueue is not a must.
In the context of vDPA over virtio, an efficient transport interface is needed.
If AQ is not a must, any other interface such as hundreds to thousands of
registers is not a must either.

AQ is one interface proposed with multiple benefits.
I haven't seen any other alternative that delivers all the benefits.
The only one I have seen is synchronous config registers.

If you let vendors progress, a handful of sensible interfaces can exist, each
with different characteristics.
How would we proceed from here?

Re: [PATCH V6 8/9] virtio: harden vring IRQ

2022-06-16 Thread Michael S. Tsirkin
On Wed, Jun 15, 2022 at 09:38:18AM +0800, Jason Wang wrote:
> On Tue, Jun 14, 2022 at 11:49 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 14, 2022 at 03:40:21PM +0800, Jason Wang wrote:
> > > On Mon, Jun 13, 2022 at 5:28 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, Jun 13, 2022 at 05:14:59PM +0800, Jason Wang wrote:
> > > > > On Mon, Jun 13, 2022 at 5:08 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jun 13, 2022 at 4:59 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Jun 13, 2022 at 04:51:08PM +0800, Jason Wang wrote:
> > > > > > > > On Mon, Jun 13, 2022 at 4:19 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Jun 13, 2022 at 04:07:09PM +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Jun 13, 2022 at 3:23 PM Michael S. Tsirkin 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 13, 2022 at 01:26:59PM +0800, Jason Wang 
> > > > > > > > > > > wrote:
> > > > > > > > > > > > On Sat, Jun 11, 2022 at 1:12 PM Michael S. Tsirkin 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, May 27, 2022 at 02:01:19PM +0800, Jason Wang 
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > This is a rework on the previous IRQ hardening that 
> > > > > > > > > > > > > > is done for
> > > > > > > > > > > > > > virtio-pci where several drawbacks were found and 
> > > > > > > > > > > > > > were reverted:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) try to use IRQF_NO_AUTOEN which is not friendly 
> > > > > > > > > > > > > > to affinity managed IRQ
> > > > > > > > > > > > > >that is used by some device such as virtio-blk
> > > > > > > > > > > > > > 2) done only for PCI transport
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The vq->broken is re-used in this patch for 
> > > > > > > > > > > > > > implementing the IRQ
> > > > > > > > > > > > > > hardening. The vq->broken is set to true during 
> > > > > > > > > > > > > > both initialization
> > > > > > > > > > > > > > and reset. And the vq->broken is set to false in
> > > > > > > > > > > > > > virtio_device_ready(). Then vring_interrupt() can 
> > > > > > > > > > > > > > check and return
> > > > > > > > > > > > > > when vq->broken is true. And in this case, switch
> > > > > > > > > > > > > > to returning IRQ_NONE
> > > > > > > > > > > > > > to let the interrupt core be aware of such an invalid
> > > > > > > > > > > > > > interrupt and prevent an IRQ storm.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The reason for using a per-queue variable instead of
> > > > > > > > > > > > > > a per-device one
> > > > > > > > > > > > > > is that we may need it for per-queue reset
> > > > > > > > > > > > > > hardening in the future.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Note that the hardening is only done for vring 
> > > > > > > > > > > > > > interrupt since the
> > > > > > > > > > > > > > config interrupt hardening is already done in 
> > > > > > > > > > > > > > commit 22b7050a024d7
> > > > > > > > > > > > > > ("virtio: defer config changed notifications"). But 
> > > > > > > > > > > > > > the method that is
> > > > > > > > > > > > > > used by config interrupt can't be reused by the 
> > > > > > > > > > > > > > vring interrupt
> > > > > > > > > > > > > > handler because it uses spinlock to do the 
> > > > > > > > > > > > > > synchronization which is
> > > > > > > > > > > > > > expensive.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cc: Thomas Gleixner 
> > > > > > > > > > > > > > Cc: Peter Zijlstra 
> > > > > > > > > > > > > > Cc: "Paul E. McKenney" 
> > > > > > > > > > > > > > Cc: Marc Zyngier 
> > > > > > > > > > > > > > Cc: Halil Pasic 
> > > > > > > > > > > > > > Cc: Cornelia Huck 
> > > > > > > > > > > > > > Cc: Vineeth Vijayan 
> > > > > > > > > > > > > > Cc: Peter Oberparleiter 
> > > > > > > > > > > > > > Cc: linux-s...@vger.kernel.org
> > > > > > > > > > > > > > Signed-off-by: Jason Wang 
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jason, I am really concerned by all the fallout.
> > > > > > > > > > > > > I propose adding a flag to suppress the hardening -
> > > > > > > > > > > > > this will be a debugging aid and a workaround for
> > > > > > > > > > > > > users if we find more buggy drivers.
> > > > > > > > > > > > >
> > > > > > > > > > > > > suppress_interrupt_hardening ?
> > > > > > > > > > > >
> > > > > > > > > > > > I can post a patch but I'm afraid if we disable it by 
> > > > > > > > > > > > default, it
> > > > > > > > > > > > won't be used by the users so there's no way for us to 
> > > > > > > > > > > > receive the bug
> > > > > > > > > > > > report. Or we need a plan to enable it by default.
> > > > > > > > > > > >
> > > > > > > > > > > > It's rc2, how about waiting for 1 and 2 rc? Or it looks 
> > > > > > > > > > > > better if we
> > > > > > > > > > > > simply 

Re: [PATCH v4] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-16 Thread Michael S. Tsirkin
On Thu, Jun 16, 2022 at 08:57:36PM +0800, Albert Huang wrote:
> From: "huangjie.albert" 
> 
> the used_wrap_counter and the vq->last_used_idx may get
> out of sync if they are assigned separately, and the interrupt
> handler might use an incorrect value to check for the used index.
>
> for example: OOB access
> ksoftirqd may consume the packet and it will call:
> virtnet_poll
>   -->virtnet_receive
>   -->virtqueue_get_buf_ctx
>   -->virtqueue_get_buf_ctx_packed
> and in virtqueue_get_buf_ctx_packed:
> 
> vq->last_used_idx += vq->packed.desc_state[id].num;
> if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
>  vq->last_used_idx -= vq->packed.vring.num;
>  vq->packed.used_wrap_counter ^= 1;
> }
> 
> if at the same time, there comes a vring interrupt,in vring_interrupt:
> we will call:
> vring_interrupt
>   -->more_used
>   -->more_used_packed
>   -->is_used_desc_packed
> in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> so this could cause a memory out-of-bounds bug.
>
> this patch keeps the used_wrap_counter in vq->last_used_idx
> so we can get the correct value to check the used index in the interrupt handler.
> 
> v3->v4:
> - use READ_ONCE/WRITE_ONCE to get/set vq->last_used_idx
> 
> v2->v3:
> - add inline function to get used_wrap_counter and last_used
> - when use vq->last_used_idx, only read once
>   if vq->last_used_idx is read twice, the values can be inconsistent.
> - use last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))
>   to get the all bits below VRING_PACKED_EVENT_F_WRAP_CTR
> 
> v1->v2:
> - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> - Remove parameter judgment in is_used_desc_packed,
> because it can't be illegal
> 
> Signed-off-by: huangjie.albert 
> ---
>  drivers/virtio/virtio_ring.c | 75 ++--
>  1 file changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..719fbbe716d6 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -111,7 +111,12 @@ struct vring_virtqueue {
>   /* Number we've added since last sync. */
>   unsigned int num_added;
>  
> - /* Last used index we've seen. */
> + /* Last used index  we've seen.
> +  * for split ring, it just contains last used index
> +  * for packed ring:
> +  * bits up to VRING_PACKED_EVENT_F_WRAP_CTR include the last used index.
> +  * bits VRING_PACKED_EVENT_F_WRAP_CTR include the used wrap counter.

bits from VRING_PACKED_EVENT_F_WRAP_CTR

> +  */
>   u16 last_used_idx;
>  
>   /* Hint for event idx: already triggered no need to disable. */
> @@ -154,9 +159,6 @@ struct vring_virtqueue {
>   /* Driver ring wrap counter. */
>   bool avail_wrap_counter;
>  
> - /* Device ring wrap counter. */
> - bool used_wrap_counter;
> -
>   /* Avail used flags. */
>   u16 avail_used_flags;
>  
> @@ -973,6 +975,15 @@ static struct virtqueue *vring_create_virtqueue_split(
>  /*
>   * Packed ring specific functions - *_packed().
>   */
> +static inline bool packed_used_wrap_counter(u16 last_used_idx)
> +{
> + return !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> +}
> +
> +static inline u16 packed_last_used(u16 last_used_idx)
> +{
> + return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> +}
>  
>  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>struct vring_desc_extra *extra)
> @@ -1406,8 +1417,14 @@ static inline bool is_used_desc_packed(const struct 
> vring_virtqueue *vq,
>  
>  static inline bool more_used_packed(const struct vring_virtqueue *vq)
>  {
> - return is_used_desc_packed(vq, vq->last_used_idx,
> - vq->packed.used_wrap_counter);
> + u16 last_used;
> + u16 last_used_idx;
> + bool used_wrap_counter;
> +
> + last_used_idx = READ_ONCE(vq->last_used_idx);
> + last_used = packed_last_used(last_used_idx);
> + used_wrap_counter = packed_used_wrap_counter(last_used_idx);
> + return is_used_desc_packed(vq, last_used, used_wrap_counter);
>  }
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> @@ -1415,7 +1432,8 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
> void **ctx)
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 last_used, id;
> + u16 last_used, id, last_used_idx;
> + bool used_wrap_counter;
>   void *ret;
>  
>   START_USE(vq);
> @@ -1434,7 +1452,9 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
>   /* Only get used elements after they have been exposed by host. */
>   virtio_rmb(vq->weak_barriers);
>  
> - last_used = vq->last_used_idx;
> +

Re: [PATCH v3] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-16 Thread Michael S. Tsirkin
On Thu, Jun 16, 2022 at 05:54:59PM +0800, Albert Huang wrote:
> From: "huangjie.albert" 
> 
> the used_wrap_counter and the vq->last_used_idx may get
> out of sync if they are assigned separately, and the interrupt
> handler might use an incorrect value to check for the used index.
>
> for example: OOB access
> ksoftirqd may consume the packet and it will call:
> virtnet_poll
>   -->virtnet_receive
>   -->virtqueue_get_buf_ctx
>   -->virtqueue_get_buf_ctx_packed
> and in virtqueue_get_buf_ctx_packed:
> 
> vq->last_used_idx += vq->packed.desc_state[id].num;
> if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
>  vq->last_used_idx -= vq->packed.vring.num;
>  vq->packed.used_wrap_counter ^= 1;
> }
> 
> if at the same time, there comes a vring interrupt,in vring_interrupt:
> we will call:
> vring_interrupt
>   -->more_used
>   -->more_used_packed
>   -->is_used_desc_packed
> in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> so this could cause a memory out-of-bounds bug.
>
> this patch keeps the used_wrap_counter in vq->last_used_idx
> so we can get the correct value to check the used index in the interrupt handler.
> 
> v2->v3:
> - add inline function to get used_wrap_counter and last_used
> - when use vq->last_used_idx, only read once
>   if vq->last_used_idx is read twice, the values can be inconsistent.
> - use last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))
>   to get the all bits below VRING_PACKED_EVENT_F_WRAP_CTR
> 
> v1->v2:
> - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> - Remove parameter judgment in is_used_desc_packed,
> because it can't be illegal
> 
> Signed-off-by: huangjie.albert 
> ---
>  drivers/virtio/virtio_ring.c | 75 ++--
>  1 file changed, 47 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..a253f50b8f86 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -111,7 +111,12 @@ struct vring_virtqueue {
>   /* Number we've added since last sync. */
>   unsigned int num_added;
>  
> - /* Last used index we've seen. */
> + /* Last used index  we've seen.
> +  * for split ring, it just contains last used index
> +  * for packed ring, it not only contains last used index, but also
> +  * used_wrap_counter, the VRING_PACKED_EVENT_F_WRAP_CTR is
> +  * the bit shift in last_used_idx


reword:

for packed ring, bits up to VRING_PACKED_EVENT_F_WRAP_CTR include the
last used index. Bits from VRING_PACKED_EVENT_F_WRAP_CTR include the
used wrap counter.

> +  */
>   u16 last_used_idx;
>  
>   /* Hint for event idx: already triggered no need to disable. */
> @@ -154,9 +159,6 @@ struct vring_virtqueue {
>   /* Driver ring wrap counter. */
>   bool avail_wrap_counter;
>  
> - /* Device ring wrap counter. */
> - bool used_wrap_counter;
> -
>   /* Avail used flags. */
>   u16 avail_used_flags;
>  
> @@ -973,6 +975,15 @@ static struct virtqueue *vring_create_virtqueue_split(
>  /*
>   * Packed ring specific functions - *_packed().
>   */
> +static inline bool packed_used_wrap_counter(u16 last_used_idx)
> +{
> + return !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> +}
> +
> +static inline u16 packed_last_used(u16 last_used_idx)
> +{
> + return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> +}
>  
>  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>struct vring_desc_extra *extra)
> @@ -1406,8 +1417,14 @@ static inline bool is_used_desc_packed(const struct 
> vring_virtqueue *vq,
>  
>  static inline bool more_used_packed(const struct vring_virtqueue *vq)
>  {
> - return is_used_desc_packed(vq, vq->last_used_idx,
> - vq->packed.used_wrap_counter);
> + u16 last_used;
> + u16 last_used_idx;
> + bool used_wrap_counter;
> +
> + last_used_idx = vq->last_used_idx;
> + last_used = packed_last_used(last_used_idx);
> + used_wrap_counter = packed_used_wrap_counter(last_used_idx);
> + return is_used_desc_packed(vq, last_used, used_wrap_counter);
>  }
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> @@ -1415,7 +1432,8 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
> void **ctx)
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 last_used, id;
> + u16 last_used, id, last_used_idx;
> + bool used_wrap_counter;
>   void *ret;
>  
>   START_USE(vq);
> @@ -1434,7 +1452,9 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
>   /* Only get used elements after they have been exposed by host. */
>   virtio_rmb(vq->weak_barriers);
>  
> 

Re: [RFC PATCH v2 0/5] TUN/VirtioNet USO features support.

2022-06-16 Thread Andrew Melnichenko
Hi, Jason
Apparently, your patch should work.
For now, I have an issue where segmentation between two guests on one
host still occurs.
Neither previous "hack" nor your patch helps.
Now I'm looking into what the issue may be.
If you have some suggestions on where I may look, that would be helpful, thanks!

On Thu, May 26, 2022 at 3:18 PM Andrew Melnichenko  wrote:
>
> I'll check it, thank you!
>
> On Thu, May 26, 2022 at 9:56 AM Jason Wang  wrote:
> >
> > On Tue, May 24, 2022 at 7:07 PM Andrew Melnichenko  
> > wrote:
> > >
> > > Hi all,
> > >
> > > The issue is that host segments packets between guests on the same host.
> > > Tests show that it happens because of the SKB_GSO_DODGY skb offload in
> > > virtio_net_hdr_from_skb().
> > > To do segmentation you need to remove SKB_GSO_DODGY or add SKB_GSO_PARTIAL.
> > > The solution with DODGY/PARTIAL offload looks like a dirty hack, so
> > > for now, I've left it as it is for further investigation.
> >
> > Ok, I managed to find the previous discussion. It looks to me the
> > reason is that __udp_gso_segment will segment dodgy packets
> > unconditionally.
> >
> > I wonder if the attached patch works? (compile test only).
> >
> > Thanks
> >
> > >
> > >
> > > On Tue, May 17, 2022 at 9:32 AM Jason Wang  wrote:
> > > >
> > > > On Thu, May 12, 2022 at 7:33 PM Andrew Melnychenko  
> > > > wrote:
> > > > >
> > > > > Added new offloads for TUN devices TUN_F_USO4 and TUN_F_USO6.
> > > > > Technically they enable NETIF_F_GSO_UDP_L4
> > > > > (and only if USO4 & USO6 are set simultaneously).
> > > > > It allows transmission of large UDP packets.
> > > > >
> > > > > Different features USO4 and USO6 are required for qemu where Windows 
> > > > > guests can
> > > > > enable/disable USO receives for IPv4 and IPv6 separately.
> > > > > On the other side, Linux can't really differentiate USO4 and USO6, 
> > > > > for now.
> > > > > For now, to enable USO for TUN it requires enabling USO4 and USO6 
> > > > > together.
> > > > > In the future, there would be a mechanism to control UDP_L4 GSO 
> > > > > separately.
> > > > >
> > > > > Test it with the WIP Qemu: https://github.com/daynix/qemu/tree/Dev_USOv2
> > > > >
> > > > > New types for VirtioNet are already on the mailing list:
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202110/msg00010.html
> > > > >
> > > > > Also, there is a known issue with transmitting packets between two
> > > > > guests.
> > > >
> > > > Could you explain this more? It looks like a bug. (Or any pointer to
> > > > the discussion)
> > > >
> > > > Thanks
> > > >
> > > > > Without hacks with skb's GSO - packets are still segmented on the
> > > > > host's postrouting.
> > > > >
> > > > > Andrew (5):
> > > > >   uapi/linux/if_tun.h: Added new offload types for USO4/6.
> > > > >   driver/net/tun: Added features for USO.
> > > > >   uapi/linux/virtio_net.h: Added USO types.
> > > > >   linux/virtio_net.h: Support USO offload in vnet header.
> > > > >   drivers/net/virtio_net.c: Added USO support.
> > > > >
> > > > >  drivers/net/tap.c   | 10 --
> > > > >  drivers/net/tun.c   |  8 +++-
> > > > >  drivers/net/virtio_net.c| 19 +++
> > > > >  include/linux/virtio_net.h  |  9 +
> > > > >  include/uapi/linux/if_tun.h |  2 ++
> > > > >  include/uapi/linux/virtio_net.h |  4 
> > > > >  6 files changed, 45 insertions(+), 7 deletions(-)
> > > > >
> > > > > --
> > > > > 2.35.1
> > > > >
> > > >
> > >


RE: [PATCH v2 5/5] vfio/iommu_type1: Simplify group attachment

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> Un-inline the domain specific logic from the attach/detach_group ops into
> two paired functions vfio_iommu_alloc_attach_domain() and
> vfio_iommu_detach_destroy_domain() that strictly deal with creating and
> destroying struct vfio_domains.
> 
> Add the logic to check for EMEDIUMTYPE return code of
> iommu_attach_group()
> and avoid the extra domain allocations and attach/detach sequences of the
> old code. This allows properly detecting an actual attach error, like
> -ENOMEM, vs treating all attach errors as an incompatible domain.
> 
> Co-developed-by: Jason Gunthorpe 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 298 +---
>  1 file changed, 163 insertions(+), 135 deletions(-)
> 

...
> +static struct vfio_domain *
> +vfio_iommu_alloc_attach_domain(struct bus_type *bus, struct vfio_iommu *iommu,
> +struct vfio_iommu_group *group)
> +{
> + struct iommu_domain *new_domain;
> + struct vfio_domain *domain;
> + int ret = 0;
> +
> + /* Try to match an existing compatible domain */
> + list_for_each_entry(domain, &iommu->domain_list, next) {
> + ret = iommu_attach_group(domain->domain, group->iommu_group);
> + if (ret == -EMEDIUMTYPE)
> + continue;

Probably good to add a one-line comment here for what EMEDIUMTYPE
represents. It's not a widely-used retry code like EAGAIN. A comment
can save readers the time of digging the fact out of the iommu code.
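
Something like the below, as a hypothetical sketch (exact wording is
up to you):

	/*
	 * -EMEDIUMTYPE means the group is incompatible with this
	 * domain, not that the attach failed fatally, so keep
	 * trying the remaining domains in the list.
	 */
	if (ret == -EMEDIUMTYPE)
		continue;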

...
> - if (resv_msi) {
> + if (resv_msi && !domain->msi_cookie) {
>   ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
>   if (ret && ret != -ENODEV)
>   goto out_detach;
> + domain->msi_cookie = true;
>   }

why not move this to alloc_attach_domain()? Then there is no need for the
new domain field, which is required only when a new domain is allocated.

...
> - if (list_empty(&domain->group_list)) {
> - if (list_is_singular(&iommu->domain_list)) {
> - if (list_empty(&iommu->emulated_iommu_groups)) {
> - WARN_ON(iommu->notifier.head);
> -   vfio_iommu_unmap_unpin_all(iommu);
> - } else {
> -   vfio_iommu_unmap_unpin_reaccount(iommu);
> - }
> - }
> - iommu_domain_free(domain->domain);
> - list_del(&domain->next);
> - kfree(domain);
> - vfio_iommu_aper_expand(iommu, &iova_copy);

Previously the aperture was adjusted when a domain was freed...

> - vfio_update_pgsize_bitmap(iommu);
> - }
> - /*
> -  * Removal of a group without dirty tracking may allow
> -  * the iommu scope to be promoted.
> -  */
> - if (!group->pinned_page_dirty_scope) {
> - iommu->num_non_pinned_groups--;
> - if (iommu->dirty_page_tracking)
> - vfio_iommu_populate_bitmap_full(iommu);
> - }
> + vfio_iommu_detach_destroy_domain(domain, iommu, group);
>   kfree(group);
>   break;
>   }
> 
> + vfio_iommu_aper_expand(iommu, &iova_copy);

but now it's done for every group detach. The aperture is decided by the
domain geometry, which is not affected by attached groups.



RE: [PATCH v2 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group()

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> All devices in emulated_iommu_groups have pinned_page_dirty_scope
> set, so the update_dirty_scope in the first list_for_each_entry
> is always false. Clean it up, and move the "if update_dirty_scope"
> part from the detach_group_done routine to the domain_list part.
> 
> Rename the "detach_group_done" goto label accordingly.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian , with one nit:

> @@ -2469,7 +2467,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
>   WARN_ON(iommu->notifier.head);
>   vfio_iommu_unmap_unpin_all(iommu);
>   }
> - goto detach_group_done;
> + goto out_unlock;
...
> +out_unlock:
>   mutex_unlock(&iommu->lock);
>  }
> 

I'd just replace the goto with a direct unlock and return there;
the readability is slightly better.
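
I.e., something like this hypothetical sketch of the tail of
vfio_iommu_type1_detach_group():

		WARN_ON(iommu->notifier.head);
		vfio_iommu_unmap_unpin_all(iommu);
	}
	/* instead of: goto out_unlock; */
	mutex_unlock(&iommu->lock);
	return;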



Re: [PATCH v2] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-16 Thread Michael S. Tsirkin
On Thu, Jun 16, 2022 at 02:07:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jun 16, 2022 at 01:12:21PM +0800, Albert Huang wrote:
> > From: "huangjie.albert" 
> > 
> > the used_wrap_counter and the vq->last_used_idx may get
> > out of sync if they are assigned separately, and the interrupt
> > handler might use an incorrect value to check for the used index.
> >
> > for example: OOB access
> > ksoftirqd may consume the packet and it will call:
> > virtnet_poll
> > -->virtnet_receive
> > -->virtqueue_get_buf_ctx
> > -->virtqueue_get_buf_ctx_packed
> > and in virtqueue_get_buf_ctx_packed:
> > 
> > vq->last_used_idx += vq->packed.desc_state[id].num;
> > if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
> >  vq->last_used_idx -= vq->packed.vring.num;
> >  vq->packed.used_wrap_counter ^= 1;
> > }
> > 
> > if at the same time, there comes a vring interrupt,in vring_interrupt:
> > we will call:
> > vring_interrupt
> > -->more_used
> > -->more_used_packed
> > -->is_used_desc_packed
> > in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> > so this could cause a memory out-of-bounds bug.
> >
> > this patch keeps the used_wrap_counter in vq->last_used_idx
> > so we can get the correct value to check the used index in the interrupt handler.
> > 
> > v1->v2:
> > - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> > - Remove parameter judgment in is_used_desc_packed,
> > because it can't be illegal
> > 
> > Signed-off-by: huangjie.albert 
> 
> 
> This looks good, just a small suggestion below:
> 
> > ---
> >  drivers/virtio/virtio_ring.c | 57 
> >  1 file changed, 31 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 13a7348cedff..b22d97c9a755 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -111,7 +111,12 @@ struct vring_virtqueue {
> > /* Number we've added since last sync. */
> > unsigned int num_added;
> >  
> > -   /* Last used index we've seen. */
> > +   /* Last used index  we've seen.
> > +* for split ring, it just contains last used index
> > +* for packed ring, it not only contains last used index, but also
> > +* used_wrap_counter, the VRING_PACKED_EVENT_F_WRAP_CTR is
> > +* the bit shift in last_used_idx
> > +*/
> > u16 last_used_idx;
> >  
> > /* Hint for event idx: already triggered no need to disable. */
> > @@ -154,9 +159,6 @@ struct vring_virtqueue {
> > /* Driver ring wrap counter. */
> > bool avail_wrap_counter;
> >  
> > -   /* Device ring wrap counter. */
> > -   bool used_wrap_counter;
> > -
> > /* Avail used flags. */
> > u16 avail_used_flags;
> >  
> > @@ -1406,8 +1408,12 @@ static inline bool is_used_desc_packed(const struct 
> > vring_virtqueue *vq,
> >  
> >  static inline bool more_used_packed(const struct vring_virtqueue *vq)
> >  {
> > -   return is_used_desc_packed(vq, vq->last_used_idx,
> > -   vq->packed.used_wrap_counter);
> > +   u16 last_used;
> > +   bool used_wrap_counter;
> > +
> > +   last_used = vq->last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
> 
> This only works if last_used_idx is 16 bit and
> VRING_PACKED_EVENT_F_WRAP_CTR is 15.
> 
> I think you want 
> /* all bits below VRING_PACKED_EVENT_F_WRAP_CTR */
> vq->last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> 
> 
> > +   used_wrap_counter = !!((vq->last_used_idx) >> 
> > VRING_PACKED_EVENT_F_WRAP_CTR);
> 
> 
> A bit more efficient and clear:
> 
> !!(vq->last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
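
To make the masking concrete, a worked sketch assuming
VRING_PACKED_EVENT_F_WRAP_CTR == 15 and a 16-bit index:

	1 << 15        = 0x8000
	~(1 << 15)     = 0x7fff (as u16)  /* clears only bit 15 */
	~(-(1 << 15))  = 0x7fff           /* clears bit 15 and all bits above */

The two masks coincide only because the wrap-counter bit is the top bit
of a 16-bit index; ~(-(1 << ctr)) stays correct for any bit position,
since it keeps exactly the bits below ctr.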
> 
> 
> 
> Also this logic is repeated in multiple places. Let's add a couple of inline
> functions:
> 
> static inline bool packed_used_wrap_counter(vq)
> 
> static inline u16 packed_last_used(vq)

Or better:

packed_used_wrap_counter(u16 last_used_idx)
packed_last_used(u16 last_used_idx)


> then use these everywhere.
> 
> 
> > +   return is_used_desc_packed(vq, last_used, used_wrap_counter);
> >  }
> >  
> >  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > @@ -1416,6 +1422,7 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> > virtqueue *_vq,
> >  {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > u16 last_used, id;
> > +   bool used_wrap_counter;
> > void *ret;
> >  
> > START_USE(vq);
> > @@ -1434,7 +1441,8 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> > virtqueue *_vq,
> > /* Only get used elements after they have been exposed by host. */
> > virtio_rmb(vq->weak_barriers);
> >  
> > -   last_used = vq->last_used_idx;
> > +   used_wrap_counter = !!((vq->last_used_idx >> 
> > VRING_PACKED_EVENT_F_WRAP_CTR));
> > +   last_used = (vq->last_used_idx) & (~(1 << 
> > VRING_PACKED_EVENT_F_WRAP_CTR));
> > id = 

Re: [PATCH v2] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-16 Thread Michael S. Tsirkin
On Thu, Jun 16, 2022 at 01:12:21PM +0800, Albert Huang wrote:
> From: "huangjie.albert" 
> 
> the used_wrap_counter and the vq->last_used_idx may get
> out of sync if they are assigned separately, and the interrupt
> handler might use an incorrect value to check for the used index.
>
> for example: OOB access
> ksoftirqd may consume the packet and it will call:
> virtnet_poll
>   -->virtnet_receive
>   -->virtqueue_get_buf_ctx
>   -->virtqueue_get_buf_ctx_packed
> and in virtqueue_get_buf_ctx_packed:
> 
> vq->last_used_idx += vq->packed.desc_state[id].num;
> if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
>  vq->last_used_idx -= vq->packed.vring.num;
>  vq->packed.used_wrap_counter ^= 1;
> }
> 
> if at the same time, there comes a vring interrupt,in vring_interrupt:
> we will call:
> vring_interrupt
>   -->more_used
>   -->more_used_packed
>   -->is_used_desc_packed
> in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> so this could cause a memory out-of-bounds bug.
>
> this patch keeps the used_wrap_counter in vq->last_used_idx
> so we can get the correct value to check the used index in the interrupt handler.
> 
> v1->v2:
> - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> - Remove parameter judgment in is_used_desc_packed,
> because it can't be illegal
> 
> Signed-off-by: huangjie.albert 
> ---
>  drivers/virtio/virtio_ring.c | 57 
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..b22d97c9a755 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -111,7 +111,12 @@ struct vring_virtqueue {
>   /* Number we've added since last sync. */
>   unsigned int num_added;
>  
> - /* Last used index we've seen. */
> + /* Last used index  we've seen.
> +  * for split ring, it just contains last used index
> +  * for packed ring, it not only contains last used index, but also
> +  * used_wrap_counter, the VRING_PACKED_EVENT_F_WRAP_CTR is
> +  * the bit shift in last_used_idx
> +  */
>   u16 last_used_idx;
>  
>   /* Hint for event idx: already triggered no need to disable. */
> @@ -154,9 +159,6 @@ struct vring_virtqueue {
>   /* Driver ring wrap counter. */
>   bool avail_wrap_counter;
>  
> - /* Device ring wrap counter. */
> - bool used_wrap_counter;
> -
>   /* Avail used flags. */
>   u16 avail_used_flags;
>  
> @@ -1406,8 +1408,12 @@ static inline bool is_used_desc_packed(const struct 
> vring_virtqueue *vq,
>  
>  static inline bool more_used_packed(const struct vring_virtqueue *vq)
>  {
> - return is_used_desc_packed(vq, vq->last_used_idx,
> - vq->packed.used_wrap_counter);
> + u16 last_used;
> + bool used_wrap_counter;
> +
> + last_used = vq->last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);
> + used_wrap_counter = !!((vq->last_used_idx) >> 
> VRING_PACKED_EVENT_F_WRAP_CTR);
> + return is_used_desc_packed(vq, last_used, used_wrap_counter);

Hmm.

If vq->last_used_idx is read twice like this, the values can be inconsistent;
no idea what the result will be if so.

I think we need to read vq->last_used_idx with READ_ONCE.

And I guess write it with WRITE_ONCE for symmetry.
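
A minimal sketch of the suggested pattern (using the packed_last_used()/
packed_used_wrap_counter() helper names suggested in the sibling reply;
this is what v4 of the patch ends up doing):

	/* snapshot the index once; derive both fields from the snapshot */
	u16 last_used_idx = READ_ONCE(vq->last_used_idx);
	u16 last_used = packed_last_used(last_used_idx);
	bool used_wrap_counter = packed_used_wrap_counter(last_used_idx);

	/* ... and publish updates with a single store */
	WRITE_ONCE(vq->last_used_idx, last_used_idx);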



>  }
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> @@ -1416,6 +1422,7 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
>   u16 last_used, id;
> + bool used_wrap_counter;
>   void *ret;
>  
>   START_USE(vq);
> @@ -1434,7 +1441,8 @@ static void *virtqueue_get_buf_ctx_packed(struct 
> virtqueue *_vq,
>   /* Only get used elements after they have been exposed by host. */
>   virtio_rmb(vq->weak_barriers);
>  
> - last_used = vq->last_used_idx;
> + used_wrap_counter = !!((vq->last_used_idx >> 
> VRING_PACKED_EVENT_F_WRAP_CTR));
> + last_used = (vq->last_used_idx) & (~(1 << 
> VRING_PACKED_EVENT_F_WRAP_CTR));
>   id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
>   *len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
>  
> @@ -1451,12 +1459,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   ret = vq->packed.desc_state[id].data;
>   detach_buf_packed(vq, id, ctx);
>  
> - vq->last_used_idx += vq->packed.desc_state[id].num;
> - if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
> - vq->last_used_idx -= vq->packed.vring.num;
> - vq->packed.used_wrap_counter ^= 1;
> + last_used += vq->packed.desc_state[id].num;
> + if (unlikely(last_used >= vq->packed.vring.num)) {
> + last_used -= vq->packed.vring.num;
> + 

RE: [PATCH v2 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> The domain->ops validation was added, as a precaution, for mixed-driver
> systems. However, at this moment only one iommu driver is possible. So
> remove it.

It's true on a physical platform. But I'm not sure whether a virtual platform
is allowed to include multiple IOMMUs, e.g. one virtio-iommu alongside a
virtual VT-d or a virtual SMMU. Given that we don't have explicit code to
reject such a configuration, it might be clearer to claim that (as Robin
pointed out) there are plenty of more significant problems than this to
solve, instead of simply saying that only one iommu driver is possible.

> 
> Per discussion with Robin, in future when many can be permitted we will
> rely on the IOMMU core code to check the domain->ops:
> https://lore.kernel.org/linux-iommu/6575de6d-94ba-c427-5b1e-967750ddf...@arm.com/
> 
> Signed-off-by: Nicolin Chen 

Apart from that,

Reviewed-by: Kevin Tian 

RE: [PATCH v2 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> From: Jason Gunthorpe 
> 
> The KVM mechanism for controlling wbinvd is based on the OR of the
> coherency properties of all devices attached to a guest, no matter whether
> those devices are attached to a single domain or to multiple domains.
> 
> So, there is no value in trying to push a device that could do enforced
> cache coherency to a dedicated domain vs re-using an existing domain that
> is non-coherent, since KVM won't be able to take advantage of it. This
> just wastes domain memory.
> 
> Simplify this code and eliminate the test. This removes the only logic
> that needed to have a dummy domain attached prior to searching for a
> matching domain and simplifies the next patches.
> 
> It's unclear whether we want to further optimize the Intel driver to
> update the domain coherency after a device is detached from it, at
> least not before KVM can be verified to handle such dynamics in related
> emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
> we don't see an usage requiring such optimization as the only device
> which imposes such non-coherency is Intel GPU which even doesn't
> support hotplug/hot remove.
> 
> Signed-off-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 
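
The OR semantics described above can be pictured with a hypothetical helper
(illustrative only, not the actual KVM code; the struct and field names are
made up):

    struct assigned_dev {
            bool enforce_cache_coherency;
    };

    /* Guest needs wbinvd emulation if ANY assigned device does
     * non-coherent DMA, regardless of how devices map to domains. */
    static bool guest_needs_wbinvd(const struct assigned_dev *devs,
                                   size_t ndevs)
    {
            size_t i;

            for (i = 0; i < ndevs; i++)
                    if (!devs[i].enforce_cache_coherency)
                            return true;
            return false;
    }

Since a single non-coherent device flips the result for the whole guest,
isolating a coherency-capable device in its own domain buys KVM nothing.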


RE: [PATCH v2 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-06-16 Thread Tian, Kevin
> From: Nicolin Chen 
> Sent: Thursday, June 16, 2022 8:03 AM
> 
> Cases like VFIO wish to attach a device to an existing domain that was
> not allocated specifically from the device. This raises a condition
> where the IOMMU driver can fail the domain attach because the domain and
> device are incompatible with each other.
> 
> This is a soft failure that can be resolved by using a different domain.
> 
> Provide a dedicated errno that the IOMMU driver can return from attach
> to indicate the failure is due to domain incompatibility. EMEDIUMTYPE
> is chosen because it is never used within the iommu subsystem today and
> evokes a sense that the 'medium', aka the domain, is incompatible.
> 
> VFIO can use this to know the attach is a soft failure and that it should
> continue searching. Otherwise the attach is a hard failure and VFIO
> returns the code to userspace.
> 
> Update all drivers to return EMEDIUMTYPE in their failure paths that are
> related to domain incompatibility. Also turn adjacent error prints into
> debug prints for these soft failures, to prevent kernel log spam.
> 
> Add kdocs describing this behavior.
> 
> Suggested-by: Jason Gunthorpe 
> Signed-off-by: Nicolin Chen 

Reviewed-by: Kevin Tian 
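
For context, the retry pattern this errno enables in the VFIO attach path
looks roughly like the following (a hypothetical sketch against the
vfio_iommu_type1 structures, not the actual patch):

    struct vfio_domain *d;
    int ret;

    list_for_each_entry(d, &iommu->domain_list, next) {
            ret = iommu_attach_group(d->domain, group->iommu_group);
            if (!ret)
                    return 0;       /* reused an existing domain */
            if (ret != -EMEDIUMTYPE)
                    return ret;     /* hard failure: report to userspace */
            /* soft failure: incompatible domain, keep searching */
    }
    /* no compatible domain found: fall through and allocate a new one */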


Re: [PATCH v2] virtio_ring : keep used_wrap_counter in vq->last_used_idx

2022-06-16 Thread Michael S. Tsirkin
On Thu, Jun 16, 2022 at 01:12:21PM +0800, Albert Huang wrote:
> From: "huangjie.albert" 
> 
> the used_wrap_counter and vq->last_used_idx may get out of sync
> if they are assigned separately, and the interrupt handler might then
> use an incorrect value to check for the used index.
> 
> for example, an OOB access:
> ksoftirqd may consume the packet and it will call:
> virtnet_poll
>   -->virtnet_receive
>   -->virtqueue_get_buf_ctx
>   -->virtqueue_get_buf_ctx_packed
> and in virtqueue_get_buf_ctx_packed:
> 
> vq->last_used_idx += vq->packed.desc_state[id].num;
> if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
>  vq->last_used_idx -= vq->packed.vring.num;
>  vq->packed.used_wrap_counter ^= 1;
> }
> 
> if a vring interrupt arrives at the same time, then in vring_interrupt
> we will call:
> vring_interrupt
>   -->more_used
>   -->more_used_packed
>   -->is_used_desc_packed
> in is_used_desc_packed, the last_used_idx may be >= vq->packed.vring.num,
> so this could cause a memory out-of-bounds bug.
> 
> this patch keeps the used_wrap_counter in vq->last_used_idx
> so we can get the correct value to check the used index in the interrupt handler.
> 
> v1->v2:
> - reuse the VRING_PACKED_EVENT_F_WRAP_CTR
> - Remove the parameter check in is_used_desc_packed,
> because the value can no longer be invalid
> 
> Signed-off-by: huangjie.albert 


This looks good, just a small suggestion below:

> ---
>  drivers/virtio/virtio_ring.c | 57 
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 13a7348cedff..b22d97c9a755 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -111,7 +111,12 @@ struct vring_virtqueue {
>   /* Number we've added since last sync. */
>   unsigned int num_added;
>  
> - /* Last used index we've seen. */
> + /* Last used index we've seen.
> +  * For the split ring, it just contains the last used index.
> +  * For the packed ring, it also carries the used_wrap_counter,
> +  * stored at bit VRING_PACKED_EVENT_F_WRAP_CTR of
> +  * last_used_idx.
> +  */
>   u16 last_used_idx;
>  
>   /* Hint for event idx: already triggered no need to disable. */
> @@ -154,9 +159,6 @@ struct vring_virtqueue {
>   /* Driver ring wrap counter. */
>   bool avail_wrap_counter;
>  
> - /* Device ring wrap counter. */
> - bool used_wrap_counter;
> -
>   /* Avail used flags. */
>   u16 avail_used_flags;
>  
> @@ -1406,8 +1408,12 @@ static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
>  
>  static inline bool more_used_packed(const struct vring_virtqueue *vq)
>  {
> - return is_used_desc_packed(vq, vq->last_used_idx,
> - vq->packed.used_wrap_counter);
> + u16 last_used;
> + bool used_wrap_counter;
> +
> + last_used = vq->last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR);

This only works if last_used_idx is 16 bit and
VRING_PACKED_EVENT_F_WRAP_CTR is 15.

I think you want 
/* all bits below VRING_PACKED_EVENT_F_WRAP_CTR */
vq->last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
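
A tiny standalone demo of the difference (plain C, not kernel code;
WRAP_CTR stands in for VRING_PACKED_EVENT_F_WRAP_CTR, and the LOW
position is hypothetical):

    #include <stdio.h>
    #include <stdint.h>

    #define TOP 15  /* the actual VRING_PACKED_EVENT_F_WRAP_CTR */
    #define LOW 7   /* a hypothetical lower flag position */

    int main(void)
    {
            uint16_t idx = 0x8185;  /* bits set above and below bit 7 */

            /* At bit 15 the two masks happen to agree, because the flag
             * is the top bit of a u16: both print 0x185. */
            printf("0x%x 0x%x\n",
                   idx & (uint16_t)~(1 << TOP),
                   idx & (uint16_t)~(-(1 << TOP)));

            /* At a lower bit position only the second form drops
             * everything at and above the flag bit: 0x8105 vs 0x5. */
            printf("0x%x 0x%x\n",
                   idx & (uint16_t)~(1 << LOW),
                   idx & (uint16_t)~(-(1 << LOW)));
            return 0;
    }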


> + used_wrap_counter = !!((vq->last_used_idx) >> VRING_PACKED_EVENT_F_WRAP_CTR);


A bit more efficient and clear:

!!(vq->last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))



Also this logic is repeated in multiple places. Let's add a couple of inline
functions:

static inline bool packed_used_wrap_counter(vq)

static inline u16 packed_last_used(vq)

then use these everywhere.
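
One possible shape for those helpers (a sketch; the exact form is up to
the patch author):

    static inline bool packed_used_wrap_counter(u16 last_used_idx)
    {
            return !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR));
    }

    static inline u16 packed_last_used(u16 last_used_idx)
    {
            /* Keep all bits below the wrap-counter flag. */
            return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
    }

Callers would then pair them with a single snapshot of the index, e.g.:

    u16 idx = READ_ONCE(vq->last_used_idx);
    return is_used_desc_packed(vq, packed_last_used(idx),
                               packed_used_wrap_counter(idx));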


> + return is_used_desc_packed(vq, last_used, used_wrap_counter);
>  }
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> @@ -1416,6 +1422,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
>   u16 last_used, id;
> + bool used_wrap_counter;
>   void *ret;
>  
>   START_USE(vq);
> @@ -1434,7 +1441,8 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   /* Only get used elements after they have been exposed by host. */
>   virtio_rmb(vq->weak_barriers);
>  
> - last_used = vq->last_used_idx;
> + used_wrap_counter = !!((vq->last_used_idx >> VRING_PACKED_EVENT_F_WRAP_CTR));
> + last_used = (vq->last_used_idx) & (~(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
>   id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
>   *len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
>  
> @@ -1451,12 +1459,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   ret = vq->packed.desc_state[id].data;
>   detach_buf_packed(vq, id, ctx);
>  
> - vq->last_used_idx += vq->packed.desc_state[id].num;
> - if