Re: [External] Re: 9p regression (Was: [PATCH v2] virtio_ring: don't update event idx on get_buf)

2023-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2023 at 11:39:59AM +0800, Jason Wang wrote:
> On Tue, Mar 28, 2023 at 11:09 AM 黄杰  wrote:
> >
> > Jason Wang  wrote on Tue, Mar 28, 2023 at 10:59:
> > >
> > > On Tue, Mar 28, 2023 at 10:13 AM Dominique Martinet
> > >  wrote:
> > > >
> > > > Hi Michael, Albert,
> > > >
> > > > Albert Huang wrote on Sat, Mar 25, 2023 at 06:56:33PM +0800:
> > > > > in virtio_net, if we disable napi_tx, then when we trigger a tx
> > > > > interrupt, vq->event_triggered will be set to true. It will no
> > > > > longer be set to false unless we explicitly call
> > > > > virtqueue_enable_cb_delayed or virtqueue_enable_cb_prepare.
> > > >
> > > > This patch (committed as 35395770f803 ("virtio_ring: don't update event
> > > > idx on get_buf") in next-20230327) apparently breaks 9p, as reported by
> > > > Luis in https://lkml.kernel.org/r/zci+7wg5oclsl...@bombadil.infradead.org
> > > >
> > > > I've just had a look at recent patches[1] and reverted this to test,
> > > > and I can mount again, so I'm pretty sure this is the culprit, but I
> > > > didn't look at the content at all yet so cannot advise further.
> > > > It might very well be that we need some extra handling for 9p
> > > > specifically that can be added separately if required.
> > > >
> > > > [1] git log 0ec57cfa721fbd36b4c4c0d9ccc5d78a78f7fa35..HEAD drivers/virtio/
> > > >
> > > >
> > > > This can be reproduced with a simple mount: run qemu with some -virtfs
> > > > argument, and `mount -t 9p -o debug=65535 tag mountpoint` will hang after
> > > > these messages:
> > > > 9pnet: -- p9_virtio_request (83): 9p debug: virtio request
> > > > 9pnet: -- p9_virtio_request (83): virtio request kicked
> > > >
> > > > So I suspect we're just not getting a callback.
> > >
> > > I think so. The patch assumes the driver will call
> > > virtqueue_disable/enable_cb(), which is not the case for the 9p driver.
> > >
> > > So after the first interrupt, event_triggered will be set to true forever.
> > >
> > > Thanks
> > >
> >
> > Hi Wang,
> >
> > Yes, this patch assumes that all virtio-related drivers will call
> > virtqueue_disable/enable_cb().
> > Thank you for raising this issue.
> >
> > It seems that napi_tx is only related to virtio_net. I'm wondering whether
> > we need to refactor napi_tx instead of implementing it inside virtio_ring.
> 
> We can hear from others.
> 
> I think it's better not to work around virtio_ring issues in a specific
> driver. It might just add more hacks. We should correctly set
> VRING_AVAIL_F_NO_INTERRUPT.
> 
> Do you think the following might work (not even compile-tested)?


ok but:

> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 41144b5246a8..12f4efb6dc54 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -852,16 +852,16 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
>  {
> struct vring_virtqueue *vq = to_vvq(_vq);
> 
> -   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> -   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -   if (vq->event)
> -   /* TODO: this is a hack. Figure out a cleaner value to write. */
> -   vring_used_event(&vq->split.vring) = 0x0;
> -   else
> -   vq->split.vring.avail->flags =
> -   cpu_to_virtio16(_vq->vdev,
> -   vq->split.avail_flags_shadow);
> -   }
> +   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> +   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +
> +   if (vq->event && !vq->event_triggered)
> +   /* TODO: this is a hack. Figure out a cleaner value to write. */
> +   vring_used_event(&vq->split.vring) = 0x0;
> +   else
> +   vq->split.vring.avail->flags =
> +   cpu_to_virtio16(_vq->vdev,
> +   vq->split.avail_flags_shadow);
>  }
> 
>  static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
> @@ -1697,8 +1697,10 @@ static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
>  {
> struct vring_virtqueue *vq = to_vvq(_vq);
> 
> -   if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE) {
> +   if (!(vq->packed.event_flags_shadow & VRING_PACKED_EVENT_FLAG_DISABLE))
> vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
> +
> +   if (vq->event_triggered)

I don't get this one: if event_triggered is set, why do you still want to
write into the driver flags? It won't trigger again anytime soon.

> vq->packed.vring.driver->flags =
> cpu_to_le16(vq->packed.event_flags_shadow);
> }
> @@ -2330,12 +2332,6 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>  {
> struct vring_virtqueue *vq = to_vvq(_vq);

Re: [PATCH v2] virtio_ring: don't update event idx on get_buf

2023-03-28 Thread Michael S. Tsirkin
On Sat, Mar 25, 2023 at 06:56:33PM +0800, Albert Huang wrote:
> in virtio_net, if we disable napi_tx, then when we trigger a tx interrupt,
> vq->event_triggered will be set to true. It will no longer be set to
> false unless we explicitly call virtqueue_enable_cb_delayed or
> virtqueue_enable_cb_prepare.
> 
> If we disable napi_tx, these will only be called when the tx ring
> buffer is relatively small.
> 
> Because event_triggered is true, VRING_AVAIL_F_NO_INTERRUPT or
> VRING_PACKED_EVENT_FLAG_DISABLE will not be set, so we update
> vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap
> every time we call virtqueue_get_buf_ctx. This brings more interrupts.
> 
> To summarize:
> 1) event_triggered was set to true in vring_interrupt()
> 2) after this nothing will happen for virtqueue_disable_cb() so
>VRING_AVAIL_F_NO_INTERRUPT is not set in avail_flags_shadow
> 3) virtqueue_get_buf_ctx_split() will still think the cb is enabled
>then it tries to publish new event
> 
> To fix, if event_triggered is set to true, do not update
> vring_used_event(&vq->split.vring) or vq->packed.vring.driver->off_wrap.
> 
> Tested with iperf:
> iperf3 tcp stream:
> vm1 -> vm2
> vm2 just receives the tcp data stream from vm1 and sends ACKs back to vm1;
> there are many tx interrupts in vm2,
> but with this patch (skipping the update when event_triggered is set)
> there are just a few tx interrupts.
> 
> Fixes: 8d622d21d248 ("virtio: fix up virtio_disable_cb")
> Signed-off-by: Albert Huang 
> Message-Id: <20230321085953.24949-1-huangjie.alb...@bytedance.com>
> Signed-off-by: Michael S. Tsirkin 

dropped for now due to breakage of 9p.
Pls post v3 with the fix you suggested (moving the check
of vq->event_triggered).
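
For background, the extra interrupts come from the event index
mechanism: the device only signals the driver when the used index
crosses the event index the driver last published, so publishing
last_used_idx from get_buf re-arms an interrupt for the very next
completion. This is the helper from include/uapi/linux/virtio_ring.h
that encodes the rule, shown here only as a reference for the
discussion above:

/* Assuming a given event_idx value from the other side, if we have
 * just incremented index from old to new_idx, should we trigger an
 * event? */
static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}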

> ---
>  drivers/virtio/virtio_ring.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cbeeea1b0439..1c36fa477966 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -914,7 +914,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   /* If we expect an interrupt for the next entry, tell host
>* by writing event index and flush out the write before
>* the read in the next get_buf call. */
> - if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> + if (unlikely(!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) &&
> +  !vq->event_triggered))
>   virtio_store_mb(vq->weak_barriers,
>   &vring_used_event(&vq->split.vring),
>   cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> @@ -1744,7 +1745,8 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>* by writing event index and flush out the write before
>* the read in the next get_buf call.
>*/
> - if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC)
> + if (unlikely(vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DESC &&
> +  !vq->event_triggered))
>   virtio_store_mb(vq->weak_barriers,
>   &vq->packed.vring.driver->off_wrap,
>   cpu_to_le16(vq->last_used_idx));
> -- 
> 2.37.0 (Apple Git-136)



Re: [External] Re: 9p regression (Was: [PATCH v2] virtio_ring: don't update event idx on get_buf)

2023-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2023 at 05:09:19PM +0800, 黄杰 wrote:
> Jason Wang  wrote on Tue, Mar 28, 2023 at 11:40:
> >
> > On Tue, Mar 28, 2023 at 11:09 AM 黄杰  wrote:
> > >
> > > Jason Wang  wrote on Tue, Mar 28, 2023 at 10:59:
> > > >
> > > > On Tue, Mar 28, 2023 at 10:13 AM Dominique Martinet
> > > >  wrote:
> > > > >
> > > > > Hi Michael, Albert,
> > > > >
> > > > > Albert Huang wrote on Sat, Mar 25, 2023 at 06:56:33PM +0800:
> > > > > > in virtio_net, if we disable napi_tx, then when we trigger a tx
> > > > > > interrupt, vq->event_triggered will be set to true. It will no
> > > > > > longer be set to false unless we explicitly call
> > > > > > virtqueue_enable_cb_delayed or virtqueue_enable_cb_prepare.
> > > > >
> > > > > This patch (committed as 35395770f803 ("virtio_ring: don't update event
> > > > > idx on get_buf") in next-20230327) apparently breaks 9p, as reported by
> > > > > Luis in https://lkml.kernel.org/r/zci+7wg5oclsl...@bombadil.infradead.org
> > > > >
> > > > > I've just had a look at recent patches[1] and reverted this to test,
> > > > > and I can mount again, so I'm pretty sure this is the culprit, but I
> > > > > didn't look at the content at all yet so cannot advise further.
> > > > > It might very well be that we need some extra handling for 9p
> > > > > specifically that can be added separately if required.
> > > > >
> > > > > [1] git log 0ec57cfa721fbd36b4c4c0d9ccc5d78a78f7fa35..HEAD drivers/virtio/
> > > > >
> > > > >
> > > > > This can be reproduced with a simple mount: run qemu with some -virtfs
> > > > > argument, and `mount -t 9p -o debug=65535 tag mountpoint` will hang after
> > > > > these messages:
> > > > > 9pnet: -- p9_virtio_request (83): 9p debug: virtio request
> > > > > 9pnet: -- p9_virtio_request (83): virtio request kicked
> > > > >
> > > > > So I suspect we're just not getting a callback.
> > > >
> > > > I think so. The patch assumes the driver will call
> > > > virtqueue_disable/enable_cb(), which is not the case for the 9p driver.
> > > >
> > > > So after the first interrupt, event_triggered will be set to true 
> > > > forever.
> > > >
> > > > Thanks
> > > >
> > >
> > > Hi Wang,
> > >
> > > Yes, this patch assumes that all virtio-related drivers will call
> > > virtqueue_disable/enable_cb().
> > > Thank you for raising this issue.
> > >
> > > It seems that napi_tx is only related to virtio_net. I'm wondering whether
> > > we need to refactor napi_tx instead of implementing it inside virtio_ring.
> >
> > We can hear from others.
> >
> > I think it's better not to work around virtio_ring issues in a specific
> > driver. It might just add more hacks. We should correctly set
> > VRING_AVAIL_F_NO_INTERRUPT.
> >
> > Do you think the following might work (not even compile-tested)?
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 41144b5246a8..12f4efb6dc54 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -852,16 +852,16 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> >  {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > -   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > -   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -   if (vq->event)
> > -   /* TODO: this is a hack. Figure out a cleaner value to write. */
> > -   vring_used_event(&vq->split.vring) = 0x0;
> > -   else
> > -   vq->split.vring.avail->flags =
> > -   cpu_to_virtio16(_vq->vdev,
> > -   vq->split.avail_flags_shadow);
> > -   }
> > +   if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> > +   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > +
> > +   if (vq->event && !vq->event_triggered)
> > +   /* TODO: this is a hack. Figure out a cleaner value to write. */
> > +   vring_used_event(&vq->split.vring) = 0x0;
> > +   else
> > +   vq->split.vring.avail->flags =
> > +   cpu_to_virtio16(_vq->vdev,
> > +   vq->split.avail_flags_shadow);
> >  }
> >
> >  static unsigned int virtqueue_enable_cb_prepare_split(struct virtqueue 
> > *_vq)
> > @@ -1697,8 +1697,10 @@ static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> >  {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > -   if (vq->packed.event_flags_shadow != VRING_PACKED_EVENT_FLAG_DISABLE) {
> > +   if (!(vq->packed.event_flags_shadow & VRING_PACKED_EVENT_FLAG_DISABLE))
> > vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
> > +
> > +   if (vq->event_triggered)
> > vq->packed.vring.driver->flags =
> > 

Re: [External] Re: 9p regression (Was: [PATCH v2] virtio_ring: don't update event idx on get_buf)

2023-03-28 Thread Michael S. Tsirkin
On Tue, Mar 28, 2023 at 11:39:59AM +0800, Jason Wang wrote:
> On Tue, Mar 28, 2023 at 11:09 AM 黄杰  wrote:
> >
> > Jason Wang  wrote on Tue, Mar 28, 2023 at 10:59:
> > >
> > > On Tue, Mar 28, 2023 at 10:13 AM Dominique Martinet
> > >  wrote:
> > > >
> > > > Hi Michael, Albert,
> > > >
> > > > Albert Huang wrote on Sat, Mar 25, 2023 at 06:56:33PM +0800:
> > > > > in virtio_net, if we disable napi_tx, then when we trigger a tx
> > > > > interrupt, vq->event_triggered will be set to true. It will no
> > > > > longer be set to false unless we explicitly call
> > > > > virtqueue_enable_cb_delayed or virtqueue_enable_cb_prepare.
> > > >
> > > > This patch (committed as 35395770f803 ("virtio_ring: don't update event
> > > > idx on get_buf") in next-20230327) apparently breaks 9p, as reported by
> > > > Luis in https://lkml.kernel.org/r/zci+7wg5oclsl...@bombadil.infradead.org
> > > >
> > > > I've just had a look at recent patches[1] and reverted this to test,
> > > > and I can mount again, so I'm pretty sure this is the culprit, but I
> > > > didn't look at the content at all yet so cannot advise further.
> > > > It might very well be that we need some extra handling for 9p
> > > > specifically that can be added separately if required.
> > > >
> > > > [1] git log 0ec57cfa721fbd36b4c4c0d9ccc5d78a78f7fa35..HEAD drivers/virtio/
> > > >
> > > >
> > > > This can be reproduced with a simple mount: run qemu with some -virtfs
> > > > argument, and `mount -t 9p -o debug=65535 tag mountpoint` will hang after
> > > > these messages:
> > > > 9pnet: -- p9_virtio_request (83): 9p debug: virtio request
> > > > 9pnet: -- p9_virtio_request (83): virtio request kicked
> > > >
> > > > So I suspect we're just not getting a callback.
> > >
> > > I think so. The patch assumes the driver will call
> > > virtqueue_disable/enable_cb(), which is not the case for the 9p driver.
> > >
> > > So after the first interrupt, event_triggered will be set to true forever.
> > >
> > > Thanks
> > >
> >
> > Hi Wang,
> >
> > Yes, this patch assumes that all virtio-related drivers will call
> > virtqueue_disable/enable_cb().
> > Thank you for raising this issue.
> >
> > It seems that napi_tx is only related to virtio_net. I'm wondering whether
> > we need to refactor napi_tx instead of implementing it inside virtio_ring.
> 
> We can hear from others.
> 
> I think it's better not to work around virtio_ring issues in a specific
> driver. It might just add more hacks. We should correctly set
> VRING_AVAIL_F_NO_INTERRUPT.

I am still stuck trying to understand why we don't set it.
How does event_triggered end up getting set without
event index support?

Thanks!
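
(For context: the only place that sets this flag is vring_interrupt()
in drivers/virtio/virtio_ring.c. A trimmed sketch, with the broken-ring
handling omitted, of roughly what it looked like at the time:)

irqreturn_t vring_interrupt(int irq, void *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (!more_used(vq)) {
		pr_debug("virtqueue interrupt with no work for %p\n", vq);
		return IRQ_NONE;
	}

	/* Just a hint for performance: so it's ok that this can be racy! */
	if (vq->event)
		vq->event_triggered = true;

	if (vq->vq.callback)
		vq->vq.callback(&vq->vq);

	return IRQ_HANDLED;
}

As the guard shows, event_triggered is only ever set when vq->event
(event index support) was negotiated.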


Re: [virtio-dev] [PATCH 1/2] virtio-blk: migrate to the latest patchset version

2023-03-28 Thread Stefan Hajnoczi
On Mon, Mar 27, 2023 at 10:29:27PM -0400, Dmitry Fomichev wrote:
> The merged patch series to support zoned block devices in virtio-blk
> is not the most up to date version. The merged patch can be found at
> 
> https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomic...@wdc.com/
> 
> However, the latest and reviewed version is
> 
> https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomic...@wdc.com/
> 
> The differences between the two are mostly cleanups, but there is one
> change that is very important in terms of compatibility with the
> approved virtio-zbd specification.
> 
> Before it was approved, the OASIS virtio spec had a change in
> VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
> current virtio-blk driver code. In the running code, the status is
> the first byte of the in-header, followed by some pad bytes and the
> u64 that carries back to the driver the sector at which the data has
> been written to the zone, aka the append sector.
> 
> This layout turned out to be problematic to implement in QEMU, and
> the request status byte was eventually made the last byte of the
> in-header. The current code doesn't expect that, and this causes the
> append sector value to always come back as zero to the block layer.
> This needs to be fixed ASAP.
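
(Purely to illustrate the two layouts being described; the struct and
field names below are paraphrased for this digest, not taken from the
spec or the driver:)

/* Old layout assumed by the running driver code: status first. */
struct zone_append_inhdr_old {
	__u8	status;		/* status byte came first...  */
	__u8	pad[7];		/* ...then padding...         */
	__le64	append_sector;	/* ...then the append sector  */
};

/* Approved layout: the status is the last byte of the in-header,
 * so a driver using the old layout reads back a zero sector. */
struct zone_append_inhdr_new {
	__le64	append_sector;	/* append sector comes first  */
	__u8	status;		/* status is the last byte    */
};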
> 
> Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Fomichev 
> ---
>  drivers/block/virtio_blk.c  | 238 +---
>  include/uapi/linux/virtio_blk.h |  18 +--
>  2 files changed, 165 insertions(+), 91 deletions(-)

Reviewed-by: Stefan Hajnoczi 



Re: [virtio-dev] [PATCH 2/2] virtio-blk: fix ZBD probe in kernels without ZBD support

2023-03-28 Thread Stefan Hajnoczi
On Mon, Mar 27, 2023 at 10:29:28PM -0400, Dmitry Fomichev wrote:
> When the kernel is built without support for zoned block devices,
> virtio-blk probe needs to error out any host-managed device scans
> to prevent such devices from appearing in the system as non-zoned.
> The current virtio-blk code simply bypasses all ZBD checks if
> CONFIG_BLK_DEV_ZONED is not defined and this leads to host-managed
> block devices being presented as non-zoned in the OS. This is one of
> the main problems this patch series aims to fix.
> 
> In this patch, define the VIRTIO_BLK_F_ZONED feature even when
> CONFIG_BLK_DEV_ZONED is not. This change makes the code compliant with
> the voted revision of the virtio-blk ZBD spec. Modify the probe code to
> handle the situation where VIRTIO_BLK_F_ZONED is negotiated in a kernel
> that is built without ZBD support. In this case, the code checks
> the zoned model of the device and fails the probe if the device
> is host-managed.
> 
> The patch also adds a comment to clarify that the call to perform
> the zoned device probe is correctly placed after virtio_device_ready().
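
(A minimal sketch of the probe-time check being described; identifiers
are approximated here and the actual patch should be consulted for the
real code:)

/* sketch: called from virtblk_probe() after virtio_device_ready() */
static int virtblk_check_zoned_model(struct virtio_device *vdev)
{
	u8 model;

	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED))
		return 0;	/* not a zoned device, nothing to check */

	virtio_cread(vdev, struct virtio_blk_config, zoned.model, &model);

	/* Without ZBD support in the kernel, a host-managed device must
	 * not be presented to the OS as a plain non-zoned disk. */
	if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED) && model == VIRTIO_BLK_Z_HM)
		return -EOPNOTSUPP;

	return 0;
}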
> 
> Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Fomichev 
> ---
>  drivers/block/virtio_blk.c | 34 ++
>  1 file changed, 18 insertions(+), 16 deletions(-)

Reviewed-by: Stefan Hajnoczi 



[PATCH net-next 6/8] virtio_net: auto release xdp shinfo

2023-03-28 Thread Xuan Zhuo
Make virtnet_build_xdp_buff_mrg() and virtnet_xdp_handler() release the
xdp shinfo automatically, so the caller no longer needs to take care of it.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a3f2bcb3db27..136131a7868a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,14 +833,14 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
stats->xdp_tx++;
xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf))
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
 
err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
if (unlikely(!err)) {
xdp_return_frame_rx_napi(xdpf);
} else if (unlikely(err < 0)) {
trace_xdp_exception(dev, xdp_prog, act);
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
}
 
*xdp_xmit |= VIRTIO_XDP_TX;
@@ -850,7 +850,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
stats->xdp_redirects++;
err = xdp_do_redirect(dev, xdp, xdp_prog);
if (err)
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
 
*xdp_xmit |= VIRTIO_XDP_REDIR;
return VIRTNET_XDP_RES_CONSUMED;
@@ -862,8 +862,12 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
struct xdp_buff *xdp,
trace_xdp_exception(dev, xdp_prog, act);
fallthrough;
case XDP_DROP:
-   return VIRTNET_XDP_RES_DROP;
+   goto drop;
}
+
+drop:
+   put_xdp_frags(xdp);
+   return VIRTNET_XDP_RES_DROP;
 }
 
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
@@ -1199,7 +1203,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
 dev->name, *num_buf,
 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
dev->stats.rx_length_errors++;
-   return -EINVAL;
+   goto err;
}
 
stats->bytes += len;
@@ -1218,7 +1222,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 dev->name, len, (unsigned long)(truesize - 
room));
dev->stats.rx_length_errors++;
-   return -EINVAL;
+   goto err;
}
 
frag = >frags[shinfo->nr_frags++];
@@ -1233,6 +1237,10 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
 
*xdp_frags_truesize = xdp_frags_truesz;
return 0;
+
+err:
+   put_xdp_frags(xdp);
+   return -EINVAL;
 }
 
 static void *mergeable_xdp_prepare(struct virtnet_info *vi,
@@ -1361,7 +1369,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
	err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
					 &num_buf, &xdp_frags_truesz, stats);
	if (unlikely(err))
-		goto err_xdp_frags;
+		goto err_xdp;
 
	act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
 
@@ -1369,7 +1377,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
	case VIRTNET_XDP_RES_PASS:
		head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
if (unlikely(!head_skb))
-   goto err_xdp_frags;
+   goto err_xdp;
 
rcu_read_unlock();
return head_skb;
@@ -1379,11 +1387,8 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto xdp_xmit;
 
case VIRTNET_XDP_RES_DROP:
-   goto err_xdp_frags;
+   goto err_xdp;
}
-err_xdp_frags:
-   put_xdp_frags(&xdp);
-   goto err_xdp;
}
rcu_read_unlock();
 
-- 
2.32.0.3.g01195cf9f



[PATCH net-next 7/8] virtio_net: introduce receive_mergeable_xdp()

2023-03-28 Thread Xuan Zhuo
The purpose of this patch is to simplify receive_mergeable() by
separating all of its XDP logic into a single function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 128 +++
 1 file changed, 76 insertions(+), 52 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 136131a7868a..c8978d8d8adb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1316,6 +1316,63 @@ static void *mergeable_xdp_prepare(struct virtnet_info 
*vi,
return page_address(xdp_page) + VIRTIO_XDP_HEADROOM;
 }
 
+static struct sk_buff *receive_mergeable_xdp(struct net_device *dev,
+struct virtnet_info *vi,
+struct receive_queue *rq,
+struct bpf_prog *xdp_prog,
+void *buf,
+void *ctx,
+unsigned int len,
+unsigned int *xdp_xmit,
+struct virtnet_rq_stats *stats)
+{
+   struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+   int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+   struct page *page = virt_to_head_page(buf);
+   int offset = buf - page_address(page);
+   unsigned int xdp_frags_truesz = 0;
+   struct sk_buff *head_skb;
+   unsigned int frame_sz;
+   struct xdp_buff xdp;
+   void *data;
+   u32 act;
+   int err;
+
+   data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
+				&num_buf, &page, offset, &len, hdr);
+   if (!data)
+   goto err_xdp;
+
+   err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
+				    &num_buf, &xdp_frags_truesz, stats);
+   if (unlikely(err))
+   goto err_xdp;
+
+   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+
+   switch (act) {
+   case VIRTNET_XDP_RES_PASS:
+   head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
+   if (unlikely(!head_skb))
+   goto err_xdp;
+   return head_skb;
+
+   case VIRTNET_XDP_RES_CONSUMED:
+   return NULL;
+
+   case VIRTNET_XDP_RES_DROP:
+   break;
+   }
+
+err_xdp:
+   put_page(page);
+   mergeable_buf_free(rq, num_buf, dev, stats);
+
+   stats->xdp_drops++;
+   stats->drops++;
+   return NULL;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -1325,21 +1382,22 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
 unsigned int *xdp_xmit,
 struct virtnet_rq_stats *stats)
 {
-   struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
-   int num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
-   struct page *page = virt_to_head_page(buf);
-   int offset = buf - page_address(page);
-   struct sk_buff *head_skb, *curr_skb;
-   struct bpf_prog *xdp_prog;
unsigned int truesize = mergeable_ctx_to_truesize(ctx);
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
-   unsigned int frame_sz;
-   int err;
+   struct virtio_net_hdr_mrg_rxbuf *hdr;
+   struct sk_buff *head_skb, *curr_skb;
+   struct bpf_prog *xdp_prog;
+   struct page *page;
+   int num_buf;
+   int offset;
 
head_skb = NULL;
stats->bytes += len - vi->hdr_len;
+   hdr = buf;
+   num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+   page = virt_to_head_page(buf);
 
if (unlikely(len > truesize - room)) {
pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
@@ -1348,51 +1406,21 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_skb;
}
 
-   if (likely(!vi->xdp_enabled)) {
-   xdp_prog = NULL;
-   goto skip_xdp;
-   }
-
-   rcu_read_lock();
-   xdp_prog = rcu_dereference(rq->xdp_prog);
-   if (xdp_prog) {
-   unsigned int xdp_frags_truesz = 0;
-   struct xdp_buff xdp;
-   void *data;
-   u32 act;
-
-		data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
-					     &num_buf, &page, offset, &len, hdr);
-   if (!data)
-   goto err_xdp;
-
-		err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
-   

[PATCH net-next 8/8] virtio_net: introduce receive_small_xdp()

2023-03-28 Thread Xuan Zhuo
The purpose of this patch is to simplify receive_small() by separating
all of its XDP logic into a single function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 168 +++
 1 file changed, 100 insertions(+), 68 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c8978d8d8adb..37cd0bf97a16 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -939,6 +939,99 @@ static struct page *xdp_linearize_page(struct 
receive_queue *rq,
return NULL;
 }
 
+static struct sk_buff *receive_small_xdp(struct net_device *dev,
+struct virtnet_info *vi,
+struct receive_queue *rq,
+struct bpf_prog *xdp_prog,
+void *buf,
+void *ctx,
+unsigned int len,
+unsigned int *xdp_xmit,
+struct virtnet_rq_stats *stats)
+{
+   unsigned int xdp_headroom = (unsigned long)ctx;
+   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   unsigned int headroom = vi->hdr_len + header_offset;
+   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
+   struct page *page = virt_to_head_page(buf);
+   struct page *xdp_page;
+   unsigned int buflen;
+   struct xdp_buff xdp;
+   struct sk_buff *skb;
+   unsigned int delta = 0;
+   unsigned int metasize = 0;
+   void *orig_data;
+   u32 act;
+
+   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+   if (unlikely(hdr->hdr.gso_type))
+   goto err_xdp;
+
+   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
+   int offset = buf - page_address(page) + header_offset;
+   unsigned int tlen = len + vi->hdr_len;
+   int num_buf = 1;
+
+   xdp_headroom = virtnet_get_headroom(vi);
+   header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   headroom = vi->hdr_len + header_offset;
+   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+   xdp_page = xdp_linearize_page(rq, &num_buf, page,
+				 offset, header_offset,
+				 &tlen);
+   if (!xdp_page)
+   goto err_xdp;
+
+   buf = page_address(xdp_page);
+   put_page(page);
+   page = xdp_page;
+   }
+
+   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
+   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
+xdp_headroom, len, true);
+   orig_data = xdp.data;
+
+   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
+
+   switch (act) {
+   case VIRTNET_XDP_RES_PASS:
+   /* Recalculate length in case bpf program changed it */
+   delta = orig_data - xdp.data;
+   len = xdp.data_end - xdp.data;
+   metasize = xdp.data - xdp.data_meta;
+   break;
+
+   case VIRTNET_XDP_RES_CONSUMED:
+   goto xdp_xmit;
+
+   case VIRTNET_XDP_RES_DROP:
+   goto err_xdp;
+   }
+
+   skb = build_skb(buf, buflen);
+   if (!skb)
+   goto err;
+
+   skb_reserve(skb, headroom - delta);
+   skb_put(skb, len);
+   if (metasize)
+   skb_metadata_set(skb, metasize);
+
+   return skb;
+
+err_xdp:
+   stats->xdp_drops++;
+err:
+   stats->drops++;
+   put_page(page);
+xdp_xmit:
+   return NULL;
+}
+
 static struct sk_buff *receive_small(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -949,15 +1042,11 @@ static struct sk_buff *receive_small(struct net_device 
*dev,
 {
struct sk_buff *skb;
struct bpf_prog *xdp_prog;
-   unsigned int xdp_headroom = (unsigned long)ctx;
-   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   unsigned int header_offset = VIRTNET_RX_PAD;
unsigned int headroom = vi->hdr_len + header_offset;
unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
struct page *page = virt_to_head_page(buf);
-   unsigned int delta = 0;
-   struct page *xdp_page;
-   unsigned int metasize = 0;
 
len -= vi->hdr_len;
stats->bytes += len;
@@ -977,57 +1066,9 @@ static struct sk_buff *receive_small(struct net_device 
*dev,
rcu_read_lock();
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
- 

[PATCH net-next 2/8] virtio_net: mergeable xdp: introduce mergeable_xdp_prepare

2023-03-28 Thread Xuan Zhuo
Separate the XDP preparation logic from receive_mergeable().

The purpose of this is to simplify the XDP execution logic.

The main logic here is that when headroom is insufficient, we need to
allocate a new page and recalculate the offset. Note that if a new page
is allocated, the variable "page" will refer to the new page.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 135 ++-
 1 file changed, 77 insertions(+), 58 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d2bf1ce0730..bb426958cdd4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1162,6 +1162,79 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
*dev,
return 0;
 }
 
+static void *mergeable_xdp_prepare(struct virtnet_info *vi,
+  struct receive_queue *rq,
+  struct bpf_prog *xdp_prog,
+  void *ctx,
+  unsigned int *frame_sz,
+  int *num_buf,
+  struct page **page,
+  int offset,
+  unsigned int *len,
+  struct virtio_net_hdr_mrg_rxbuf *hdr)
+{
+   unsigned int truesize = mergeable_ctx_to_truesize(ctx);
+   unsigned int headroom = mergeable_ctx_to_headroom(ctx);
+   struct page *xdp_page;
+   unsigned int xdp_room;
+
+   /* Transient failure which in theory could occur if
+* in-flight packets from before XDP was enabled reach
+* the receive path after XDP is loaded.
+*/
+   if (unlikely(hdr->hdr.gso_type))
+   return NULL;
+
+   /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
+* with headroom may add hole in truesize, which
+* make their length exceed PAGE_SIZE. So we disabled the
+* hole mechanism for xdp. See add_recvbuf_mergeable().
+*/
+   *frame_sz = truesize;
+
+   /* This happens when headroom is not enough because
+* of the buffer was prefilled before XDP is set.
+* This should only happen for the first several packets.
+* In fact, vq reset can be used here to help us clean up
+* the prefilled buffers, but many existing devices do not
+* support it, and we don't want to bother users who are
+* using xdp normally.
+*/
+   if (!xdp_prog->aux->xdp_has_frags &&
+   (*num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
+   /* linearize data for XDP */
+   xdp_page = xdp_linearize_page(rq, num_buf,
+ *page, offset,
+ VIRTIO_XDP_HEADROOM,
+ len);
+
+   if (!xdp_page)
+   return NULL;
+   } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
+   xdp_room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
+ sizeof(struct skb_shared_info));
+   if (*len + xdp_room > PAGE_SIZE)
+   return NULL;
+
+   xdp_page = alloc_page(GFP_ATOMIC);
+   if (!xdp_page)
+   return NULL;
+
+   memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+  page_address(*page) + offset, *len);
+   } else {
+   return page_address(*page) + offset;
+   }
+
+   *frame_sz = PAGE_SIZE;
+
+   put_page(*page);
+
+   *page = xdp_page;
+
+   return page_address(xdp_page) + VIRTIO_XDP_HEADROOM;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -1181,7 +1254,7 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
unsigned int headroom = mergeable_ctx_to_headroom(ctx);
unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
-   unsigned int frame_sz, xdp_room;
+   unsigned int frame_sz;
int err;
 
head_skb = NULL;
@@ -1211,65 +1284,11 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
u32 act;
int i;
 
-   /* Transient failure which in theory could occur if
-* in-flight packets from before XDP was enabled reach
-* the receive path after XDP is loaded.
-*/
-   if (unlikely(hdr->hdr.gso_type))
+		data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
+					     &num_buf, &page, offset, &len, hdr);
+   if (!data)
goto err_xdp;
 
-  

[PATCH net-next 5/8] virtio_net: separate the logic of freeing the rest mergeable buf

2023-03-28 Thread Xuan Zhuo
This patch introduces a new function that frees the remaining mergeable
buffers. A subsequent patch will reuse this function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 09aed60e2f51..a3f2bcb3db27 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1076,6 +1076,28 @@ static struct sk_buff *receive_big(struct net_device 
*dev,
return NULL;
 }
 
+static void mergeable_buf_free(struct receive_queue *rq, int num_buf,
+  struct net_device *dev,
+  struct virtnet_rq_stats *stats)
+{
+   struct page *page;
+   void *buf;
+   int len;
+
+   while (num_buf-- > 1) {
+   buf = virtqueue_get_buf(rq->vq, &len);
+   if (unlikely(!buf)) {
+   pr_debug("%s: rx error: %d buffers missing\n",
+dev->name, num_buf);
+   dev->stats.rx_length_errors++;
+   break;
+   }
+   stats->bytes += len;
+   page = virt_to_head_page(buf);
+   put_page(page);
+   }
+}
+
 /* Why not use xdp_build_skb_from_frame() ?
  * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
  * virtio-net there are 2 points that do not match its requirements:
@@ -1436,18 +1458,8 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
stats->xdp_drops++;
 err_skb:
put_page(page);
-   while (num_buf-- > 1) {
-   buf = virtqueue_get_buf(rq->vq, &len);
-   if (unlikely(!buf)) {
-   pr_debug("%s: rx error: %d buffers missing\n",
-dev->name, num_buf);
-   dev->stats.rx_length_errors++;
-   break;
-   }
-   stats->bytes += len;
-   page = virt_to_head_page(buf);
-   put_page(page);
-   }
+   mergeable_buf_free(rq, num_buf, dev, stats);
+
 err_buf:
stats->drops++;
dev_kfree_skb(head_skb);
-- 
2.32.0.3.g01195cf9f



[PATCH net-next 4/8] virtio_net: separate the logic of freeing xdp shinfo

2023-03-28 Thread Xuan Zhuo
This patch introduces a new function that releases the xdp shinfo.
A subsequent patch will reuse this function.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 72b9d6ee4024..09aed60e2f51 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -798,6 +798,21 @@ static int virtnet_xdp_xmit(struct net_device *dev,
return ret;
 }
 
+static void put_xdp_frags(struct xdp_buff *xdp)
+{
+   struct skb_shared_info *shinfo;
+   struct page *xdp_page;
+   int i;
+
+   if (xdp_buff_has_frags(xdp)) {
+   shinfo = xdp_get_shared_info_from_buff(xdp);
+   for (i = 0; i < shinfo->nr_frags; i++) {
+   xdp_page = skb_frag_page(&shinfo->frags[i]);
+   put_page(xdp_page);
+   }
+   }
+}
+
 static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
   struct net_device *dev,
   unsigned int *xdp_xmit,
@@ -1312,12 +1327,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
unsigned int xdp_frags_truesz = 0;
-   struct skb_shared_info *shinfo;
-   struct page *xdp_page;
struct xdp_buff xdp;
void *data;
u32 act;
-   int i;
 
		data = mergeable_xdp_prepare(vi, rq, xdp_prog, ctx, &frame_sz,
					     &num_buf, &page, offset, &len, hdr);
@@ -1348,14 +1360,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
		goto err_xdp_frags;
	}
 err_xdp_frags:
-	if (xdp_buff_has_frags(&xdp)) {
-		shinfo = xdp_get_shared_info_from_buff(&xdp);
-		for (i = 0; i < shinfo->nr_frags; i++) {
-			xdp_page = skb_frag_page(&shinfo->frags[i]);
-			put_page(xdp_page);
-		}
-	}
-
+	put_xdp_frags(&xdp);
goto err_xdp;
}
rcu_read_unlock();
-- 
2.32.0.3.g01195cf9f



[PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of run xdp

2023-03-28 Thread Xuan Zhuo
At present, we have two similar pieces of logic that run the XDP prog.

Therefore, this patch separates the code that executes XDP, which is
conducive to easier maintenance later.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 142 +--
 1 file changed, 75 insertions(+), 67 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bb426958cdd4..72b9d6ee4024 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -301,6 +301,15 @@ struct padded_vnet_hdr {
char padding[12];
 };
 
+enum {
+   /* xdp pass */
+   VIRTNET_XDP_RES_PASS,
+   /* drop packet. the caller needs to release the page. */
+   VIRTNET_XDP_RES_DROP,
+   /* packet is consumed by xdp. the caller needs to do nothing. */
+   VIRTNET_XDP_RES_CONSUMED,
+};
+
 static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
 static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
 
@@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device *dev,
return ret;
 }
 
+static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct xdp_buff *xdp,
+  struct net_device *dev,
+  unsigned int *xdp_xmit,
+  struct virtnet_rq_stats *stats)
+{
+   struct xdp_frame *xdpf;
+   int err;
+   u32 act;
+
+   act = bpf_prog_run_xdp(xdp_prog, xdp);
+   stats->xdp_packets++;
+
+   switch (act) {
+   case XDP_PASS:
+   return VIRTNET_XDP_RES_PASS;
+
+   case XDP_TX:
+   stats->xdp_tx++;
+   xdpf = xdp_convert_buff_to_frame(xdp);
+   if (unlikely(!xdpf))
+   return VIRTNET_XDP_RES_DROP;
+
+   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
+   if (unlikely(!err)) {
+   xdp_return_frame_rx_napi(xdpf);
+   } else if (unlikely(err < 0)) {
+   trace_xdp_exception(dev, xdp_prog, act);
+   return VIRTNET_XDP_RES_DROP;
+   }
+
+   *xdp_xmit |= VIRTIO_XDP_TX;
+   return VIRTNET_XDP_RES_CONSUMED;
+
+   case XDP_REDIRECT:
+   stats->xdp_redirects++;
+   err = xdp_do_redirect(dev, xdp, xdp_prog);
+   if (err)
+   return VIRTNET_XDP_RES_DROP;
+
+   *xdp_xmit |= VIRTIO_XDP_REDIR;
+   return VIRTNET_XDP_RES_CONSUMED;
+
+   default:
+   bpf_warn_invalid_xdp_action(dev, xdp_prog, act);
+   fallthrough;
+   case XDP_ABORTED:
+   trace_xdp_exception(dev, xdp_prog, act);
+   fallthrough;
+   case XDP_DROP:
+   return VIRTNET_XDP_RES_DROP;
+   }
+}
+
 static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 {
return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
@@ -876,7 +938,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
struct page *page = virt_to_head_page(buf);
unsigned int delta = 0;
struct page *xdp_page;
-   int err;
unsigned int metasize = 0;
 
len -= vi->hdr_len;
@@ -898,7 +959,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
-   struct xdp_frame *xdpf;
struct xdp_buff xdp;
void *orig_data;
u32 act;
@@ -931,46 +991,22 @@ static struct sk_buff *receive_small(struct net_device *dev,
	xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
			 xdp_headroom, len, true);
	orig_data = xdp.data;
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	stats->xdp_packets++;
+
+	act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
 
switch (act) {
-   case XDP_PASS:
+   case VIRTNET_XDP_RES_PASS:
/* Recalculate length in case bpf program changed it */
delta = orig_data - xdp.data;
len = xdp.data_end - xdp.data;
metasize = xdp.data - xdp.data_meta;
break;
-   case XDP_TX:
-   stats->xdp_tx++;
-   xdpf = xdp_convert_buff_to_frame(&xdp);
-   if (unlikely(!xdpf))
-   goto err_xdp;
-   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
-   if (unlikely(!err)) {
-   xdp_return_frame_rx_napi(xdpf);
-   } else if (unlikely(err < 0)) {
-   trace_xdp_exception(vi->dev, xdp_prog, act);
-   goto err_xdp;
-   }
-   

[PATCH net-next 0/8] virtio_net: refactor xdp codes

2023-03-28 Thread Xuan Zhuo
Due to historical reasons, the implementation of XDP in virtio-net is
relatively chaotic. For example, the processing of XDP actions has two
copies of similar code, as does the page/xdp_page handling.

The purpose of this patch set is to refactor this code and reduce the
difficulty of subsequent maintenance, so that subsequent developers will
not introduce new bugs because of complex logical relationships.

In addition, the AF_XDP support that I want to submit later will also
need to reuse the XDP logic, such as the processing of actions, and I
don't want to introduce new, similar code. This way, I can reuse this
code in the future.

Please review.

Thanks.

v1:
1. fix some variables are uninitialized

Xuan Zhuo (8):
  virtio_net: mergeable xdp: put old page immediately
  virtio_net: mergeable xdp: introduce mergeable_xdp_prepare
  virtio_net: introduce virtnet_xdp_handler() to separate the logic of
run xdp
  virtio_net: separate the logic of freeing xdp shinfo
  virtio_net: separate the logic of freeing the rest mergeable buf
  virtio_net: auto release xdp shinfo
  virtio_net: introduce receive_mergeable_xdp()
  virtio_net: introduce receive_small_xdp()

 drivers/net/virtio_net.c | 618 +++
 1 file changed, 360 insertions(+), 258 deletions(-)

--
2.32.0.3.g01195cf9f



[PATCH net-next 1/8] virtio_net: mergeable xdp: put old page immediately

2023-03-28 Thread Xuan Zhuo
In the mergeable XDP implementation of virtio-net, the code always has
to check whether two pages are in use and select one page to release.
This complicates the handling of each action and is easy to get wrong.

In the entire process, we have the following principles:
* If xdp_page is used (PASS, TX, Redirect), then we release the old
  page.
* If it is a drop case, we release two pages: the old page obtained
  from buf is released inside err_xdp, and xdp_page needs to be
  released by us.

But in fact, when we allocate a new page, we can release the old page
immediately. Then, with just one page in use, we only need to release
the new page in the drop case (see the condensed sketch below). On the
drop path, err_xdp will release the variable "page", so we only need to
let "page" point to the new xdp_page in advance.
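
(Condensed sketch of the resulting pattern; see the hunks below for the
real context:)

/* When linearizing or copying for XDP, ownership moves to "page"
 * immediately, so every later error path can just put_page(page). */
xdp_page = alloc_page(GFP_ATOMIC);
if (!xdp_page)
	goto err_xdp;			/* err_xdp releases "page" */
memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
       page_address(page) + offset, len);
put_page(page);				/* drop the old page right away */
page = xdp_page;			/* only one page is live now */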

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e2560b6f7980..4d2bf1ce0730 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1245,6 +1245,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (!xdp_page)
goto err_xdp;
offset = VIRTIO_XDP_HEADROOM;
+
+   put_page(page);
+   page = xdp_page;
} else if (unlikely(headroom < virtnet_get_headroom(vi))) {
xdp_room = SKB_DATA_ALIGN(VIRTIO_XDP_HEADROOM +
  sizeof(struct 
skb_shared_info));
@@ -1259,6 +1262,9 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
   page_address(page) + offset, len);
frame_sz = PAGE_SIZE;
offset = VIRTIO_XDP_HEADROOM;
+
+   put_page(page);
+   page = xdp_page;
} else {
xdp_page = page;
}
@@ -1278,8 +1284,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (unlikely(!head_skb))
goto err_xdp_frags;
 
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
return head_skb;
case XDP_TX:
@@ -1297,8 +1301,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_xdp_frags;
}
*xdp_xmit |= VIRTIO_XDP_TX;
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
goto xdp_xmit;
case XDP_REDIRECT:
@@ -1307,8 +1309,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
if (err)
goto err_xdp_frags;
*xdp_xmit |= VIRTIO_XDP_REDIR;
-   if (unlikely(xdp_page != page))
-   put_page(page);
rcu_read_unlock();
goto xdp_xmit;
default:
@@ -1321,9 +1321,6 @@ static struct sk_buff *receive_mergeable(struct 
net_device *dev,
goto err_xdp_frags;
}
 err_xdp_frags:
-   if (unlikely(xdp_page != page))
-   __free_pages(xdp_page, 0);
-
	if (xdp_buff_has_frags(&xdp)) {
		shinfo = xdp_get_shared_info_from_buff(&xdp);
		for (i = 0; i < shinfo->nr_frags; i++) {
-- 
2.32.0.3.g01195cf9f



Re: [RFC PATCH v1 1/2] vsock: return errors other than -ENOMEM to socket

2023-03-28 Thread Stefano Garzarella

On Tue, Mar 28, 2023 at 01:42:19PM +0300, Arseniy Krasnov wrote:



On 28.03.2023 12:42, Stefano Garzarella wrote:

I pressed send too early...

CCing Bryan, Vishnu, and pv-driv...@vmware.com

On Tue, Mar 28, 2023 at 11:39 AM Stefano Garzarella  wrote:


On Sun, Mar 26, 2023 at 01:13:11AM +0300, Arseniy Krasnov wrote:

This removes the behaviour where an error code returned from any transport
was always switched to ENOMEM. This works in the same way as commit
c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"),
but for receive calls.

Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/af_vsock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 19aea7cba26e..9262e0b77d47 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2007,7 +2007,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct 
msghdr *msg,

  read = transport->stream_dequeue(vsk, msg, len - copied, flags);


In vmci_transport_stream_dequeue() vmci_qpair_peekv() and
vmci_qpair_dequev() return VMCI_ERROR_* in case of errors.

Maybe we should return -ENOMEM in vmci_transport_stream_dequeue() if
those functions fail to keep the same behavior.
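
(An untested sketch of that idea, reusing the names from the existing
VMCI transport; the natural place would be vmci_transport_stream_dequeue()
in net/vmw_vsock/vmci_transport.c:)

static ssize_t vmci_transport_stream_dequeue(
	struct vsock_sock *vsk,
	struct msghdr *msg,
	size_t len,
	int flags)
{
	ssize_t err;

	if (flags & MSG_PEEK)
		err = vmci_qpair_peekv(vmci_trans(vsk)->qpair, msg, len, 0);
	else
		err = vmci_qpair_dequev(vmci_trans(vsk)->qpair, msg, len, 0);

	/* Map VMCI_ERROR_* to the errno userspace has always seen,
	 * instead of leaking transport-specific codes via af_vsock. */
	if (err < 0)
		err = -ENOMEM;

	return err;
}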


Yes, it seems I missed it, because several months ago we had a similar
question for the send logic:
https://www.spinics.net/lists/kernel/msg4611091.html
And it was ok not to handle the VMCI send path in this way. So I think the
current implementation for tx is a little bit buggy, because the
VMCI-specific error from vmci_qpair_enquev() is returned to af_vsock.c.
I think error conversion must be added to the VMCI transport for tx
as well.


Good point!

These are negative values, so there are no big problems, but I don't
know what the user expects in this case.

@Vishnu Do we want to return an errno to the user or a VMCI_ERROR_*?

In both cases I think we should do the same for both enqueue and
dequeue.



The good thing is that Hyper-V uses generic error codes.


Yeah!

Thanks,
Stefano



Thanks, Arseniy


CCing Bryan, Vishnu, and pv-driv...@vmware.com

The other transports seem okay to me.

Thanks,
Stefano


  if (read < 0) {
-  err = -ENOMEM;
+  err = read;
  break;
  }

@@ -2058,7 +2058,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, 
struct msghdr *msg,
  msg_len = transport->seqpacket_dequeue(vsk, msg, flags);

  if (msg_len < 0) {
-  err = -ENOMEM;
+  err = msg_len;
  goto out;
  }

--
2.25.1








Re: [RFC PATCH v1 2/2] vsock/test: update expected return values

2023-03-28 Thread Stefano Garzarella

On Sun, Mar 26, 2023 at 01:14:01AM +0300, Arseniy Krasnov wrote:

This updates the expected return values for the invalid buffer test. Such
values are now returned from the transport, not from af_vsock.c.


Since only the virtio transport supports this for now, it's okay.
In the future we should make sure that we have the same behavior
across transports.


Reviewed-by: Stefano Garzarella 



Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/vsock_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 3de10dbb50f5..a91d0ef963be 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -723,7 +723,7 @@ static void test_seqpacket_invalid_rec_buffer_server(const 
struct test_opts *opt
exit(EXIT_FAILURE);
}

-   if (errno != ENOMEM) {
+   if (errno != EFAULT) {
perror("unexpected errno of 'broken_buf'");
exit(EXIT_FAILURE);
}
@@ -887,7 +887,7 @@ static void test_inv_buf_client(const struct test_opts 
*opts, bool stream)
exit(EXIT_FAILURE);
}

-   if (errno != ENOMEM) {
+   if (errno != EFAULT) {
fprintf(stderr, "unexpected recv(2) errno %d\n", errno);
exit(EXIT_FAILURE);
}
--
2.25.1





Re: [RFC PATCH v1 1/2] vsock: return errors other than -ENOMEM to socket

2023-03-28 Thread Stefano Garzarella
I pressed send too early...

CCing Bryan, Vishnu, and pv-driv...@vmware.com

On Tue, Mar 28, 2023 at 11:39 AM Stefano Garzarella  wrote:
>
> On Sun, Mar 26, 2023 at 01:13:11AM +0300, Arseniy Krasnov wrote:
> >This removes the behaviour where an error code returned from any transport
> >was always switched to ENOMEM. This works in the same way as commit
> >c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"),
> >but for receive calls.
> >
> >Signed-off-by: Arseniy Krasnov 
> >---
> > net/vmw_vsock/af_vsock.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> >index 19aea7cba26e..9262e0b77d47 100644
> >--- a/net/vmw_vsock/af_vsock.c
> >+++ b/net/vmw_vsock/af_vsock.c
> >@@ -2007,7 +2007,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, 
> >struct msghdr *msg,
> >
> >   read = transport->stream_dequeue(vsk, msg, len - copied, 
> > flags);
>
> In vmci_transport_stream_dequeue() vmci_qpair_peekv() and
> vmci_qpair_dequev() return VMCI_ERROR_* in case of errors.
>
> Maybe we should return -ENOMEM from vmci_transport_stream_dequeue()
> when those functions fail, to keep the same behavior.
>
> CCing Bryan, Vishnu, and pv-driv...@vmware.com
>
> The other transports seem okay to me.
>
> Thanks,
> Stefano
>
> >   if (read < 0) {
> >-  err = -ENOMEM;
> >+  err = read;
> >   break;
> >   }
> >
> >@@ -2058,7 +2058,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, 
> >struct msghdr *msg,
> >   msg_len = transport->seqpacket_dequeue(vsk, msg, flags);
> >
> >   if (msg_len < 0) {
> >-  err = -ENOMEM;
> >+  err = msg_len;
> >   goto out;
> >   }
> >
> >--
> >2.25.1
> >


Re: [RFC PATCH v2 2/3] virtio/vsock: WARN_ONCE() for invalid state of socket

2023-03-28 Thread Stefano Garzarella
On Tue, Mar 28, 2023 at 11:35 AM Arseniy Krasnov
 wrote:
>
>
>
> On 28.03.2023 12:29, Stefano Garzarella wrote:
> > On Sun, Mar 26, 2023 at 01:09:25AM +0300, Arseniy Krasnov wrote:
> >> This adds a WARN_ONCE() and a return from the stream dequeue callback
> >> when the socket's queue is empty but 'rx_bytes' is still non-zero.
> >
> > Nit: I would explain why we add this, for example:
> >
> > This allows the detection of potential bugs due to packet merging
> > (see previous patch).
> >
> >>
> >> Signed-off-by: Arseniy Krasnov 
> >> ---
> >> net/vmw_vsock/virtio_transport_common.c | 7 +++
> >> 1 file changed, 7 insertions(+)
> >
> >>
> >> diff --git a/net/vmw_vsock/virtio_transport_common.c 
> >> b/net/vmw_vsock/virtio_transport_common.c
> >> index b9144af71553..ad70531de133 100644
> >> --- a/net/vmw_vsock/virtio_transport_common.c
> >> +++ b/net/vmw_vsock/virtio_transport_common.c
> >> @@ -398,6 +398,13 @@ virtio_transport_stream_do_dequeue(struct vsock_sock 
> >> *vsk,
> >> u32 free_space;
> >>
> >> spin_lock_bh(&vvs->rx_lock);
> >> +
> >> +if (WARN_ONCE(skb_queue_empty(&vvs->rx_queue) && vvs->rx_bytes,
> >> +  "No skbuffs with non-zero 'rx_bytes'\n")) {
> >
> > Nit: I would rephrase it this way:
> > "rx_queue is empty, but rx_bytes is non-zero"
> >
> >> +spin_unlock_bh(&vvs->rx_lock);
> >> +return err;
> >> +}
> >> +
> >> while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
> >> skb = skb_peek(&vvs->rx_queue);
> >>
> >> --
> >> 2.25.1
> >>
> >
> > Anyway the patch LGTM!
>
> Thanks for the review! Since only the string value and the commit
> message need to be updated, can I resend it to 'net' (as it is a fix)
> and update the two things above in the 'net' version?

Yep, sure!

And you can already add my R-b ;-)

Thanks,
Stefano

>
> Thanks, Arseniy
> >
> > Reviewed-by: Stefano Garzarella 
> >
>


Re: [RFC PATCH v1 1/2] vsock: return errors other than -ENOMEM to socket

2023-03-28 Thread Stefano Garzarella

On Sun, Mar 26, 2023 at 01:13:11AM +0300, Arseniy Krasnov wrote:

This removes the behaviour where an error code returned from any
transport was always switched to ENOMEM. This works in the same way as
commit c43170b7e157 ("vsock: return errors other than -ENOMEM to
socket"), but for receive calls.

Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/af_vsock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 19aea7cba26e..9262e0b77d47 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2007,7 +2007,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct 
msghdr *msg,

read = transport->stream_dequeue(vsk, msg, len - copied, flags);


In vmci_transport_stream_dequeue() vmci_qpair_peekv() and
vmci_qpair_dequev() return VMCI_ERROR_* in case of errors.

Maybe we should return -ENOMEM from vmci_transport_stream_dequeue()
when those functions fail, to keep the same behavior.

CCing Bryan, Vishnu, and pv-driv...@vmware.com

The other transports seem okay to me.

Thanks,
Stefano


if (read < 0) {
-   err = -ENOMEM;
+   err = read;
break;
}

@@ -2058,7 +2058,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, 
struct msghdr *msg,
msg_len = transport->seqpacket_dequeue(vsk, msg, flags);

if (msg_len < 0) {
-   err = -ENOMEM;
+   err = msg_len;
goto out;
}

--
2.25.1





Re: [RFC PATCH v2 3/3] test/vsock: new skbuff appending test

2023-03-28 Thread Stefano Garzarella

On Sun, Mar 26, 2023 at 01:10:16AM +0300, Arseniy Krasnov wrote:

This adds a test which checks the case when the data of a newly received
skbuff is appended to the last skbuff in the socket's queue. It looks
like a simple test with 'send()' and 'recv()', but internally it triggers
the logic which appends one received skbuff to another. The test checks
that this feature works correctly.

This test is relevant only for the virtio transport.

Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/vsock_test.c | 90 
1 file changed, 90 insertions(+)


Reviewed-by: Stefano Garzarella 



diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 3de10dbb50f5..12b97c92fbb2 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -968,6 +968,91 @@ static void test_seqpacket_inv_buf_server(const struct 
test_opts *opts)
test_inv_buf_server(opts, false);
}

+#define HELLO_STR "HELLO"
+#define WORLD_STR "WORLD"
+
+static void test_stream_virtio_skb_merge_client(const struct test_opts *opts)
+{
+   ssize_t res;
+   int fd;
+
+   fd = vsock_stream_connect(opts->peer_cid, 1234);
+   if (fd < 0) {
+   perror("connect");
+   exit(EXIT_FAILURE);
+   }
+
+   /* Send first skbuff. */
+   res = send(fd, HELLO_STR, strlen(HELLO_STR), 0);
+   if (res != strlen(HELLO_STR)) {
+   fprintf(stderr, "unexpected send(2) result %zi\n", res);
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("SEND0");
+   /* Peer reads part of first skbuff. */
+   control_expectln("REPLY0");
+
+   /* Send second skbuff, it will be appended to the first. */
+   res = send(fd, WORLD_STR, strlen(WORLD_STR), 0);
+   if (res != strlen(WORLD_STR)) {
+   fprintf(stderr, "unexpected send(2) result %zi\n", res);
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("SEND1");
+   /* Peer reads merged skbuff packet. */
+   control_expectln("REPLY1");
+
+   close(fd);
+}
+
+static void test_stream_virtio_skb_merge_server(const struct test_opts *opts)
+{
+   unsigned char buf[64];
+   ssize_t res;
+   int fd;
+
+   fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+   if (fd < 0) {
+   perror("accept");
+   exit(EXIT_FAILURE);
+   }
+
+   control_expectln("SEND0");
+
+   /* Read skbuff partially. */
+   res = recv(fd, buf, 2, 0);
+   if (res != 2) {
+   fprintf(stderr, "expected recv(2) returns 2 bytes, got %zi\n", 
res);
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("REPLY0");
+   control_expectln("SEND1");
+
+   res = recv(fd, buf + 2, sizeof(buf) - 2, 0);
+   if (res != 8) {
+   fprintf(stderr, "expected recv(2) returns 8 bytes, got %zi\n", 
res);
+   exit(EXIT_FAILURE);
+   }
+
+   res = recv(fd, buf, sizeof(buf) - 8 - 2, MSG_DONTWAIT);
+   if (res != -1) {
+   fprintf(stderr, "expected recv(2) failure, got %zi\n", res);
+   exit(EXIT_FAILURE);
+   }
+
+   if (memcmp(buf, HELLO_STR WORLD_STR, strlen(HELLO_STR WORLD_STR))) {
+   fprintf(stderr, "pattern mismatch\n");
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("REPLY1");
+
+   close(fd);
+}
+
static struct test_case test_cases[] = {
{
.name = "SOCK_STREAM connection reset",
@@ -1038,6 +1123,11 @@ static struct test_case test_cases[] = {
.run_client = test_seqpacket_inv_buf_client,
.run_server = test_seqpacket_inv_buf_server,
},
+   {
+   .name = "SOCK_STREAM virtio skb merge",
+   .run_client = test_stream_virtio_skb_merge_client,
+   .run_server = test_stream_virtio_skb_merge_server,
+   },
{},
};

--
2.25.1





Re: [RFC PATCH v2 2/3] virtio/vsock: WARN_ONCE() for invalid state of socket

2023-03-28 Thread Stefano Garzarella

On Sun, Mar 26, 2023 at 01:09:25AM +0300, Arseniy Krasnov wrote:

This adds a WARN_ONCE() and a return from the stream dequeue callback
when the socket's queue is empty but 'rx_bytes' is still non-zero.


Nit: I would explain why we add this, for example:

This allows the detection of potential bugs due to packet merging
(see previous patch).



Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/virtio_transport_common.c | 7 +++
1 file changed, 7 insertions(+)




diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index b9144af71553..ad70531de133 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -398,6 +398,13 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
u32 free_space;

spin_lock_bh(&vvs->rx_lock);
+
+   if (WARN_ONCE(skb_queue_empty(&vvs->rx_queue) && vvs->rx_bytes,
+ "No skbuffs with non-zero 'rx_bytes'\n")) {


Nit: I would rephrase it this way:
"rx_queue is empty, but rx_bytes is non-zero"


+   spin_unlock_bh(&vvs->rx_lock);
+   return err;
+   }
+
while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
skb = skb_peek(&vvs->rx_queue);

--
2.25.1



Anyway the patch LGTM!

Reviewed-by: Stefano Garzarella 
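
For reference, a sketch of the hunk with the suggested message applied;
this only folds in the review nit above and is not the posted v3:

    if (WARN_ONCE(skb_queue_empty(&vvs->rx_queue) && vvs->rx_bytes,
                  "rx_queue is empty, but rx_bytes is non-zero\n")) {
            spin_unlock_bh(&vvs->rx_lock);
            return err;
    }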



[PATCH 11/16] virtio_net: introduce virtnet_dev_rx_queue_group()

2023-03-28 Thread Xuan Zhuo
Adding an API to set sysfs_rx_queue_group.

This is preparation for separating out the virtio-related functions.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 15 +++
 drivers/net/virtio/virtnet.h |  1 +
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 1323c6733f56..3f58af7d1550 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -2661,6 +2661,16 @@ static const struct attribute_group 
virtio_net_mrg_rx_group = {
.name = "virtio_net",
.attrs = virtio_net_mrg_rx_attrs
 };
+
+void virtnet_dev_rx_queue_group(struct virtnet_info *vi, struct net_device *dev)
+{
+   if (vi->mergeable_rx_bufs)
+   dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
+}
+#else
+void virtnet_dev_rx_queue_group(struct virtnet_info *vi, struct net_device *dev)
+{
+}
 #endif
 
 static bool virtnet_fail_on_feature(struct virtio_device *vdev,
@@ -2943,10 +2953,7 @@ static int virtnet_probe(struct virtio_device *vdev)
if (err)
goto free;
 
-#ifdef CONFIG_SYSFS
-   if (vi->mergeable_rx_bufs)
-   dev->sysfs_rx_queue_group = &virtio_net_mrg_rx_group;
-#endif
+   virtnet_dev_rx_queue_group(vi, dev);
netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
 
diff --git a/drivers/net/virtio/virtnet.h b/drivers/net/virtio/virtnet.h
index b889825c54d0..48e0c5ba346a 100644
--- a/drivers/net/virtio/virtnet.h
+++ b/drivers/net/virtio/virtnet.h
@@ -184,4 +184,5 @@ struct virtnet_info {
 int virtnet_rx_resize(struct virtnet_info *vi, struct virtnet_rq *rq, u32 ring_num);
 int virtnet_tx_resize(struct virtnet_info *vi, struct virtnet_sq *sq, u32 ring_num);
 int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs);
+void virtnet_dev_rx_queue_group(struct virtnet_info *vi, struct net_device *dev);
 #endif
-- 
2.32.0.3.g01195cf9f



[PATCH 15/16] virtio_net: add APIs to register/unregister virtio driver

2023-03-28 Thread Xuan Zhuo
This is preparation for separating out the virtio-related functions.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 75a74864c3fe..02989cace0fb 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -3146,6 +3146,16 @@ static struct virtio_driver virtio_net_driver = {
 #endif
 };
 
+int virtnet_register_virtio_driver(void)
+{
+   return register_virtio_driver(&virtio_net_driver);
+}
+
+void virtnet_unregister_virtio_driver(void)
+{
+   unregister_virtio_driver(&virtio_net_driver);
+}
+
 static __init int virtio_net_driver_init(void)
 {
int ret;
@@ -3154,7 +3164,7 @@ static __init int virtio_net_driver_init(void)
if (ret)
return ret;
 
-   ret = register_virtio_driver(&virtio_net_driver);
+   ret = virtnet_register_virtio_driver();
if (ret) {
virtnet_cpuhp_remove();
return ret;
@@ -3166,7 +3176,7 @@ module_init(virtio_net_driver_init);
 
 static __exit void virtio_net_driver_exit(void)
 {
-   unregister_virtio_driver(&virtio_net_driver);
+   virtnet_unregister_virtio_driver();
virtnet_cpuhp_remove();
 }
 module_exit(virtio_net_driver_exit);
-- 
2.32.0.3.g01195cf9f



[PATCH 14/16] virtio_net: move virtnet_[en/dis]able_delayed_refill to header file

2023-03-28 Thread Xuan Zhuo
Move virtnet_[en/dis]able_delayed_refill to the header file.

This is preparation for separating out the virtio-related functions.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 20 +++-
 drivers/net/virtio/virtnet.h | 15 +++
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 8f281a7f9d7a..75a74864c3fe 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -136,20 +136,6 @@ static struct page *get_a_page(struct virtnet_rq *rq, 
gfp_t gfp_mask)
return p;
 }
 
-static void enable_delayed_refill(struct virtnet_info *vi)
-{
-   spin_lock_bh(&vi->refill_lock);
-   vi->refill_enabled = true;
-   spin_unlock_bh(&vi->refill_lock);
-}
-
-static void disable_delayed_refill(struct virtnet_info *vi)
-{
-   spin_lock_bh(&vi->refill_lock);
-   vi->refill_enabled = false;
-   spin_unlock_bh(&vi->refill_lock);
-}
-
 static void virtqueue_napi_schedule(struct napi_struct *napi,
struct virtqueue *vq)
 {
@@ -1622,7 +1608,7 @@ static int virtnet_open(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int i, err;
 
-   enable_delayed_refill(vi);
+   virtnet_enable_delayed_refill(vi);
 
for (i = 0; i < vi->max_queue_pairs; i++) {
if (i < vi->curr_queue_pairs)
@@ -1979,7 +1965,7 @@ static int virtnet_close(struct net_device *dev)
int i;
 
/* Make sure NAPI doesn't schedule refill work */
-   disable_delayed_refill(vi);
+   virtnet_disable_delayed_refill(vi);
/* Make sure virtnet_refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
 
@@ -2068,7 +2054,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
-   enable_delayed_refill(vi);
+   virtnet_enable_delayed_refill(vi);
 
if (netif_running(vi->dev)) {
err = virtnet_get_netdev()->ndo_open(vi->dev);
diff --git a/drivers/net/virtio/virtnet.h b/drivers/net/virtio/virtnet.h
index 1315dcf52f1b..5f20e9103a0e 100644
--- a/drivers/net/virtio/virtnet.h
+++ b/drivers/net/virtio/virtnet.h
@@ -193,4 +193,19 @@ void virtnet_skb_xmit_done(struct virtqueue *vq);
 void virtnet_skb_recv_done(struct virtqueue *rvq);
 void virtnet_refill_work(struct work_struct *work);
 void virtnet_free_bufs(struct virtnet_info *vi);
+
+static inline void virtnet_enable_delayed_refill(struct virtnet_info *vi)
+{
+   spin_lock_bh(&vi->refill_lock);
+   vi->refill_enabled = true;
+   spin_unlock_bh(&vi->refill_lock);
+}
+
+static inline void virtnet_disable_delayed_refill(struct virtnet_info *vi)
+{
+   spin_lock_bh(&vi->refill_lock);
+   vi->refill_enabled = false;
+   spin_unlock_bh(&vi->refill_lock);
+}
+
 #endif
-- 
2.32.0.3.g01195cf9f



[PATCH 16/16] virtio_net: separating the virtio code

2023-03-28 Thread Xuan Zhuo
Moving virtio-related functions, such as the virtio callbacks and the
virtio driver registration, to a separate file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/Makefile |   3 +-
 drivers/net/virtio/virtnet.c| 884 +---
 drivers/net/virtio/virtnet.h|   2 +
 drivers/net/virtio/virtnet_virtio.c | 880 +++
 drivers/net/virtio/virtnet_virtio.h |   8 +
 5 files changed, 895 insertions(+), 882 deletions(-)
 create mode 100644 drivers/net/virtio/virtnet_virtio.c
 create mode 100644 drivers/net/virtio/virtnet_virtio.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 9b35fb00d6c7..6bdc870fa1c8 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -6,4 +6,5 @@
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 
 virtio_net-y := virtnet.o virtnet_common.o virtnet_ctrl.o \
-   virtnet_ethtool.o
+   virtnet_ethtool.o \
+   virtnet_virtio.o
diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 02989cace0fb..ca9d3073ba93 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -4,48 +4,18 @@
  * Copyright 2007 Rusty Russell  IBM Corporation
  */
 //#define DEBUG
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 
 #include "virtnet.h"
 #include "virtnet_common.h"
 #include "virtnet_ctrl.h"
 #include "virtnet_ethtool.h"
-
-static int napi_weight = NAPI_POLL_WEIGHT;
-module_param(napi_weight, int, 0444);
-
-static bool csum = true, gso = true, napi_tx = true;
-module_param(csum, bool, 0444);
-module_param(gso, bool, 0444);
-module_param(napi_tx, bool, 0644);
+#include "virtnet_virtio.h"
 
 #define GOOD_COPY_LEN  128
 
-static const unsigned long guest_offloads[] = {
-   VIRTIO_NET_F_GUEST_TSO4,
-   VIRTIO_NET_F_GUEST_TSO6,
-   VIRTIO_NET_F_GUEST_ECN,
-   VIRTIO_NET_F_GUEST_UFO,
-   VIRTIO_NET_F_GUEST_CSUM,
-   VIRTIO_NET_F_GUEST_USO4,
-   VIRTIO_NET_F_GUEST_USO6,
-   VIRTIO_NET_F_GUEST_HDRLEN
-};
-
 #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
@@ -89,21 +59,11 @@ static int vq2txq(struct virtqueue *vq)
return (vq->index - 1) / 2;
 }
 
-static int txq2vq(int txq)
-{
-   return txq * 2 + 1;
-}
-
 static int vq2rxq(struct virtqueue *vq)
 {
return vq->index / 2;
 }
 
-static int rxq2vq(int rxq)
-{
-   return rxq * 2;
-}
-
 static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff 
*skb)
 {
return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
@@ -1570,7 +1530,7 @@ static void virtnet_poll_cleantx(struct virtnet_rq *rq)
}
 }
 
-static int virtnet_poll(struct napi_struct *napi, int budget)
+int virtnet_poll(struct napi_struct *napi, int budget)
 {
struct virtnet_rq *rq =
container_of(napi, struct virtnet_rq, napi);
@@ -1634,7 +1594,7 @@ static int virtnet_open(struct net_device *dev)
return 0;
 }
 
-static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+int virtnet_poll_tx(struct napi_struct *napi, int budget)
 {
struct virtnet_sq *sq = container_of(napi, struct virtnet_sq, napi);
struct virtnet_info *vi = sq->vq->vdev->priv;
@@ -1949,16 +1909,6 @@ int _virtnet_set_queues(struct virtnet_info *vi, u16 
queue_pairs)
return 0;
 }
 
-static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
-{
-   int err;
-
-   rtnl_lock();
-   err = _virtnet_set_queues(vi, queue_pairs);
-   rtnl_unlock();
-   return err;
-}
-
 static int virtnet_close(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -1978,96 +1928,6 @@ static int virtnet_close(struct net_device *dev)
return 0;
 }
 
-static void virtnet_init_default_rss(struct virtnet_info *vi)
-{
-   u32 indir_val = 0;
-   int i = 0;
-
-   vi->ctrl->rss.hash_types = vi->rss_hash_types_supported;
-   vi->rss_hash_types_saved = vi->rss_hash_types_supported;
-   vi->ctrl->rss.indirection_table_mask = vi->rss_indir_table_size
-   ? vi->rss_indir_table_size - 1 : 0;
-   vi->ctrl->rss.unclassified_queue = 0;
-
-   for (; i < vi->rss_indir_table_size; ++i) {
-   indir_val = ethtool_rxfh_indir_default(i, vi->curr_queue_pairs);
-   vi->ctrl->rss.indirection_table[i] = indir_val;
-   }
-
-   vi->ctrl->rss.max_tx_vq = vi->curr_queue_pairs;
-   vi->ctrl->rss.hash_key_length = vi->rss_key_size;
-
-   netdev_rss_key_fill(vi->ctrl->rss.key, vi->rss_key_size);
-}
-
-static void virtnet_init_settings(struct net_device *dev)
-{
-   struct virtnet_info *vi = netdev_priv(dev);
-
-   vi->speed = SPEED_UNKNOWN;
-   vi->duplex = DUPLEX_UNKNOWN;

[PATCH 12/16] virtio_net: introduce virtnet_get_netdev()

2023-03-28 Thread Xuan Zhuo
Adding an API to get netdev_ops, to avoid using netdev_ops directly.

This is preparation for separating out the virtio-related functions.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 11 ---
 drivers/net/virtio/virtnet.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 3f58af7d1550..5f508d9500f3 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -2054,7 +2054,7 @@ static void virtnet_freeze_down(struct virtio_device 
*vdev)
netif_device_detach(vi->dev);
netif_tx_unlock_bh(vi->dev);
if (netif_running(vi->dev))
-   virtnet_close(vi->dev);
+   virtnet_get_netdev()->ndo_stop(vi->dev);
 }
 
 static int init_vqs(struct virtnet_info *vi);
@@ -2073,7 +2073,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
enable_delayed_refill(vi);
 
if (netif_running(vi->dev)) {
-   err = virtnet_open(vi->dev);
+   err = virtnet_get_netdev()->ndo_open(vi->dev);
if (err)
return err;
}
@@ -2319,6 +2319,11 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_tx_timeout = virtnet_tx_timeout,
 };
 
+const struct net_device_ops *virtnet_get_netdev(void)
+{
+   return &virtnet_netdev;
+}
+
 static void virtnet_config_changed_work(struct work_struct *work)
 {
struct virtnet_info *vi =
@@ -2796,7 +2801,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* Set up network device as normal. */
dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIVE_ADDR_CHANGE |
   IFF_TX_SKB_NO_LINEAR;
-   dev->netdev_ops = &virtnet_netdev;
+   dev->netdev_ops = virtnet_get_netdev();
dev->features = NETIF_F_HIGHDMA;
 
dev->ethtool_ops = virtnet_get_ethtool_ops();
diff --git a/drivers/net/virtio/virtnet.h b/drivers/net/virtio/virtnet.h
index 48e0c5ba346a..269ddc386418 100644
--- a/drivers/net/virtio/virtnet.h
+++ b/drivers/net/virtio/virtnet.h
@@ -185,4 +185,5 @@ int virtnet_rx_resize(struct virtnet_info *vi, struct 
virtnet_rq *rq, u32 ring_n
 int virtnet_tx_resize(struct virtnet_info *vi, struct virtnet_sq *sq, u32 ring_num);
 int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs);
 void virtnet_dev_rx_queue_group(struct virtnet_info *vi, struct net_device *dev);
+const struct net_device_ops *virtnet_get_netdev(void);
 #endif
-- 
2.32.0.3.g01195cf9f



[PATCH 08/16] virtio_net: separating the APIs of cq

2023-03-28 Thread Xuan Zhuo
Separate the control queue (cq) APIs into their own file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/Makefile   |   2 +-
 drivers/net/virtio/virtnet.c  | 299 +-
 drivers/net/virtio/virtnet_ctrl.c | 272 +++
 drivers/net/virtio/virtnet_ctrl.h |  45 +
 4 files changed, 319 insertions(+), 299 deletions(-)
 create mode 100644 drivers/net/virtio/virtnet_ctrl.c
 create mode 100644 drivers/net/virtio/virtnet_ctrl.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index 3bef2b51876c..a2d80f95c921 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 
-virtio_net-y := virtnet.o virtnet_common.o
+virtio_net-y := virtnet.o virtnet_common.o virtnet_ctrl.o
diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 4a3b5efb674e..84b90333dc77 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -22,6 +22,7 @@
 
 #include "virtnet.h"
 #include "virtnet_common.h"
+#include "virtnet_ctrl.h"
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -84,36 +85,6 @@ static const struct virtnet_stat_desc 
virtnet_rq_stats_desc[] = {
 #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
 #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
 
-/* This structure can contain rss message with maximum settings for indirection table and keysize
- * Note, that default structure that describes RSS configuration virtio_net_rss_config
- * contains same info but can't handle table values.
- * In any case, structure would be passed to virtio hw through sg_buf split by parts
- * because table sizes may be differ according to the device configuration.
- */
-#define VIRTIO_NET_RSS_MAX_KEY_SIZE 40
-#define VIRTIO_NET_RSS_MAX_TABLE_LEN   128
-struct virtio_net_ctrl_rss {
-   u32 hash_types;
-   u16 indirection_table_mask;
-   u16 unclassified_queue;
-   u16 indirection_table[VIRTIO_NET_RSS_MAX_TABLE_LEN];
-   u16 max_tx_vq;
-   u8 hash_key_length;
-   u8 key[VIRTIO_NET_RSS_MAX_KEY_SIZE];
-};
-
-/* Control VQ buffers: protected by the rtnl lock */
-struct control_buf {
-   struct virtio_net_ctrl_hdr hdr;
-   virtio_net_ctrl_ack status;
-   struct virtio_net_ctrl_mq mq;
-   u8 promisc;
-   u8 allmulti;
-   __virtio16 vid;
-   __virtio64 offloads;
-   struct virtio_net_ctrl_rss rss;
-};
-
 struct padded_vnet_hdr {
struct virtio_net_hdr_v1_hash hdr;
/*
@@ -1932,73 +1903,6 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
return err;
 }
 
-/*
- * Send command via the control virtqueue and check status.  Commands
- * supported by the hypervisor, as indicated by feature bits, should
- * never fail unless improperly formatted.
- */
-static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
-struct scatterlist *out)
-{
-   struct scatterlist *sgs[4], hdr, stat;
-   unsigned out_num = 0, tmp;
-   int ret;
-
-   /* Caller should know better */
-   BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
-
-   vi->ctrl->status = ~0;
-   vi->ctrl->hdr.class = class;
-   vi->ctrl->hdr.cmd = cmd;
-   /* Add header */
-   sg_init_one(&hdr, &vi->ctrl->hdr, sizeof(vi->ctrl->hdr));
-   sgs[out_num++] = &hdr;
-
-   if (out)
-   sgs[out_num++] = out;
-
-   /* Add return status. */
-   sg_init_one(&stat, &vi->ctrl->status, sizeof(vi->ctrl->status));
-   sgs[out_num] = &stat;
-
-   BUG_ON(out_num + 1 > ARRAY_SIZE(sgs));
-   ret = virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC);
-   if (ret < 0) {
-   dev_warn(&vi->vdev->dev,
-"Failed to add sgs for command vq: %d\n.", ret);
-   return false;
-   }
-
-   if (unlikely(!virtqueue_kick(vi->cvq)))
-   return vi->ctrl->status == VIRTIO_NET_OK;
-
-   /* Spin for a response, the kick causes an ioport write, trapping
-* into the hypervisor, so the request should be handled immediately.
-*/
-   while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-  !virtqueue_is_broken(vi->cvq))
-   cpu_relax();
-
-   return vi->ctrl->status == VIRTIO_NET_OK;
-}
-
-static int virtnet_ctrl_set_mac_address(struct virtnet_info *vi, const void *addr, int len)
-{
-   struct virtio_device *vdev = vi->vdev;
-   struct scatterlist sg;
-
-   sg_init_one(&sg, addr, len);
-
-   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
- VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
-   dev_warn(&vdev->dev,
-"Failed to set mac address by vq command.\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);

[PATCH 13/16] virtio_net: prepare for virtio

2023-03-28 Thread Xuan Zhuo
Put some functions and macros into the header file.

This is preparation for separating out the virtio-related functions.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 43 +++-
 drivers/net/virtio/virtnet.h |  7 ++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 5f508d9500f3..8f281a7f9d7a 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -33,8 +33,6 @@ module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
 
-/* FIXME: MTU in config. */
-#define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN  128
 
 static const unsigned long guest_offloads[] = {
@@ -175,7 +173,7 @@ static void virtqueue_napi_complete(struct napi_struct 
*napi,
}
 }
 
-static void skb_xmit_done(struct virtqueue *vq)
+void virtnet_skb_xmit_done(struct virtqueue *vq)
 {
struct virtnet_info *vi = vq->vdev->priv;
struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
@@ -635,7 +633,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
unsigned int xdp_headroom = (unsigned long)ctx;
unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
unsigned int headroom = vi->hdr_len + header_offset;
-   unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   unsigned int buflen = SKB_DATA_ALIGN(VIRTNET_GOOD_PACKET_LEN + headroom) +
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
struct page *page = virt_to_head_page(buf);
unsigned int delta = 0;
@@ -646,9 +644,9 @@ static struct sk_buff *receive_small(struct net_device *dev,
len -= vi->hdr_len;
stats->bytes += len;
 
-   if (unlikely(len > GOOD_PACKET_LEN)) {
+   if (unlikely(len > VIRTNET_GOOD_PACKET_LEN)) {
pr_debug("%s: rx error: len %u exceeds max size %d\n",
-dev->name, len, GOOD_PACKET_LEN);
+dev->name, len, VIRTNET_GOOD_PACKET_LEN);
dev->stats.rx_length_errors++;
goto err;
}
@@ -678,7 +676,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
xdp_headroom = virtnet_get_headroom(vi);
header_offset = VIRTNET_RX_PAD + xdp_headroom;
headroom = vi->hdr_len + header_offset;
-   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   buflen = SKB_DATA_ALIGN(VIRTNET_GOOD_PACKET_LEN + headroom) +
 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
xdp_page = xdp_linearize_page(rq, _buf, page,
  offset, header_offset,
@@ -1286,7 +1284,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, 
struct virtnet_rq *rq,
char *buf;
unsigned int xdp_headroom = virtnet_get_headroom(vi);
void *ctx = (void *)(unsigned long)xdp_headroom;
-   int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+   int len = vi->hdr_len + VIRTNET_RX_PAD + VIRTNET_GOOD_PACKET_LEN + xdp_headroom;
int err;
 
len = SKB_DATA_ALIGN(len) +
@@ -1298,7 +1296,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, 
struct virtnet_rq *rq,
get_page(alloc_frag->page);
alloc_frag->offset += len;
sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
-   vi->hdr_len + GOOD_PACKET_LEN);
+   vi->hdr_len + VIRTNET_GOOD_PACKET_LEN);
err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
if (err < 0)
put_page(virt_to_head_page(buf));
@@ -1421,7 +1419,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
  * Returns false if we couldn't fill entirely (OOM).
  *
  * Normally run in the receive path, but can also be run from ndo_open
- * before we're receiving packets, or from refill_work which is
+ * before we're receiving packets, or from virtnet_refill_work which is
  * careful to disable receiving (using napi_disable).
  */
 static bool try_fill_recv(struct virtnet_info *vi, struct virtnet_rq *rq,
@@ -1453,7 +1451,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct 
virtnet_rq *rq,
return !oom;
 }
 
-static void skb_recv_done(struct virtqueue *rvq)
+void virtnet_skb_recv_done(struct virtqueue *rvq)
 {
struct virtnet_info *vi = rvq->vdev->priv;
struct virtnet_rq *rq = &vi->rq[vq2rxq(rvq)];
@@ -1498,7 +1496,7 @@ static void virtnet_napi_tx_disable(struct napi_struct 
*napi)
napi_disable(napi);
 }
 
-static void refill_work(struct work_struct *work)
+void virtnet_refill_work(struct work_struct *work)
 {
struct virtnet_info *vi =
container_of(work, struct virtnet_info, refill.work);
@@ -1982,7 +1980,7 @@ 

[PATCH 05/16] virtio_net: separate virtnet_ctrl_set_queues()

2023-03-28 Thread Xuan Zhuo
Separate the code that sets the queues via the cq into its own function.

This is to facilitate separating cq-related functions into a separate
file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 3fcf70782d97..0196492f289b 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -2078,19 +2078,25 @@ static void virtnet_ack_link_announce(struct 
virtnet_info *vi)
rtnl_unlock();
 }
 
-static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+static int virtnet_ctrl_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
struct scatterlist sg;
+
+   vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
+   sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
+
+   return virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
+   VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg);
+}
+
+static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+{
struct net_device *dev = vi->dev;
 
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
 
-   vi->ctrl->mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
-   sg_init_one(&sg, &vi->ctrl->mq, sizeof(vi->ctrl->mq));
-
-   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
- VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
+   if (!virtnet_ctrl_set_queues(vi, queue_pairs)) {
dev_warn(&dev->dev, "Fail to set num of queue pairs to %d\n",
 queue_pairs);
return -EINVAL;
-- 
2.32.0.3.g01195cf9f



[PATCH 09/16] virtio_net: introduce virtnet_rq_update_stats()

2023-03-28 Thread Xuan Zhuo
Separate the code that updates the rq stats.

This is preparation for separating out the ethtool code.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 84b90333dc77..36c747e43b3f 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -1550,6 +1550,21 @@ static void refill_work(struct work_struct *work)
}
 }
 
+static void virtnet_rq_update_stats(struct virtnet_rq *rq, struct virtnet_rq_stats *stats)
+{
+   int i;
+
+   u64_stats_update_begin(&rq->stats.syncp);
+   for (i = 0; i < VIRTNET_RQ_STATS_LEN; i++) {
+   size_t offset = virtnet_rq_stats_desc[i].offset;
+   u64 *item;
+
+   item = (u64 *)((u8 *)&rq->stats + offset);
+   *item += *(u64 *)((u8 *)stats + offset);
+   }
+   u64_stats_update_end(&rq->stats.syncp);
+}
+
 static int virtnet_receive(struct virtnet_rq *rq, int budget,
   unsigned int *xdp_xmit)
 {
@@ -1557,7 +1572,6 @@ static int virtnet_receive(struct virtnet_rq *rq, int 
budget,
struct virtnet_rq_stats stats = {};
unsigned int len;
void *buf;
-   int i;
 
if (!vi->big_packets || vi->mergeable_rx_bufs) {
void *ctx;
@@ -1584,15 +1598,7 @@ static int virtnet_receive(struct virtnet_rq *rq, int 
budget,
}
}
 
-   u64_stats_update_begin(&rq->stats.syncp);
-   for (i = 0; i < VIRTNET_RQ_STATS_LEN; i++) {
-   size_t offset = virtnet_rq_stats_desc[i].offset;
-   u64 *item;
-
-   item = (u64 *)((u8 *)&rq->stats + offset);
-   *item += *(u64 *)((u8 *)&stats + offset);
-   }
-   u64_stats_update_end(&rq->stats.syncp);
+   virtnet_rq_update_stats(rq, &stats);
 
return stats.packets;
 }
-- 
2.32.0.3.g01195cf9f



[PATCH 10/16] virtio_net: separating the funcs of ethtool

2023-03-28 Thread Xuan Zhuo
Put the ethtool functions in an independent file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/Makefile  |   3 +-
 drivers/net/virtio/virtnet.c | 573 +-
 drivers/net/virtio/virtnet.h |   3 +
 drivers/net/virtio/virtnet_ethtool.c | 578 +++
 drivers/net/virtio/virtnet_ethtool.h |   8 +
 5 files changed, 596 insertions(+), 569 deletions(-)
 create mode 100644 drivers/net/virtio/virtnet_ethtool.c
 create mode 100644 drivers/net/virtio/virtnet_ethtool.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index a2d80f95c921..9b35fb00d6c7 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -5,4 +5,5 @@
 
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 
-virtio_net-y := virtnet.o virtnet_common.o virtnet_ctrl.o
+virtio_net-y := virtnet.o virtnet_common.o virtnet_ctrl.o \
+   virtnet_ethtool.o
diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 36c747e43b3f..1323c6733f56 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -23,6 +23,7 @@
 #include "virtnet.h"
 #include "virtnet_common.h"
 #include "virtnet_ctrl.h"
+#include "virtnet_ethtool.h"
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -54,37 +55,6 @@ static const unsigned long guest_offloads[] = {
(1ULL << VIRTIO_NET_F_GUEST_USO4) | \
(1ULL << VIRTIO_NET_F_GUEST_USO6))
 
-struct virtnet_stat_desc {
-   char desc[ETH_GSTRING_LEN];
-   size_t offset;
-};
-
-#define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m)
-#define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m)
-
-static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
-   { "packets",VIRTNET_SQ_STAT(packets) },
-   { "bytes",  VIRTNET_SQ_STAT(bytes) },
-   { "xdp_tx", VIRTNET_SQ_STAT(xdp_tx) },
-   { "xdp_tx_drops",   VIRTNET_SQ_STAT(xdp_tx_drops) },
-   { "kicks",  VIRTNET_SQ_STAT(kicks) },
-   { "tx_timeouts",VIRTNET_SQ_STAT(tx_timeouts) },
-};
-
-static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
-   { "packets",VIRTNET_RQ_STAT(packets) },
-   { "bytes",  VIRTNET_RQ_STAT(bytes) },
-   { "drops",  VIRTNET_RQ_STAT(drops) },
-   { "xdp_packets",VIRTNET_RQ_STAT(xdp_packets) },
-   { "xdp_tx", VIRTNET_RQ_STAT(xdp_tx) },
-   { "xdp_redirects",  VIRTNET_RQ_STAT(xdp_redirects) },
-   { "xdp_drops",  VIRTNET_RQ_STAT(xdp_drops) },
-   { "kicks",  VIRTNET_RQ_STAT(kicks) },
-};
-
-#define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
-#define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
-
 struct padded_vnet_hdr {
struct virtio_net_hdr_v1_hash hdr;
/*
@@ -1550,21 +1520,6 @@ static void refill_work(struct work_struct *work)
}
 }
 
-static void virtnet_rq_update_stats(struct virtnet_rq *rq, struct virtnet_rq_stats *stats)
-{
-   int i;
-
-   u64_stats_update_begin(&rq->stats.syncp);
-   for (i = 0; i < VIRTNET_RQ_STATS_LEN; i++) {
-   size_t offset = virtnet_rq_stats_desc[i].offset;
-   u64 *item;
-
-   item = (u64 *)((u8 *)&rq->stats + offset);
-   *item += *(u64 *)((u8 *)stats + offset);
-   }
-   u64_stats_update_end(&rq->stats.syncp);
-}
-
 static int virtnet_receive(struct virtnet_rq *rq, int budget,
   unsigned int *xdp_xmit)
 {
@@ -1845,8 +1800,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static int virtnet_rx_resize(struct virtnet_info *vi,
-struct virtnet_rq *rq, u32 ring_num)
+int virtnet_rx_resize(struct virtnet_info *vi, struct virtnet_rq *rq, u32 ring_num)
 {
bool running = netif_running(vi->dev);
int err, qindex;
@@ -1868,8 +1822,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
return err;
 }
 
-static int virtnet_tx_resize(struct virtnet_info *vi,
-struct virtnet_sq *sq, u32 ring_num)
+int virtnet_tx_resize(struct virtnet_info *vi, struct virtnet_sq *sq, u32 ring_num)
 {
bool running = netif_running(vi->dev);
struct netdev_queue *txq;
@@ -1991,7 +1944,7 @@ static void virtnet_stats(struct net_device *dev,
tot->rx_frame_errors = dev->stats.rx_frame_errors;
 }
 
-static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
struct net_device *dev = vi->dev;
 
@@ -2041,66 +1994,6 @@ static int virtnet_close(struct net_device *dev)
return 0;
 }
 
-static void virtnet_get_ringparam(struct net_device *dev,
- struct ethtool_ringparam *ring,
-  

[PATCH 03/16] virtio_net: add prefix to the struct inside header file

2023-03-28 Thread Xuan Zhuo
We moved some structures to the header file, but these structures are not
prefixed with virtnet. This patch adds the virtnet prefix to them.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 78 ++--
 drivers/net/virtio/virtnet.h | 12 +++---
 2 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 5ca354e29483..92ef95c163b6 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -174,7 +174,7 @@ static inline struct virtio_net_hdr_mrg_rxbuf 
*skb_vnet_hdr(struct sk_buff *skb)
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
  */
-static void give_pages(struct receive_queue *rq, struct page *page)
+static void give_pages(struct virtnet_rq *rq, struct page *page)
 {
struct page *end;
 
@@ -184,7 +184,7 @@ static void give_pages(struct receive_queue *rq, struct 
page *page)
rq->pages = page;
 }
 
-static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
+static struct page *get_a_page(struct virtnet_rq *rq, gfp_t gfp_mask)
 {
struct page *p = rq->pages;
 
@@ -268,7 +268,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
 
 /* Called from bottom half context */
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
-  struct receive_queue *rq,
+  struct virtnet_rq *rq,
   struct page *page, unsigned int offset,
   unsigned int len, unsigned int truesize,
   unsigned int headroom)
@@ -370,7 +370,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
return skb;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
+static void free_old_xmit_skbs(struct virtnet_sq *sq, bool in_napi)
 {
unsigned int len;
unsigned int packets = 0;
@@ -418,7 +418,7 @@ static bool is_xdp_raw_buffer_queue(struct virtnet_info 
*vi, int q)
 
 static void check_sq_full_and_disable(struct virtnet_info *vi,
  struct net_device *dev,
- struct send_queue *sq)
+ struct virtnet_sq *sq)
 {
bool use_napi = sq->napi.weight;
int qnum;
@@ -452,7 +452,7 @@ static void check_sq_full_and_disable(struct virtnet_info 
*vi,
 }
 
 static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
-  struct send_queue *sq,
+  struct virtnet_sq *sq,
   struct xdp_frame *xdpf)
 {
struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -541,9 +541,9 @@ static int virtnet_xdp_xmit(struct net_device *dev,
int n, struct xdp_frame **frames, u32 flags)
 {
struct virtnet_info *vi = netdev_priv(dev);
-   struct receive_queue *rq = vi->rq;
+   struct virtnet_rq *rq = vi->rq;
struct bpf_prog *xdp_prog;
-   struct send_queue *sq;
+   struct virtnet_sq *sq;
unsigned int len;
int packets = 0;
int bytes = 0;
@@ -631,7 +631,7 @@ static unsigned int virtnet_get_headroom(struct 
virtnet_info *vi)
  * across multiple buffers (num_buf > 1), and we make sure buffers
  * have enough headroom.
  */
-static struct page *xdp_linearize_page(struct receive_queue *rq,
+static struct page *xdp_linearize_page(struct virtnet_rq *rq,
   int *num_buf,
   struct page *p,
   int offset,
@@ -683,7 +683,7 @@ static struct page *xdp_linearize_page(struct receive_queue 
*rq,
 
 static struct sk_buff *receive_small(struct net_device *dev,
 struct virtnet_info *vi,
-struct receive_queue *rq,
+struct virtnet_rq *rq,
 void *buf, void *ctx,
 unsigned int len,
 unsigned int *xdp_xmit,
@@ -827,7 +827,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 
 static struct sk_buff *receive_big(struct net_device *dev,
   struct virtnet_info *vi,
-  struct receive_queue *rq,
+  struct virtnet_rq *rq,
   void *buf,
   unsigned int len,
   struct virtnet_rq_stats *stats)
@@ -900,7 +900,7 @@ static struct sk_buff *build_skb_from_xdp_buff(struct 
net_device *dev,
 /* TODO: build xdp in big mode */
 static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
  struct virtnet_info *vi,
-  

[PATCH 04/16] virtio_net: separating cpu-related funs

2023-03-28 Thread Xuan Zhuo
Add a file, virtnet_common.c, to hold the common functions.
This patch moves the cpu-related functions into it.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/Makefile |   2 +-
 drivers/net/virtio/virtnet.c| 132 ++
 drivers/net/virtio/virtnet_common.c | 138 
 drivers/net/virtio/virtnet_common.h |  14 +++
 4 files changed, 163 insertions(+), 123 deletions(-)
 create mode 100644 drivers/net/virtio/virtnet_common.c
 create mode 100644 drivers/net/virtio/virtnet_common.h

diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
index ccd45c0e5064..3bef2b51876c 100644
--- a/drivers/net/virtio/Makefile
+++ b/drivers/net/virtio/Makefile
@@ -5,4 +5,4 @@
 
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 
-virtio_net-y := virtnet.o
+virtio_net-y := virtnet.o virtnet_common.o
diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 92ef95c163b6..3fcf70782d97 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -14,7 +14,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -22,6 +21,7 @@
 #include 
 
 #include "virtnet.h"
+#include "virtnet_common.h"
 
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
@@ -2233,108 +2233,6 @@ static int virtnet_vlan_rx_kill_vid(struct net_device 
*dev,
return 0;
 }
 
-static void virtnet_clean_affinity(struct virtnet_info *vi)
-{
-   int i;
-
-   if (vi->affinity_hint_set) {
-   for (i = 0; i < vi->max_queue_pairs; i++) {
-   virtqueue_set_affinity(vi->rq[i].vq, NULL);
-   virtqueue_set_affinity(vi->sq[i].vq, NULL);
-   }
-
-   vi->affinity_hint_set = false;
-   }
-}
-
-static void virtnet_set_affinity(struct virtnet_info *vi)
-{
-   cpumask_var_t mask;
-   int stragglers;
-   int group_size;
-   int i, j, cpu;
-   int num_cpu;
-   int stride;
-
-   if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) {
-   virtnet_clean_affinity(vi);
-   return;
-   }
-
-   num_cpu = num_online_cpus();
-   stride = max_t(int, num_cpu / vi->curr_queue_pairs, 1);
-   stragglers = num_cpu >= vi->curr_queue_pairs ?
-   num_cpu % vi->curr_queue_pairs :
-   0;
-   cpu = cpumask_first(cpu_online_mask);
-
-   for (i = 0; i < vi->curr_queue_pairs; i++) {
-   group_size = stride + (i < stragglers ? 1 : 0);
-
-   for (j = 0; j < group_size; j++) {
-   cpumask_set_cpu(cpu, mask);
-   cpu = cpumask_next_wrap(cpu, cpu_online_mask,
-   nr_cpu_ids, false);
-   }
-   virtqueue_set_affinity(vi->rq[i].vq, mask);
-   virtqueue_set_affinity(vi->sq[i].vq, mask);
-   __netif_set_xps_queue(vi->dev, cpumask_bits(mask), i, XPS_CPUS);
-   cpumask_clear(mask);
-   }
-
-   vi->affinity_hint_set = true;
-   free_cpumask_var(mask);
-}
-
-static int virtnet_cpu_online(unsigned int cpu, struct hlist_node *node)
-{
-   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
-  node);
-   virtnet_set_affinity(vi);
-   return 0;
-}
-
-static int virtnet_cpu_dead(unsigned int cpu, struct hlist_node *node)
-{
-   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
-  node_dead);
-   virtnet_set_affinity(vi);
-   return 0;
-}
-
-static int virtnet_cpu_down_prep(unsigned int cpu, struct hlist_node *node)
-{
-   struct virtnet_info *vi = hlist_entry_safe(node, struct virtnet_info,
-  node);
-
-   virtnet_clean_affinity(vi);
-   return 0;
-}
-
-static enum cpuhp_state virtionet_online;
-
-static int virtnet_cpu_notif_add(struct virtnet_info *vi)
-{
-   int ret;
-
-   ret = cpuhp_state_add_instance_nocalls(virtionet_online, &vi->node);
-   if (ret)
-   return ret;
-   ret = cpuhp_state_add_instance_nocalls(CPUHP_VIRT_NET_DEAD,
-  &vi->node_dead);
-   if (!ret)
-   return ret;
-   cpuhp_state_remove_instance_nocalls(virtionet_online, &vi->node);
-   return ret;
-}
-
-static void virtnet_cpu_notif_remove(struct virtnet_info *vi)
-{
-   cpuhp_state_remove_instance_nocalls(virtionet_online, &vi->node);
-   cpuhp_state_remove_instance_nocalls(CPUHP_VIRT_NET_DEAD,
-   &vi->node_dead);
-}
-
 static void virtnet_get_ringparam(struct net_device *dev,
  struct ethtool_ringparam *ring,
  struct kernel_ethtool_ringparam *kernel_ring,
@@ -4091,34 +3989,24 @@ static __init int virtio_net_driver_init(void)
 {
 

[PATCH 07/16] virtio_net: remove lock from virtnet_ack_link_announce()

2023-03-28 Thread Xuan Zhuo
Removing rtnl_lock() from virtnet_ack_link_announce(). This is to
facilitate separating cq-related functions into a separate file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 6ad217af44d9..4a3b5efb674e 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -2083,11 +2083,9 @@ static void virtnet_stats(struct net_device *dev,
 
 static void virtnet_ack_link_announce(struct virtnet_info *vi)
 {
-   rtnl_lock();
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_ANNOUNCE,
  VIRTIO_NET_CTRL_ANNOUNCE_ACK, NULL))
dev_warn(&vi->dev->dev, "Failed to ack link announce.\n");
-   rtnl_unlock();
 }
 
 static int virtnet_ctrl_set_queues(struct virtnet_info *vi, u16 queue_pairs)
@@ -3187,7 +3185,10 @@ static void virtnet_config_changed_work(struct 
work_struct *work)
 
if (v & VIRTIO_NET_S_ANNOUNCE) {
netdev_notify_peers(vi->dev);
+
+   rtnl_lock();
virtnet_ack_link_announce(vi);
+   rtnl_unlock();
}
 
/* Ignore unknown (future) status bits */
-- 
2.32.0.3.g01195cf9f



[PATCH 06/16] virtio_net: separate virtnet_ctrl_set_mac_address()

2023-03-28 Thread Xuan Zhuo
Separate the code that sets the MAC via the cq into its own function.

This is to facilitate separating cq-related functions into a separate
file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index 0196492f289b..6ad217af44d9 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -1982,13 +1982,29 @@ static bool virtnet_send_command(struct virtnet_info 
*vi, u8 class, u8 cmd,
return vi->ctrl->status == VIRTIO_NET_OK;
 }
 
+static int virtnet_ctrl_set_mac_address(struct virtnet_info *vi, const void *addr, int len)
+{
+   struct virtio_device *vdev = vi->vdev;
+   struct scatterlist sg;
+
+   sg_init_one(&sg, addr, len);
+
+   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
+ VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
+   dev_warn(&vdev->dev,
+"Failed to set mac address by vq command.\n");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
struct virtnet_info *vi = netdev_priv(dev);
struct virtio_device *vdev = vi->vdev;
int ret;
struct sockaddr *addr;
-   struct scatterlist sg;
 
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STANDBY))
return -EOPNOTSUPP;
@@ -2002,11 +2018,7 @@ static int virtnet_set_mac_address(struct net_device 
*dev, void *p)
goto out;
 
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-   sg_init_one(&sg, addr->sa_data, dev->addr_len);
-   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
- VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
-   dev_warn(&vdev->dev,
-"Failed to set mac address by vq command.\n");
+   if (virtnet_ctrl_set_mac_address(vi, addr->sa_data, dev->addr_len)) {
ret = -EINVAL;
goto out;
}
@@ -3822,12 +3834,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 */
if (!virtio_has_feature(vdev, VIRTIO_NET_F_MAC) &&
virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_MAC_ADDR)) {
-   struct scatterlist sg;
-
-   sg_init_one(&sg, dev->dev_addr, dev->addr_len);
-   if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MAC,
- VIRTIO_NET_CTRL_MAC_ADDR_SET, &sg)) {
-   pr_debug("virtio_net: setting MAC address failed\n");
+   if (virtnet_ctrl_set_mac_address(vi, dev->dev_addr, dev->addr_len)) {
rtnl_unlock();
err = -EINVAL;
goto free_unregister_netdev;
-- 
2.32.0.3.g01195cf9f



[PATCH 02/16] virtio_net: move struct to header file

2023-03-28 Thread Xuan Zhuo
Moving some structures and macros to the header file.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio/virtnet.c | 181 +-
 drivers/net/virtio/virtnet.h | 184 +++
 2 files changed, 186 insertions(+), 179 deletions(-)
 create mode 100644 drivers/net/virtio/virtnet.h

diff --git a/drivers/net/virtio/virtnet.c b/drivers/net/virtio/virtnet.c
index e2560b6f7980..5ca354e29483 100644
--- a/drivers/net/virtio/virtnet.c
+++ b/drivers/net/virtio/virtnet.c
@@ -6,7 +6,6 @@
 //#define DEBUG
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -16,13 +15,14 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
+#include "virtnet.h"
+
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
@@ -35,26 +35,6 @@ module_param(napi_tx, bool, 0644);
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN  128
 
-#define VIRTNET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
-
-/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
-#define VIRTIO_XDP_HEADROOM 256
-
-/* Separating two types of XDP xmit */
-#define VIRTIO_XDP_TX  BIT(0)
-#define VIRTIO_XDP_REDIR   BIT(1)
-
-#define VIRTIO_XDP_FLAGBIT(0)
-
-/* RX packet size EWMA. The average packet size is used to determine the packet
- * buffer size when refilling RX rings. As the entire RX ring may be refilled
- * at once, the weight is chosen so that the EWMA will be insensitive to short-
- * term, transient changes in packet size.
- */
-DECLARE_EWMA(pkt_len, 0, 64)
-
-#define VIRTNET_DRIVER_VERSION "1.0.0"
-
 static const unsigned long guest_offloads[] = {
VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6,
@@ -78,28 +58,6 @@ struct virtnet_stat_desc {
size_t offset;
 };
 
-struct virtnet_sq_stats {
-   struct u64_stats_sync syncp;
-   u64 packets;
-   u64 bytes;
-   u64 xdp_tx;
-   u64 xdp_tx_drops;
-   u64 kicks;
-   u64 tx_timeouts;
-};
-
-struct virtnet_rq_stats {
-   struct u64_stats_sync syncp;
-   u64 packets;
-   u64 bytes;
-   u64 drops;
-   u64 xdp_packets;
-   u64 xdp_tx;
-   u64 xdp_redirects;
-   u64 xdp_drops;
-   u64 kicks;
-};
-
 #define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m)
 
@@ -126,57 +84,6 @@ static const struct virtnet_stat_desc 
virtnet_rq_stats_desc[] = {
 #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
 #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
 
-/* Internal representation of a send virtqueue */
-struct send_queue {
-   /* Virtqueue associated with this send _queue */
-   struct virtqueue *vq;
-
-   /* TX: fragments + linear part + virtio header */
-   struct scatterlist sg[MAX_SKB_FRAGS + 2];
-
-   /* Name of the send queue: output.$index */
-   char name[16];
-
-   struct virtnet_sq_stats stats;
-
-   struct napi_struct napi;
-
-   /* Record whether sq is in reset state. */
-   bool reset;
-};
-
-/* Internal representation of a receive virtqueue */
-struct receive_queue {
-   /* Virtqueue associated with this receive_queue */
-   struct virtqueue *vq;
-
-   struct napi_struct napi;
-
-   struct bpf_prog __rcu *xdp_prog;
-
-   struct virtnet_rq_stats stats;
-
-   /* Chain pages by the private ptr. */
-   struct page *pages;
-
-   /* Average packet length for mergeable receive buffers. */
-   struct ewma_pkt_len mrg_avg_pkt_len;
-
-   /* Page frag for packet buffer allocation. */
-   struct page_frag alloc_frag;
-
-   /* RX: fragments + linear part + virtio header */
-   struct scatterlist sg[MAX_SKB_FRAGS + 2];
-
-   /* Min single buffer size for mergeable buffers case. */
-   unsigned int min_buf_len;
-
-   /* Name of this receive queue: input.$index */
-   char name[16];
-
-   struct xdp_rxq_info xdp_rxq;
-};
-
 /* This structure can contain rss message with maximum settings for 
indirection table and keysize
  * Note, that default structure that describes RSS configuration 
virtio_net_rss_config
  * contains same info but can't handle table values.
@@ -207,90 +114,6 @@ struct control_buf {
struct virtio_net_ctrl_rss rss;
 };
 
-struct virtnet_info {
-   struct virtio_device *vdev;
-   struct virtqueue *cvq;
-   struct net_device *dev;
-   struct send_queue *sq;
-   struct receive_queue *rq;
-   unsigned int status;
-
-   /* Max # of queue pairs supported by the device */
-   u16 max_queue_pairs;
-
-   /* # of queue pairs currently used by the driver */
-   u16 curr_queue_pairs;
-
-   /* # of XDP queue pairs currently used by the driver */
-   u16 xdp_queue_pairs;
-
-   /* xdp_queue_pairs may be 0, when xdp is already loaded. So add this. */
-   

[PATCH 00/16] virtio-net: split virtio-net.c

2023-03-28 Thread Xuan Zhuo
Considering the complexity of virtio-net.c and the new features we want
to add, it is time to split virtio-net.c into multiple independent
module files.

This will make the code easier to maintain and to extend with new
features.

AF_XDP support will be added later; at that point a separate xsk.c file
will be added as well.

This patchset splits virtio-net.c into these parts:

* virtnet.c: virtio net device ops (napi, tx, rx, device ops, ...)
* virtnet_common.c:  virtio net common code
* virtnet_ethtool.c: virtio net ethtool callbacks
* virtnet_ctrl.c:virtio net ctrl queue command APIs
* virtnet_virtio.c:  virtio net virtio callbacks/ops (driver register, virtio probe, virtio free, ...)

Please review.

Thanks.

Xuan Zhuo (16):
  virtio_net: add a separate directory for virtio-net
  virtio_net: move struct to header file
  virtio_net: add prefix to the struct inside header file
  virtio_net: separating cpu-related funs
  virtio_net: separate virtnet_ctrl_set_queues()
  virtio_net: separate virtnet_ctrl_set_mac_address()
  virtio_net: remove lock from virtnet_ack_link_announce()
  virtio_net: separating the APIs of cq
  virtio_net: introduce virtnet_rq_update_stats()
  virtio_net: separating the funcs of ethtool
  virtio_net: introduce virtnet_dev_rx_queue_group()
  virtio_net: introduce virtnet_get_netdev()
  virtio_net: prepare for virtio
  virtio_net: move virtnet_[en/dis]able_delayed_refill to header file
  virtio_net: add APIs to register/unregister virtio driver
  virtio_net: separating the virtio code

 MAINTAINERS   |2 +-
 drivers/net/Kconfig   |8 +-
 drivers/net/Makefile  |2 +-
 drivers/net/virtio/Kconfig|   11 +
 drivers/net/virtio/Makefile   |   10 +
 .../net/{virtio_net.c => virtio/virtnet.c}| 2368 ++---
 drivers/net/virtio/virtnet.h  |  213 ++
 drivers/net/virtio/virtnet_common.c   |  138 +
 drivers/net/virtio/virtnet_common.h   |   14 +
 drivers/net/virtio/virtnet_ctrl.c |  272 ++
 drivers/net/virtio/virtnet_ctrl.h |   45 +
 drivers/net/virtio/virtnet_ethtool.c  |  578 
 drivers/net/virtio/virtnet_ethtool.h  |8 +
 drivers/net/virtio/virtnet_virtio.c   |  880 ++
 drivers/net/virtio/virtnet_virtio.h   |8 +
 15 files changed, 2366 insertions(+), 2191 deletions(-)
 create mode 100644 drivers/net/virtio/Kconfig
 create mode 100644 drivers/net/virtio/Makefile
 rename drivers/net/{virtio_net.c => virtio/virtnet.c} (50%)
 create mode 100644 drivers/net/virtio/virtnet.h
 create mode 100644 drivers/net/virtio/virtnet_common.c
 create mode 100644 drivers/net/virtio/virtnet_common.h
 create mode 100644 drivers/net/virtio/virtnet_ctrl.c
 create mode 100644 drivers/net/virtio/virtnet_ctrl.h
 create mode 100644 drivers/net/virtio/virtnet_ethtool.c
 create mode 100644 drivers/net/virtio/virtnet_ethtool.h
 create mode 100644 drivers/net/virtio/virtnet_virtio.c
 create mode 100644 drivers/net/virtio/virtnet_virtio.h

--
2.32.0.3.g01195cf9f



[PATCH 01/16] virtio_net: add a separate directory for virtio-net

2023-03-28 Thread Xuan Zhuo
Create a separate directory for virtio-net.

Considering the complexity of virtio-net.c and the new features we want
to add, it is time to split virtio-net.c into multiple independent
module files.

This will make the code easier to maintain and to extend with new
features.

AF_XDP support will be added later; at that point a separate xsk.c file
will be added as well.

Signed-off-by: Xuan Zhuo 
---
 MAINTAINERS|  2 +-
 drivers/net/Kconfig|  8 +---
 drivers/net/Makefile   |  2 +-
 drivers/net/virtio/Kconfig | 11 +++
 drivers/net/virtio/Makefile|  8 
 drivers/net/{virtio_net.c => virtio/virtnet.c} |  0
 6 files changed, 22 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/virtio/Kconfig
 create mode 100644 drivers/net/virtio/Makefile
 rename drivers/net/{virtio_net.c => virtio/virtnet.c} (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index fbbda4671e73..6bb3c2199003 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22083,7 +22083,7 @@ F:  Documentation/devicetree/bindings/virtio/
 F: Documentation/driver-api/virtio/
 F: drivers/block/virtio_blk.c
 F: drivers/crypto/virtio/
-F: drivers/net/virtio_net.c
+F: drivers/net/virtio/
 F: drivers/vdpa/
 F: drivers/virtio/
 F: include/linux/vdpa.h
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index c34bd432da27..23a169d248b5 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -407,13 +407,7 @@ config VETH
  When one end receives the packet it appears on its pair and vice
  versa.
 
-config VIRTIO_NET
-   tristate "Virtio network driver"
-   depends on VIRTIO
-   select NET_FAILOVER
-   help
- This is the virtual network driver for virtio.  It can be used with
- QEMU based VMMs (like KVM or Xen).  Say Y or M.
+source "drivers/net/virtio/Kconfig"
 
 config NLMON
tristate "Virtual netlink monitoring device"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index e26f98f897c5..47537dd0f120 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -31,7 +31,7 @@ obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
 obj-$(CONFIG_TAP) += tap.o
 obj-$(CONFIG_VETH) += veth.o
-obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
+obj-$(CONFIG_VIRTIO_NET) += virtio/
 obj-$(CONFIG_VXLAN) += vxlan/
 obj-$(CONFIG_GENEVE) += geneve.o
 obj-$(CONFIG_BAREUDP) += bareudp.o
diff --git a/drivers/net/virtio/Kconfig b/drivers/net/virtio/Kconfig
new file mode 100644
index ..9bc2a2fc6c3e
--- /dev/null
+++ b/drivers/net/virtio/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# virtio-net device configuration
+#
+config VIRTIO_NET
+   tristate "Virtio network driver"
+   depends on VIRTIO
+   select NET_FAILOVER
+   help
+ This is the virtual network driver for virtio.  It can be used with
+ QEMU based VMMs (like KVM or Xen).  Say Y or M.
diff --git a/drivers/net/virtio/Makefile b/drivers/net/virtio/Makefile
new file mode 100644
index ..ccd45c0e5064
--- /dev/null
+++ b/drivers/net/virtio/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Makefile for the virtio network device drivers.
+#
+
+obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
+
+virtio_net-y := virtnet.o
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio/virtnet.c
similarity index 100%
rename from drivers/net/virtio_net.c
rename to drivers/net/virtio/virtnet.c
-- 
2.32.0.3.g01195cf9f



Re: [RFC PATCH v2 1/3] virtio/vsock: fix header length on skb merging

2023-03-28 Thread Stefano Garzarella

On Sun, Mar 26, 2023 at 01:08:22AM +0300, Arseniy Krasnov wrote:

This fixes appending a newly arrived skbuff to the last skbuff of the
socket's queue. The problem occurs when we try to append data to an
skbuff which has already been processed in the dequeue callback at least
once. The dequeue callback calls 'skb_pull()', which changes 'skb->len'.
In the current implementation 'skb->len' is used to update the length in
the header of the last skbuff after new data has been copied to it. This
is a bug, because the value in the header is used to calculate
'rx_bytes'/'fwd_cnt' and thus must not be changed during the skbuff's
lifetime.

The bug has been triggered since:

commit 077706165717
("virtio/vsock: don't use skbuff state to account credit")

It was present before, but was not triggered due to the slightly buggy
implementation of the credit calculation logic, hence the Fixes tag.

Fixes: 077706165717 ("virtio/vsock: don't use skbuff state to account credit")
Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/virtio_transport_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 7fc178c3ee07..b9144af71553 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -1101,7 +1101,7 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
memcpy(skb_put(last_skb, skb->len), skb->data, skb->len);
free_pkt = true;
last_hdr->flags |= hdr->flags;
-   last_hdr->len = cpu_to_le32(last_skb->len);
+   le32_add_cpu(&last_hdr->len, len);
goto out;
}
}
--
2.25.1



LGTM!

Reviewed-by: Stefano Garzarella 
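
To make the reasoning above concrete, here is a minimal sketch of the
accounting rule (simplified, with hypothetical helper names; not the
in-tree code): the header length must grow by exactly the appended
bytes, because skb->len shrinks on every partial dequeue while
rx_bytes/fwd_cnt are derived from the header value.

/* Illustrative sketch only -- simplified from the idea in the patch.
 * rx_bytes/fwd_cnt are computed from the length stored in the
 * virtio-vsock header, so that value must count every byte the skb
 * has ever carried, including bytes already consumed by dequeue.
 */
static void append_payload(struct sk_buff *last_skb,
                           struct virtio_vsock_hdr *last_hdr,
                           const void *data, u32 len)
{
        memcpy(skb_put(last_skb, len), data, len);

        /* Correct: advance by the newly copied bytes only. */
        le32_add_cpu(&last_hdr->len, len);

        /*
         * Buggy variant: last_hdr->len = cpu_to_le32(last_skb->len);
         * After a partial dequeue, skb_pull() has already shrunk
         * last_skb->len, so resynchronizing from it silently drops
         * consumed bytes from the credit accounting.
         */
}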



Re: [PATCH net-next v4 1/3] vsock: support sockmap

2023-03-28 Thread Stefano Garzarella

On Mon, Mar 27, 2023 at 07:11:51PM +, Bobby Eshleman wrote:

This patch adds sockmap support for vsock sockets. It is intended to be
usable by all transports, but only the virtio and loopback transports
are implemented.

SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported.

Signed-off-by: Bobby Eshleman 
Acked-by: Michael S. Tsirkin 
---
drivers/vhost/vsock.c   |   1 +
include/linux/virtio_vsock.h|   1 +
include/net/af_vsock.h  |  17 
net/vmw_vsock/Makefile  |   1 +
net/vmw_vsock/af_vsock.c|  64 ++--
net/vmw_vsock/virtio_transport.c|   2 +
net/vmw_vsock/virtio_transport_common.c |  25 +
net/vmw_vsock/vsock_bpf.c   | 174 
net/vmw_vsock/vsock_loopback.c  |   2 +
9 files changed, 281 insertions(+), 6 deletions(-)


LGTM!

Reviewed-by: Stefano Garzarella 

Thanks,
Stefano



Re: [PATCH net] virtio/vsock: fix leak due to missing skb owner

2023-03-28 Thread Stefano Garzarella

On Mon, Mar 27, 2023 at 10:01:05PM +, Bobby Eshleman wrote:

This patch sets the owner for the skb when it is sent from a socket, and
so solves the leak caused when virtio_transport_purge_skbs() finds that
skb->sk is always NULL and therefore never matches it with the current
socket. Setting the owner upon allocation fixes this.

Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff")
Signed-off-by: Bobby Eshleman 
Reported-by: Cong Wang 
Link: https://lore.kernel.org/all/ZCCbATwov4U+GBUv@pop-os.localdomain/
---
net/vmw_vsock/virtio_transport_common.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index 957cdc01c8e8..2a2f0c1a9fbd 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -94,6 +94,9 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 info->op,
 info->flags);

+   if (info->vsk)
+   skb_set_owner_w(skb, sk_vsock(info->vsk));
+


Should we do the same also in virtio_transport_recv_pkt()?

The skb in that case is allocated in drivers/vhost/vsock.c and
net/vmw_vsock/virtio_transport.c using virtio_vsock_alloc_skb()
directly, because we don't know in advance which socket it belongs to.

Then in virtio_transport_recv_pkt() we look for the socket and queue the
skb up. This should also solve the problem in vsock_loopback.c, where we
move the skb from one socket to another.

Thanks,
Stefano
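
For context, a rough sketch of the failure mode being discussed
(illustrative only; simplified from the idea behind
virtio_transport_purge_skbs(), not the exact in-tree helper): a purge
that matches packets by owner can never free anything if skb->sk was
left NULL at allocation time.

/* Illustrative sketch only. Assumes the queue lock is held. */
static void purge_skbs_for_sock(struct sk_buff_head *queue, struct sock *sk)
{
        struct sk_buff *skb, *tmp;

        skb_queue_walk_safe(queue, skb, tmp) {
                /* With skb->sk == NULL this never matches, so the
                 * skb (and its memory accounting) is leaked. */
                if (skb->sk != sk)
                        continue;

                __skb_unlink(skb, queue);
                kfree_skb(skb);
        }
}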



Re: [PATCH net-next] testing/vsock: add vsock_perf to gitignore

2023-03-28 Thread Stefano Garzarella

On Mon, Mar 27, 2023 at 10:16:06PM +, Bobby Eshleman wrote:

This adds the vsock_perf binary to the gitignore file.

Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Signed-off-by: Bobby Eshleman 
---
tools/testing/vsock/.gitignore | 1 +
1 file changed, 1 insertion(+)


Reviewed-by: Stefano Garzarella 



diff --git a/tools/testing/vsock/.gitignore b/tools/testing/vsock/.gitignore
index 87ca2731cff9..a8adcfdc292b 100644
--- a/tools/testing/vsock/.gitignore
+++ b/tools/testing/vsock/.gitignore
@@ -2,3 +2,4 @@
*.d
vsock_test
vsock_diag_test
+vsock_perf

---
base-commit: e5b42483ccce50d5b957f474fd332afd4ef0c27b
change-id: 20230327-vsock-add-vsock-perf-to-ignore-82b46b1f3f6f

Best regards,
--
Bobby Eshleman 





Re: [PATCH vhost v6 07/11] virtio_ring: update document for virtqueue_add_*

2023-03-28 Thread Jason Wang
On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo  wrote:
>
> Update the documentation for the virtqueue_add_* series of APIs,
> allowing the callers to use sg->dma_address to pass the DMA address to
> the virtio core.
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d5dffbe50070..0b198ec3ff92 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2202,6 +2202,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_sgs(struct virtqueue *_vq,
> @@ -2236,6 +2240,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_outbuf(struct virtqueue *vq,
> @@ -2258,6 +2266,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_inbuf(struct virtqueue *vq,
> @@ -2281,6 +2293,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
>   * Caller must ensure we don't call this with other virtqueue operations
>   * at the same time (except where noted).
>   *
> + * If the caller has done dma map then use sg->dma_address to pass dma address.
> + * If one sg->dma_address is used, then all sgs must use sg->dma_address;
> + * otherwise all sg->dma_address must be NULL.
> + *
>   * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
>   */
>  int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
> --
> 2.32.0.3.g01195cf9f
>
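
As a concrete illustration of the documented rule, a hedged driver-side
sketch (buffer setup, device pointers and error unwinding are
assumed/abbreviated; this is not code from the series):

/* Sketch only: assumes buf/len are suitable for streaming DMA and
 * that the virtqueue was created with use_dma_api enabled. */
static int submit_premapped(struct virtqueue *vq, struct device *dma_dev,
                            void *buf, size_t len, void *token)
{
        struct scatterlist sg;
        dma_addr_t addr;

        addr = dma_map_single(dma_dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dma_dev, addr))
                return -ENOMEM;

        sg_init_one(&sg, buf, len);
        /* Per the documented rule: either every sg carries a premapped
         * dma_address, or none of them does. */
        sg.dma_address = addr;

        return virtqueue_add_outbuf(vq, &sg, 1, token, GFP_ATOMIC);
}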


Re: [PATCH vhost v6 06/11] virtio_ring: packed-indirect: support premapped

2023-03-28 Thread Jason Wang
On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo  wrote:
>
> The virtio core only supports virtual addresses; DMA mapping is
> completed inside the virtio core.
>
> In some scenarios (such as AF_XDP), the memory is allocated and DMA
> mapping is completed in advance, so it is necessary for us to support
> passing the DMA address to the virtio core.
>
> Drivers can use sg->dma_address to pass the mapped DMA address to the
> virtio core. If one sg->dma_address is used, then all sgs must use
> sg->dma_address; otherwise all dma_address must be NULL when passing
> them to the APIs of virtio.
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 2afff1dc6c74..d5dffbe50070 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1338,6 +1338,7 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
> unsigned int i, n;
> u16 head, id;
> dma_addr_t addr;
> +   bool dma_map_internal;
>
> head = vq->packed.next_avail_idx;
> desc = alloc_indirect_packed(total_sg, gfp);
> @@ -1355,7 +1356,8 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
> id = vq->free_head;
> BUG_ON(id == vq->packed.vring.num);
>
> -   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +   dma_map_internal = !sgs[0]->dma_address;
> +   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
> in_sgs))
> goto err_map;
>
> for (n = 0; n < out_sgs + in_sgs; n++) {
> @@ -1417,6 +1419,8 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
> vq->packed.desc_state[id].data = data;
> vq->packed.desc_state[id].indir_desc = desc;
> vq->packed.desc_state[id].last = id;
> +   vq->packed.desc_state[id].flags = dma_map_internal ? 
> VRING_STATE_F_MAP_INTERNAL : 0;
> +
>
> vq->num_added += 1;
>
> @@ -1426,7 +1430,8 @@ static int virtqueue_add_indirect_packed(struct 
> vring_virtqueue *vq,
> return 0;
>
>  unmap_release:
> -   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +   if (dma_map_internal)
> +   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>
>  err_map:
> kfree(desc);
> @@ -1653,7 +1658,7 @@ static void detach_buf_packed(struct vring_virtqueue 
> *vq,
> if (!desc)
> return;
>
> -   if (vq->use_dma_api) {
> +   if (vq->use_dma_api && dma_map_internal) {
> len = vq->packed.desc_extra[id].len;
> for (i = 0; i < len / sizeof(struct 
> vring_packed_desc);
> i++)
> --
> 2.32.0.3.g01195cf9f
>


Re: [PATCH vhost v6 05/11] virtio_ring: packed: support premapped

2023-03-28 Thread Jason Wang
On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo  wrote:
>
> The virtio core only supports virtual addresses; DMA mapping is
> completed inside the virtio core.
>
> In some scenarios (such as AF_XDP), the memory is allocated and DMA
> mapping is completed in advance, so it is necessary for us to support
> passing the DMA address to the virtio core.
>
> Drivers can use sg->dma_address to pass the mapped DMA address to the
> virtio core. If one sg->dma_address is used, then all sgs must use
> sg->dma_address; otherwise all must be NULL when passing them to the
> APIs of virtio.
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 9d6acd59e3e0..2afff1dc6c74 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -81,6 +81,7 @@ struct vring_desc_state_packed {
> struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. 
> */
> u16 num;/* Descriptor list length. */
> u16 last;   /* The last desc state in a list. */
> +   u32 flags;  /* State flags. */
>  };
>
>  struct vring_desc_extra {
> @@ -1264,7 +1265,8 @@ static inline u16 packed_last_used(u16 last_used_idx)
>  }
>
>  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> -struct vring_desc_extra *extra)
> +struct vring_desc_extra *extra,
> +bool dma_map_internal)
>  {
> u16 flags;
>
> @@ -1279,6 +1281,9 @@ static void vring_unmap_extra_packed(const struct 
> vring_virtqueue *vq,
>  (flags & VRING_DESC_F_WRITE) ?
>  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> } else {
> +   if (!dma_map_internal)
> +   return;
> +
> dma_unmap_page(vring_dma_dev(vq),
>extra->addr, extra->len,
>(flags & VRING_DESC_F_WRITE) ?
> @@ -1445,6 +1450,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
> unsigned int i, n, c, descs_used;
> __le16 head_flags, flags;
> u16 head, id, prev, curr;
> +   bool dma_map_internal;
> int err;
>
> START_USE(vq);
> @@ -1490,7 +1496,8 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
> id = vq->free_head;
> BUG_ON(id == vq->packed.vring.num);
>
> -   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
> +   dma_map_internal = !sgs[0]->dma_address;
> +   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
> in_sgs)) {
> END_USE(vq);
> return -EIO;
> }
> @@ -1544,6 +1551,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
> *_vq,
> vq->packed.desc_state[id].data = data;
> vq->packed.desc_state[id].indir_desc = ctx;
> vq->packed.desc_state[id].last = prev;
> +   vq->packed.desc_state[id].flags = dma_map_internal ? 
> VRING_STATE_F_MAP_INTERNAL : 0;
>
> /*
>  * A driver MUST NOT make the first descriptor in the list
> @@ -1615,8 +1623,10 @@ static void detach_buf_packed(struct vring_virtqueue 
> *vq,
> struct vring_desc_state_packed *state = NULL;
> struct vring_packed_desc *desc;
> unsigned int i, curr;
> +   bool dma_map_internal;
>
> state = &vq->packed.desc_state[id];
> +   dma_map_internal = !!(state->flags & VRING_STATE_F_MAP_INTERNAL);
>
> /* Clear data ptr. */
> state->data = NULL;
> @@ -1629,7 +1639,8 @@ static void detach_buf_packed(struct vring_virtqueue 
> *vq,
> curr = id;
> for (i = 0; i < state->num; i++) {
> vring_unmap_extra_packed(vq,
> -  &vq->packed.desc_extra[curr]);
> +  &vq->packed.desc_extra[curr],
> +  dma_map_internal);
> curr = vq->packed.desc_extra[curr].next;
> }
> }
> --
> 2.32.0.3.g01195cf9f
>


Re: [PATCH vhost v6 04/11] virtio_ring: split: support premapped

2023-03-28 Thread Jason Wang
On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo  wrote:
>
> The virtio core only supports virtual addresses; DMA mapping is
> completed inside the virtio core.
>
> In some scenarios (such as AF_XDP), the memory is allocated and DMA
> mapping is completed in advance, so it is necessary for us to support
> passing the DMA address to the virtio core.
>
> Drivers can use sg->dma_address to pass the mapped DMA address to the
> virtio core. If one sg->dma_address is used, then all sgs must use
> sg->dma_address; otherwise all must be NULL when passing them to the
> APIs of virtio.
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 31 +++
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 3ada30b475d2..9d6acd59e3e0 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -67,9 +67,13 @@
>  #define LAST_ADD_TIME_INVALID(vq)
>  #endif
>
> +#define VRING_STATE_F_MAP_INTERNAL BIT(0)
> +
>  struct vring_desc_state_split {
> void *data; /* Data for callback. */
> struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> +   u32 flags;  /* State flags. */
> +   u32 padding;
>  };
>
>  struct vring_desc_state_packed {
> @@ -448,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct 
> vring_virtqueue *vq,
>  }
>
>  static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> - unsigned int i)
> + unsigned int i, bool 
> dma_map_internal)
>  {
> struct vring_desc_extra *extra = vq->split.desc_extra;
> u16 flags;
> @@ -465,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct 
> vring_virtqueue *vq,
>  (flags & VRING_DESC_F_WRITE) ?
>  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> } else {
> +   if (!dma_map_internal)
> +   goto out;
> +
> dma_unmap_page(vring_dma_dev(vq),
>extra[i].addr,
>extra[i].len,
> @@ -615,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> struct scatterlist *sg;
> struct vring_desc *desc;
> unsigned int i, n, avail, descs_used, prev;
> -   bool indirect;
> +   bool indirect, dma_map_internal;
> int head;
>
> START_USE(vq);
> @@ -668,7 +675,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> return -ENOSPC;
> }
>
> -   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +   dma_map_internal = !sgs[0]->dma_address;
> +   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
> in_sgs))
> goto err_map;
>
> for (n = 0; n < out_sgs; n++) {
> @@ -735,6 +743,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> else
> vq->split.desc_state[head].indir_desc = ctx;
>
> +   vq->split.desc_state[head].flags = dma_map_internal ? 
> VRING_STATE_F_MAP_INTERNAL : 0;
> +
> /* Put entry in available array (but don't update avail->idx until 
> they
>  * do sync). */
> avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
> @@ -759,7 +769,8 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> return 0;
>
>  unmap_release:
> -   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +   if (dma_map_internal)
> +   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
>
>  err_map:
> if (indirect)
> @@ -805,20 +816,22 @@ static void detach_buf_split(struct vring_virtqueue 
> *vq, unsigned int head,
>  {
> unsigned int i, j;
> __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +   bool dma_map_internal;
>
> /* Clear data ptr. */
> vq->split.desc_state[head].data = NULL;
> +   dma_map_internal = !!(vq->split.desc_state[head].flags & 
> VRING_STATE_F_MAP_INTERNAL);
>
> /* Put back on free list: unmap first-level descriptors and find end 
> */
> i = head;
>
> while (vq->split.vring.desc[i].flags & nextflag) {
> -   vring_unmap_one_split(vq, i);
> +   vring_unmap_one_split(vq, i, dma_map_internal);
> i = vq->split.desc_extra[i].next;
> vq->vq.num_free++;
> }
>
> -   vring_unmap_one_split(vq, i);
> +   vring_unmap_one_split(vq, i, dma_map_internal);
> vq->split.desc_extra[i].next = vq->free_head;
> vq->free_head = head;
>
> @@ -840,8 +853,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
> unsigned int head,
> VRING_DESC_F_INDIRECT));
> 

Re: [PATCH vhost v6 01/11] virtio_ring: split: separate dma codes

2023-03-28 Thread Jason Wang
On Mon, Mar 27, 2023 at 12:05 PM Xuan Zhuo  wrote:
>
> DMA-related logic is separated from the virtqueue_add_split() to
> one new function. DMA address will be saved as sg->dma_address if
> use_dma_api is true, then virtqueue_add_split() will use it directly.
> Unmap operation will be simpler.
>
> The purpose of this is to facilitate subsequent support to receive
> dma address mapped by drivers.
>
> Signed-off-by: Xuan Zhuo 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 122 +++
>  1 file changed, 94 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 41144b5246a8..2aafb7da793d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
> vring_virtqueue *vq,
> direction);
>  }
>
> +static dma_addr_t vring_sg_address(struct scatterlist *sg)
> +{
> +   if (sg->dma_address)
> +   return sg->dma_address;
> +
> +   return (dma_addr_t)sg_phys(sg);
> +}
> +
>  static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>void *cpu_addr, size_t size,
>enum dma_data_direction direction)
> @@ -520,6 +528,80 @@ static inline unsigned int 
> virtqueue_add_desc_split(struct virtqueue *vq,
> return next;
>  }
>
> +static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
> +   struct scatterlist *sgs[],
> +   unsigned int total_sg,
> +   unsigned int out_sgs,
> +   unsigned int in_sgs)
> +{
> +   struct scatterlist *sg;
> +   unsigned int n;
> +
> +   if (!vq->use_dma_api)
> +   return;
> +
> +   for (n = 0; n < out_sgs; n++) {
> +   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +   if (!sg->dma_address)
> +   return;
> +
> +   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> +  sg->length, DMA_TO_DEVICE);
> +   }
> +   }
> +
> +   for (; n < (out_sgs + in_sgs); n++) {
> +   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +   if (!sg->dma_address)
> +   return;
> +
> +   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> +  sg->length, DMA_FROM_DEVICE);
> +   }
> +   }
> +}
> +
> +static int virtqueue_map_sgs(struct vring_virtqueue *vq,
> +struct scatterlist *sgs[],
> +unsigned int total_sg,
> +unsigned int out_sgs,
> +unsigned int in_sgs)
> +{
> +   struct scatterlist *sg;
> +   unsigned int n;
> +
> +   if (!vq->use_dma_api)
> +   return 0;
> +
> +   for (n = 0; n < out_sgs; n++) {
> +   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +   dma_addr_t addr = vring_map_one_sg(vq, sg, 
> DMA_TO_DEVICE);
> +
> +   if (vring_mapping_error(vq, addr))
> +   goto err;
> +
> +   sg->dma_address = addr;
> +   }
> +   }
> +
> +   for (; n < (out_sgs + in_sgs); n++) {
> +   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +   dma_addr_t addr = vring_map_one_sg(vq, sg, 
> DMA_FROM_DEVICE);
> +
> +   if (vring_mapping_error(vq, addr))
> +   goto err;
> +
> +   sg->dma_address = addr;
> +   }
> +   }
> +
> +   return 0;
> +
> +err:
> +   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> +   return -ENOMEM;
> +}
> +
>  static inline int virtqueue_add_split(struct virtqueue *_vq,
>   struct scatterlist *sgs[],
>   unsigned int total_sg,
> @@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> struct vring_virtqueue *vq = to_vvq(_vq);
> struct scatterlist *sg;
> struct vring_desc *desc;
> -   unsigned int i, n, avail, descs_used, prev, err_idx;
> -   int head;
> +   unsigned int i, n, avail, descs_used, prev;
> bool indirect;
> +   int head;
>
> START_USE(vq);
>
> @@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue 
> *_vq,
> return -ENOSPC;
> }
>
> +   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
> +   goto err_map;
> +
> for (n = 0; n < out_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -   dma_addr_t addr = 

Re: [PATCH vhost v4 01/11] virtio_ring: split: separate dma codes

2023-03-28 Thread Jason Wang
On Tue, Mar 28, 2023 at 2:24 PM Jason Wang  wrote:
>
>
> On 2023/3/22 10:56, Xuan Zhuo wrote:
> > DMA-related logic is separated from the virtqueue_add_split() to
> > one new function. DMA address will be saved as sg->dma_address if
> > use_dma_api is true, then virtqueue_add_split() will use it directly.
> > Unmap operation will be simpler.
> >
> > The purpose of this is to facilitate subsequent support to receive
> > dma address mapped by drivers.
> >
> > Signed-off-by: Xuan Zhuo 
>
>
> Acked-by: Jason Wang 
>
> Thanks

Please ignore this, hit the button accidentally.

Thanks

>
>
> > ---
> >   drivers/virtio/virtio_ring.c | 121 +++
> >   1 file changed, 93 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 41144b5246a8..fe704ca6c813 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
> > vring_virtqueue *vq,
> >   direction);
> >   }
> >
> > +static dma_addr_t vring_sg_address(struct scatterlist *sg)
> > +{
> > + if (sg->dma_address)
> > + return sg->dma_address;
> > +
> > + return (dma_addr_t)sg_phys(sg);
> > +}
> > +
> >   static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> >  void *cpu_addr, size_t size,
> >  enum dma_data_direction direction)
> > @@ -520,6 +528,80 @@ static inline unsigned int 
> > virtqueue_add_desc_split(struct virtqueue *vq,
> >   return next;
> >   }
> >
> > +static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
> > + struct scatterlist *sgs[],
> > + unsigned int total_sg,
> > + unsigned int out_sgs,
> > + unsigned int in_sgs)
> > +{
> > + struct scatterlist *sg;
> > + unsigned int n;
> > +
> > + if (!vq->use_dma_api)
> > + return;
> > +
> > + for (n = 0; n < out_sgs; n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + if (!sg->dma_address)
> > + return;
> > +
> > + dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> > +sg->length, DMA_TO_DEVICE);
> > + }
> > + }
> > +
> > + for (; n < (out_sgs + in_sgs); n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + if (!sg->dma_address)
> > + return;
> > +
> > + dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
> > +sg->length, DMA_FROM_DEVICE);
> > + }
> > + }
> > +}
> > +
> > +static int virtqueue_map_sgs(struct vring_virtqueue *vq,
> > +  struct scatterlist *sgs[],
> > +  unsigned int total_sg,
> > +  unsigned int out_sgs,
> > +  unsigned int in_sgs)
> > +{
> > + struct scatterlist *sg;
> > + unsigned int n;
> > +
> > + if (!vq->use_dma_api)
> > + return 0;
> > +
> > + for (n = 0; n < out_sgs; n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + dma_addr_t addr = vring_map_one_sg(vq, sg, 
> > DMA_TO_DEVICE);
> > +
> > + if (vring_mapping_error(vq, addr))
> > + goto err;
> > +
> > + sg->dma_address = addr;
> > + }
> > + }
> > +
> > + for (; n < (out_sgs + in_sgs); n++) {
> > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > + dma_addr_t addr = vring_map_one_sg(vq, sg, 
> > DMA_FROM_DEVICE);
> > +
> > + if (vring_mapping_error(vq, addr))
> > + goto err;
> > +
> > + sg->dma_address = addr;
> > + }
> > + }
> > +
> > + return 0;
> > +
> > +err:
> > + virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
> > + return -ENOMEM;
> > +}
> > +
> >   static inline int virtqueue_add_split(struct virtqueue *_vq,
> > struct scatterlist *sgs[],
> > unsigned int total_sg,
> > @@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue 
> > *_vq,
> >   struct vring_virtqueue *vq = to_vvq(_vq);
> >   struct scatterlist *sg;
> >   struct vring_desc *desc;
> > - unsigned int i, n, avail, descs_used, prev, err_idx;
> > - int head;
> > + unsigned int i, n, avail, descs_used, prev;
> >   bool indirect;
> > + int head;
> >
> >   START_USE(vq);
> >
> > @@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct 
> > virtqueue *_vq,
> >   return -ENOSPC;
> >   }
> >
> > + if 

Re: [PATCH vhost v4 01/11] virtio_ring: split: separate dma codes

2023-03-28 Thread Jason Wang


On 2023/3/22 10:56, Xuan Zhuo wrote:

DMA-related logic is separated from the virtqueue_add_split() to
one new function. DMA address will be saved as sg->dma_address if
use_dma_api is true, then virtqueue_add_split() will use it directly.
Unmap operation will be simpler.

The purpose of this is to facilitate subsequent support to receive
dma address mapped by drivers.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 

Thanks



---
  drivers/virtio/virtio_ring.c | 121 +++
  1 file changed, 93 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 41144b5246a8..fe704ca6c813 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
direction);
  }
  
+static dma_addr_t vring_sg_address(struct scatterlist *sg)

+{
+   if (sg->dma_address)
+   return sg->dma_address;
+
+   return (dma_addr_t)sg_phys(sg);
+}
+
  static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
   void *cpu_addr, size_t size,
   enum dma_data_direction direction)
@@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct 
virtqueue *vq,
return next;
  }
  
+static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,

+   struct scatterlist *sgs[],
+   unsigned int total_sg,
+   unsigned int out_sgs,
+   unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_TO_DEVICE);
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_FROM_DEVICE);
+   }
+   }
+}
+
+static int virtqueue_map_sgs(struct vring_virtqueue *vq,
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return 0;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   return 0;
+
+err:
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   return -ENOMEM;
+}
+
  static inline int virtqueue_add_split(struct virtqueue *_vq,
  struct scatterlist *sgs[],
  unsigned int total_sg,
@@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
struct vring_desc *desc;
-   unsigned int i, n, avail, descs_used, prev, err_idx;
-   int head;
+   unsigned int i, n, avail, descs_used, prev;
bool indirect;
+   int head;
  
  	START_USE(vq);
  
@@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,

return -ENOSPC;
}
  
+	if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))

+   return -ENOMEM;
+
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
prev = i;
/* Note that we trust indirect descriptor
 * table 

Re: [PATCH v4 08/11] vdpa: Add eventfd for the vdpa callback

2023-03-28 Thread Jason Wang


On 2023/3/23 13:30, Xie Yongji wrote:

Add an eventfd for the vdpa callback so that the user
can signal it directly instead of triggering the
callback. It will be used in the vhost-vdpa case.

Signed-off-by: Xie Yongji 



Acked-by: Jason Wang 

Thanks



---
  drivers/vhost/vdpa.c | 2 ++
  drivers/virtio/virtio_vdpa.c | 1 +
  include/linux/vdpa.h | 6 ++
  3 files changed, 9 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..9cd878e25cff 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -599,9 +599,11 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, 
unsigned int cmd,
if (vq->call_ctx.ctx) {
cb.callback = vhost_vdpa_virtqueue_cb;
cb.private = vq;
+   cb.trigger = vq->call_ctx.ctx;
} else {
cb.callback = NULL;
cb.private = NULL;
+   cb.trigger = NULL;
}
ops->set_vq_cb(vdpa, idx, &cb);
vhost_vdpa_setup_vq_irq(v, idx);
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index f3826f42b704..2a095f37f26b 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -196,6 +196,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
int index,
/* Setup virtqueue callback */
cb.callback = callback ? virtio_vdpa_virtqueue_cb : NULL;
cb.private = info;
+   cb.trigger = NULL;
ops->set_vq_cb(vdpa, index, &cb);
ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
  
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h

index e52c9a37999c..0372b2e3d38a 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -13,10 +13,16 @@
   * struct vdpa_calllback - vDPA callback definition.
   * @callback: interrupt callback function
   * @private: the data passed to the callback function
+ * @trigger: the eventfd for the callback (Optional).
+ *   When it is set, the vDPA driver must guarantee that
+ *   signaling it is functional equivalent to triggering
+ *   the callback. Then vDPA parent can signal it directly
+ *   instead of triggering the callback.
   */
  struct vdpa_callback {
irqreturn_t (*callback)(void *data);
void *private;
+   struct eventfd_ctx *trigger;
  };
  
  /**
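
A minimal sketch of the intended equivalence (hypothetical helper, not
part of the patch): a parent that owns a trigger eventfd may signal it
directly, and must otherwise fall back to invoking the callback.

/* Hypothetical helper illustrating the documented contract. */
static void vdpa_fire_callback(struct vdpa_callback *cb)
{
        if (cb->trigger)
                eventfd_signal(cb->trigger, 1);  /* equivalent to... */
        else if (cb->callback)
                cb->callback(cb->private);       /* ...calling back */
}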



Re: [PATCH v4 07/11] vduse: Add sysfs interface for irq callback affinity

2023-03-28 Thread Jason Wang


On 2023/3/23 13:30, Xie Yongji wrote:

Add a sysfs interface for each vduse virtqueue to
get/set the affinity for the irq callback. This might
be useful for performance tuning when the irq callback
affinity mask contains more than one CPU.

Signed-off-by: Xie Yongji 



Acked-by: Jason Wang 

Thanks



---
  drivers/vdpa/vdpa_user/vduse_dev.c | 124 ++---
  1 file changed, 113 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index cefabd0dab9c..77da3685568a 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -61,6 +61,7 @@ struct vduse_virtqueue {
struct work_struct kick;
int irq_effective_cpu;
struct cpumask irq_affinity;
+   struct kobject kobj;
  };
  
  struct vduse_dev;

@@ -1387,6 +1388,96 @@ static const struct file_operations vduse_dev_fops = {
.llseek = noop_llseek,
  };
  
+static ssize_t irq_cb_affinity_show(struct vduse_virtqueue *vq, char *buf)

+{
return sprintf(buf, "%*pb\n", cpumask_pr_args(&vq->irq_affinity));
+}
+
+static ssize_t irq_cb_affinity_store(struct vduse_virtqueue *vq,
+const char *buf, size_t count)
+{
+   cpumask_var_t new_value;
+   int ret;
+
+   if (!zalloc_cpumask_var(&new_value, GFP_KERNEL))
+   return -ENOMEM;
+
+   ret = cpumask_parse(buf, new_value);
+   if (ret)
+   goto free_mask;
+
+   ret = -EINVAL;
+   if (!cpumask_intersects(new_value, cpu_online_mask))
+   goto free_mask;
+
+   cpumask_copy(&vq->irq_affinity, new_value);
+   ret = count;
+free_mask:
+   free_cpumask_var(new_value);
+   return ret;
+}
+
+struct vq_sysfs_entry {
+   struct attribute attr;
+   ssize_t (*show)(struct vduse_virtqueue *vq, char *buf);
+   ssize_t (*store)(struct vduse_virtqueue *vq, const char *buf,
+size_t count);
+};
+
+static struct vq_sysfs_entry irq_cb_affinity_attr = __ATTR_RW(irq_cb_affinity);
+
+static struct attribute *vq_attrs[] = {
+   &irq_cb_affinity_attr.attr,
+   NULL,
+};
+ATTRIBUTE_GROUPS(vq);
+
+static ssize_t vq_attr_show(struct kobject *kobj, struct attribute *attr,
+   char *buf)
+{
+   struct vduse_virtqueue *vq = container_of(kobj,
+   struct vduse_virtqueue, kobj);
+   struct vq_sysfs_entry *entry = container_of(attr,
+   struct vq_sysfs_entry, attr);
+
+   if (!entry->show)
+   return -EIO;
+
+   return entry->show(vq, buf);
+}
+
+static ssize_t vq_attr_store(struct kobject *kobj, struct attribute *attr,
+const char *buf, size_t count)
+{
+   struct vduse_virtqueue *vq = container_of(kobj,
+   struct vduse_virtqueue, kobj);
+   struct vq_sysfs_entry *entry = container_of(attr,
+   struct vq_sysfs_entry, attr);
+
+   if (!entry->store)
+   return -EIO;
+
+   return entry->store(vq, buf, count);
+}
+
+static const struct sysfs_ops vq_sysfs_ops = {
+   .show = vq_attr_show,
+   .store = vq_attr_store,
+};
+
+static void vq_release(struct kobject *kobj)
+{
+   struct vduse_virtqueue *vq = container_of(kobj,
+   struct vduse_virtqueue, kobj);
+   kfree(vq);
+}
+
+static const struct kobj_type vq_type = {
+   .release= vq_release,
.sysfs_ops  = &vq_sysfs_ops,
+   .default_groups = vq_groups,
+};
+
  static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
  {
int i;
@@ -1395,13 +1486,13 @@ static void vduse_dev_deinit_vqs(struct vduse_dev *dev)
return;
  
  	for (i = 0; i < dev->vq_num; i++)

-   kfree(dev->vqs[i]);
+   kobject_put(&dev->vqs[i]->kobj);
kfree(dev->vqs);
  }
  
  static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 vq_align, u32 vq_num)

  {
-   int i;
+   int ret, i;
  
  	dev->vq_align = vq_align;

dev->vq_num = vq_num;
@@ -1411,8 +1502,10 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, u32 
vq_align, u32 vq_num)
  
  	for (i = 0; i < vq_num; i++) {

dev->vqs[i] = kzalloc(sizeof(*dev->vqs[i]), GFP_KERNEL);
-   if (!dev->vqs[i])
+   if (!dev->vqs[i]) {
+   ret = -ENOMEM;
goto err;
+   }
  
  		dev->vqs[i]->index = i;

dev->vqs[i]->irq_effective_cpu = IRQ_UNBOUND;
@@ -1421,15 +1514,23 @@ static int vduse_dev_init_vqs(struct vduse_dev *dev, 
u32 vq_align, u32 vq_num)
spin_lock_init(&dev->vqs[i]->kick_lock);
spin_lock_init(&dev->vqs[i]->irq_lock);
cpumask_setall(&dev->vqs[i]->irq_affinity);
+
+   kobject_init(&dev->vqs[i]->kobj, &vq_type);
+   ret = kobject_add(&dev->vqs[i]->kobj,
+ 
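
For completeness, a hedged userspace usage sketch (the sysfs path below
is a placeholder -- the kobject_add() call that determines the real
path is cut off above; note that cpumask_parse() expects a hex mask, so
"f" selects CPUs 0-3):

/* Illustration only: PATH is a placeholder for wherever the per-vq
 * kobject is added in sysfs. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        const char *path = "/sys/.../vq0/irq_cb_affinity"; /* placeholder */
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        if (write(fd, "f", 1) != 1)  /* hex mask: CPUs 0-3 */
                perror("write");
        close(fd);
        return 0;
}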

Re: [PATCH v4 06/11] vduse: Support get_vq_affinity callback

2023-03-28 Thread Jason Wang


On 2023/3/23 13:30, Xie Yongji wrote:

This implements the get_vq_affinity callback so that
the virtio-blk driver can build the blk-mq queues
based on the irq callback affinity.

Signed-off-by: Xie Yongji 



Acked-by: Jason Wang 

Thanks



---
  drivers/vdpa/vdpa_user/vduse_dev.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index 45aa8703c4b5..cefabd0dab9c 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -722,6 +722,14 @@ static int vduse_vdpa_set_vq_affinity(struct vdpa_device 
*vdpa, u16 idx,
return 0;
  }
  
+static const struct cpumask *

+vduse_vdpa_get_vq_affinity(struct vdpa_device *vdpa, u16 idx)
+{
+   struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+   return &dev->vqs[idx]->irq_affinity;
+}
+
  static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
unsigned int asid,
struct vhost_iotlb *iotlb)
@@ -773,6 +781,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = 
{
.set_config = vduse_vdpa_set_config,
.get_generation = vduse_vdpa_get_generation,
.set_vq_affinity= vduse_vdpa_set_vq_affinity,
+   .get_vq_affinity= vduse_vdpa_get_vq_affinity,
.reset  = vduse_vdpa_reset,
.set_map= vduse_vdpa_set_map,
.free   = vduse_vdpa_free,



Re: [PATCH v4 05/11] vduse: Support set_vq_affinity callback

2023-03-28 Thread Jason Wang


On 2023/3/23 13:30, Xie Yongji wrote:

Since the virtio-vdpa bus driver already supports the interrupt
affinity spreading mechanism, let's implement the set_vq_affinity
callback to bring it to the vduse device. Once we have the virtqueue's
affinity, we can spread IRQs between the CPUs in the affinity mask, in
a round-robin manner, to run the irq callback.

Signed-off-by: Xie Yongji 



Acked-by: Jason Wang 

Thanks



---
  drivers/vdpa/vdpa_user/vduse_dev.c | 61 ++
  1 file changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
index 98359d87a06f..45aa8703c4b5 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -41,6 +41,8 @@
  #define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
  #define VDUSE_MSG_DEFAULT_TIMEOUT 30
  
+#define IRQ_UNBOUND -1

+
  struct vduse_virtqueue {
u16 index;
u16 num_max;
@@ -57,6 +59,8 @@ struct vduse_virtqueue {
struct vdpa_callback cb;
struct work_struct inject;
struct work_struct kick;
+   int irq_effective_cpu;
+   struct cpumask irq_affinity;
  };
  
  struct vduse_dev;

@@ -128,6 +132,7 @@ static struct class *vduse_class;
  static struct cdev vduse_ctrl_cdev;
  static struct cdev vduse_cdev;
  static struct workqueue_struct *vduse_irq_wq;
+static struct workqueue_struct *vduse_irq_bound_wq;
  
  static u32 allowed_device_id[] = {

VIRTIO_ID_BLOCK,
@@ -708,6 +713,15 @@ static u32 vduse_vdpa_get_generation(struct vdpa_device 
*vdpa)
return dev->generation;
  }
  
+static int vduse_vdpa_set_vq_affinity(struct vdpa_device *vdpa, u16 idx,

+ const struct cpumask *cpu_mask)
+{
+   struct vduse_dev *dev = vdpa_to_vduse(vdpa);
+
+   cpumask_copy(&dev->vqs[idx]->irq_affinity, cpu_mask);
+   return 0;
+}
+
  static int vduse_vdpa_set_map(struct vdpa_device *vdpa,
unsigned int asid,
struct vhost_iotlb *iotlb)
@@ -758,6 +772,7 @@ static const struct vdpa_config_ops vduse_vdpa_config_ops = 
{
.get_config = vduse_vdpa_get_config,
.set_config = vduse_vdpa_set_config,
.get_generation = vduse_vdpa_get_generation,
+   .set_vq_affinity= vduse_vdpa_set_vq_affinity,
.reset  = vduse_vdpa_reset,
.set_map= vduse_vdpa_set_map,
.free   = vduse_vdpa_free,
@@ -917,7 +932,8 @@ static void vduse_vq_irq_inject(struct work_struct *work)
  }
  
  static int vduse_dev_queue_irq_work(struct vduse_dev *dev,

-   struct work_struct *irq_work)
+   struct work_struct *irq_work,
+   int irq_effective_cpu)
  {
int ret = -EINVAL;
  
@@ -926,7 +942,11 @@ static int vduse_dev_queue_irq_work(struct vduse_dev *dev,

goto unlock;
  
  	ret = 0;

-   queue_work(vduse_irq_wq, irq_work);
+   if (irq_effective_cpu == IRQ_UNBOUND)
+   queue_work(vduse_irq_wq, irq_work);
+   else
+   queue_work_on(irq_effective_cpu,
+ vduse_irq_bound_wq, irq_work);
  unlock:
up_read(>rwsem);
  
@@ -1029,6 +1049,22 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,

return ret;
  }
  
+static void vduse_vq_update_effective_cpu(struct vduse_virtqueue *vq)

+{
+   int curr_cpu = vq->irq_effective_cpu;
+
+   while (true) {
+   curr_cpu = cpumask_next(curr_cpu, &vq->irq_affinity);
+   if (cpu_online(curr_cpu))
+   break;
+
+   if (curr_cpu >= nr_cpu_ids)
+   curr_cpu = IRQ_UNBOUND;
+   }
+
+   vq->irq_effective_cpu = curr_cpu;
+}
+
  static long vduse_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
  {
@@ -,7 +1147,7 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
int cmd,
break;
}
case VDUSE_DEV_INJECT_CONFIG_IRQ:
-   ret = vduse_dev_queue_irq_work(dev, &dev->inject);
+   ret = vduse_dev_queue_irq_work(dev, &dev->inject, IRQ_UNBOUND);
break;
case VDUSE_VQ_SETUP: {
struct vduse_vq_config config;
@@ -1198,7 +1234,10 @@ static long vduse_dev_ioctl(struct file *file, unsigned 
int cmd,
break;
  
  		index = array_index_nospec(index, dev->vq_num);

-   ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject);
+
+   vduse_vq_update_effective_cpu(dev->vqs[index]);
+   ret = vduse_dev_queue_irq_work(dev, &dev->vqs[index]->inject,
+   dev->vqs[index]->irq_effective_cpu);
break;
}
case VDUSE_IOTLB_REG_UMEM: {
@@ -1367,10 +1406,12 @@ static int vduse_dev_init_vqs(struct vduse_dev 
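
In short, the mechanism pairs a per-vq affinity mask with a rotating
"effective" CPU. A condensed restatement of the quoted selection rule
(reordered slightly for bounds safety, and with an explicit bail-out
after one full wrap; not the literal patch code):

/* Condensed restatement: advance to the next online CPU in the mask,
 * wrapping around once; fall back to IRQ_UNBOUND (i.e. queue on the
 * unbound workqueue) if no CPU in the mask is online. */
static int next_effective_cpu(int curr, const struct cpumask *mask)
{
        int wraps = 0;

        while (wraps < 2) {
                curr = cpumask_next(curr, mask); /* next set bit after curr */
                if (curr >= nr_cpu_ids) {        /* end of mask reached */
                        curr = -1;               /* restart the scan */
                        wraps++;
                        continue;
                }
                if (cpu_online(curr))
                        return curr;
        }
        return IRQ_UNBOUND;
}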

Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

2023-03-28 Thread Jason Wang


On 2023/3/24 17:12, Michael S. Tsirkin wrote:

On Fri, Mar 24, 2023 at 02:27:52PM +0800, Jason Wang wrote:

On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji  wrote:

To support the interrupt affinity spreading mechanism, this makes use
of group_cpus_evenly() to create an irq callback affinity mask for each
virtqueue of the vdpa device. Then we unify the set_vq_affinity
callback to pass the affinity to the vdpa device driver.

Signed-off-by: Xie Yongji 

Thinking hard about all the logic, I think I've found something interesting.

Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries to
pass irq_affinity to the transport-specific find_vqs().  This seems
like a layer violation, since the driver has no knowledge of

1) whether or not the callback is based on an IRQ
2) whether or not the device is PCI (the details are hidden by
the transport driver)
3) how many vectors could be used by a device

This means the driver can't actually pass a real affinity mask, so in
fact the commit passes a zeroed irq affinity structure as a hint, and
the PCI layer builds a default affinity that groups CPUs evenly based
on the number of MSI-X vectors (the core logic is group_cpus_evenly()).
I think we should fix this by replacing the irq_affinity structure with

1) a boolean like auto_cb_spreading

or

2) queue to cpu mapping

So each transport can do its own logic based on that. Then virtio-vDPA
can pass that policy to VDUSE where we only need a group_cpus_evenly()
and avoid duplicating irq_create_affinity_masks()?

Thanks

I don't really understand what you propose. Care to post a patch?



I meant to avoid passing the irq_affinity structure in find_vqs and
instead pass an array of booleans telling us whether or not each vq
requires automatic spreading of callbacks. But it seems less flexible.




Also does it have to block this patchset or can it be done on top?



We can leave it in the future.

So

Acked-by: Jason Wang 

Thanks





---
  drivers/virtio/virtio_vdpa.c | 68 
  1 file changed, 68 insertions(+)

diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index f72696b4c1c2..f3826f42b704 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -13,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -272,6 +273,66 @@ static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
 virtio_vdpa_del_vq(vq);
  }

+static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs)
+{
+   affd->nr_sets = 1;
+   affd->set_size[0] = affvecs;
+}
+
+static struct cpumask *
+create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
+{
+   unsigned int affvecs = 0, curvec, usedvecs, i;
+   struct cpumask *masks = NULL;
+
+   if (nvecs > affd->pre_vectors + affd->post_vectors)
+   affvecs = nvecs - affd->pre_vectors - affd->post_vectors;
+
+   if (!affd->calc_sets)
+   affd->calc_sets = default_calc_sets;
+
+   affd->calc_sets(affd, affvecs);
+
+   if (!affvecs)
+   return NULL;
+
+   masks = kcalloc(nvecs, sizeof(*masks), GFP_KERNEL);
+   if (!masks)
+   return NULL;
+
+   /* Fill out vectors at the beginning that don't need affinity */
+   for (curvec = 0; curvec < affd->pre_vectors; curvec++)
+   cpumask_setall(&masks[curvec]);
+
+   for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) {
+   unsigned int this_vecs = affd->set_size[i];
+   int j;
+   struct cpumask *result = group_cpus_evenly(this_vecs);
+
+   if (!result) {
+   kfree(masks);
+   return NULL;
+   }
+
+   for (j = 0; j < this_vecs; j++)
+   cpumask_copy(&masks[curvec + j], &result[j]);
+   kfree(result);
+
+   curvec += this_vecs;
+   usedvecs += this_vecs;
+   }
+
+   /* Fill out vectors at the end that don't need affinity */
+   if (usedvecs >= affvecs)
+   curvec = affd->pre_vectors + affvecs;
+   else
+   curvec = affd->pre_vectors + usedvecs;
+   for (; curvec < nvecs; curvec++)
+   cpumask_setall(&masks[curvec]);
+
+   return masks;
+}
+
  static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
 struct virtqueue *vqs[],
 vq_callback_t *callbacks[],
@@ -282,9 +343,15 @@ static int virtio_vdpa_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
 struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
 struct vdpa_device *vdpa = vd_get_vdpa(vdev);
 const struct vdpa_config_ops *ops = vdpa->config;
+   struct irq_affinity default_affd = { 0 };
+   struct cpumask *masks;
 struct vdpa_callback cb;
 int i, err, queue_idx = 0;

+   masks = create_affinity_masks(nvqs, 

Re: [PATCH v4 03/11] virtio-vdpa: Support interrupt affinity spreading mechanism

2023-03-28 Thread Jason Wang
On Tue, Mar 28, 2023 at 12:05 PM Yongji Xie  wrote:
>
> On Tue, Mar 28, 2023 at 11:44 AM Jason Wang  wrote:
> >
> > On Tue, Mar 28, 2023 at 11:33 AM Yongji Xie  wrote:
> > >
> > > On Tue, Mar 28, 2023 at 11:14 AM Jason Wang  wrote:
> > > >
> > > > On Tue, Mar 28, 2023 at 11:03 AM Yongji Xie  
> > > > wrote:
> > > > >
> > > > > On Fri, Mar 24, 2023 at 2:28 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Mar 23, 2023 at 1:31 PM Xie Yongji 
> > > > > >  wrote:
> > > > > > >
> > > > > > > To support interrupt affinity spreading mechanism,
> > > > > > > this makes use of group_cpus_evenly() to create
> > > > > > > an irq callback affinity mask for each virtqueue
> > > > > > > of vdpa device. Then we will unify set_vq_affinity
> > > > > > > callback to pass the affinity to the vdpa device driver.
> > > > > > >
> > > > > > > Signed-off-by: Xie Yongji 
> > > > > >
> > > > > > Thinking hard about all the logic, I think I've found something
> > > > > > interesting.
> > > > > >
> > > > > > Commit ad71473d9c437 ("virtio_blk: use virtio IRQ affinity") tries 
> > > > > > to
> > > > > > pass irq_affinity to the transport-specific find_vqs().  This seems
> > > > > > like a layer violation, since the driver has no knowledge of
> > > > > >
> > > > > > 1) whether or not the callback is based on an IRQ
> > > > > > 2) whether or not the device is PCI (the details are hidden by
> > > > > > the transport driver)
> > > > > > 3) how many vectors could be used by a device
> > > > > >
> > > > > > This means the driver can't actually pass a real affinity mask, so
> > > > > > in fact the commit passes a zeroed irq affinity structure as a hint,
> > > > > > and the PCI layer builds a default affinity that groups CPUs evenly
> > > > > > based on the number of MSI-X vectors (the core logic is
> > > > > > group_cpus_evenly()). I think we should fix this by replacing the
> > > > > > irq_affinity structure with
> > > > > >
> > > > > > 1) a boolean like auto_cb_spreading
> > > > > >
> > > > > > or
> > > > > >
> > > > > > 2) queue to cpu mapping
> > > > > >
> > > > >
> > > > > But only the driver knows which queues are used in the control path,
> > > > > which don't need the automatic irq affinity assignment.
> > > >
> > > > Is this knowledge awarded by the transport driver now?
> > > >
> > >
> > > This knowledge is awarded by the device driver rather than the transport 
> > > driver.
> > >
> > > E.g. virtio-scsi uses:
> > >
> > > struct irq_affinity desc = { .pre_vectors = 2 }; // vq0 is control
> > > queue, vq1 is event queue
> >
> > Ok, but it only works as a hint, it's not a real affinity. As replied,
> > we can pass an array of booleans in this case, so the transport driver
> > knows it doesn't need to use automatic affinity for the first two
> > queues.
> >
>
> But we don't know whether we would use other fields in the
> irq_affinity structure in the future. So a full set would be better?

Good point. So the issue is calc_sets(), and we probably need that
if there's a virtio driver that needs more than one set of vectors
that need to be spread. Technically, we could have a virtio-level
abstraction for this, but I agree it's probably not worth bothering
with now.

>
> > >
> > > > E.g. virtio-blk uses:
> > > >
> > > > struct irq_affinity desc = { 0, };
> > > >
> > > > At least we can tell the transport driver which vq requires automatic
> > > > irq affinity.
> > > >
> > >
> > > I think that is what the current implementation does.
> > >
> > > > > So I think the
> > > > > irq_affinity structure can only be created by device drivers and
> > > > > passed to the virtio-pci/virtio-vdpa driver.
> > > >
> > > > This could be not easy since the driver doesn't even know how many
> > > > interrupts will be used by the transport driver, so it can't built the
> > > > actual affinity structure.
> > > >
> > >
> > > The actual affinity mask is built by the transport driver,
> >
> > For PCI yes, it talks directly to the IRQ subsystems.
> >
> > > device
> > > driver only passes a hint on which queues don't need the automatic irq
> > > affinity assignment.
> >
> > But not for virtio-vDPA since the IRQ needs to be dealt with by the
> > parent driver. For our case, it's the VDUSE where it doesn't need IRQ
> > at all, a queue to cpu mapping is sufficient.
> >
>
> The device driver doesn't know whether it is bound to virtio-pci or
> virtio-vdpa. So it should pass a full set needed by the automatic irq
> affinity assignment instead of a subset. Then virtio-vdpa can choose
> to pass a queue to cpu mapping to VDUSE, which is what we do now (use
> set_vq_affinity()).

Yes, so basically two ways:

1) automatic IRQ management: passing affd to find_vqs(); affinity is
determined by the transport (e.g. vDPA).
2) affinity that is under the control of the driver: it needs to use
set_vq_affinity() but must deal with CPU hotplug itself.

Thanks

>
> Thanks,
> Yongji
>
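
To put the two approaches side by side, a hedged sketch (vdev, nvqs,
vqs, callbacks and names are assumed to be set up elsewhere; the queue
indices are illustrative):

/* Way 1: transport-managed spreading. The hint keeps the first two
 * vectors (e.g. control/event queues) out of the automatic spread. */
struct irq_affinity desc = { .pre_vectors = 2 };
int err = virtio_find_vqs(vdev, nvqs, vqs, callbacks, names, &desc);

/* Way 2: driver-managed affinity via set_vq_affinity(); the driver is
 * then also responsible for reacting to CPU hotplug itself. */
if (!err)
        err = virtqueue_set_affinity(vqs[2], cpumask_of(0));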