Re: [PATCH vhost 00/22] virtio-net: support AF_XDP zero copy

2023-10-12 Thread Jason Wang
On Thu, Oct 12, 2023 at 9:58 AM Xuan Zhuo  wrote:
>
> On Wed, 11 Oct 2023 10:00:57 -0700, Jakub Kicinski  wrote:
> > On Wed, 11 Oct 2023 17:27:06 +0800 Xuan Zhuo wrote:
> > > ## AF_XDP
> > >
> > > XDP socket (AF_XDP) is an excellent kernel-bypass network framework.
> > > The zero-copy feature of xsk (XDP socket) needs driver support, and
> > > its performance is very good. mlx5 and Intel ixgbe already support
> > > this feature. This patch set allows virtio-net to support xsk's
> > > zero-copy xmit feature.
> >
> > You're moving the driver and adding a major feature.
> > This really needs to go via net or bpf.
> > If you have dependencies in other trees please wait for
> > after the merge window.
>
>
> If so, I can remove the first two commits.
>
> Then the sq uses the premapped mode by default,
> and we can use the API virtqueue_dma_map_single_attrs to replace
> virtqueue_dma_map_page_attrs.
>
> And then I will fix that on top.
>
> Hi Michael and Jason, is that ok for you?

I would go with whatever looks easiest for you, but I think Jakub wants
the series to go via net-next (this is what we did in the past for
networking-specific features done in virtio-net). So we need
to tweak the prefix to use net-next instead of vhost.

Thanks


>
> Thanks.
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Report a possible vhost bug in stable branches

2023-10-12 Thread Jason Wang
On Thu, Oct 12, 2023 at 9:43 AM Xianting Tian
 wrote:
>
> cgroup attach work and dev flush work will both be added to the dev work
> list in vhost_attach_cgroups() when setting the dev owner:
>
>  static int vhost_attach_cgroups(struct vhost_dev *dev)
>  {
>          struct vhost_attach_cgroups_struct attach;
>
>          attach.owner = current;
>          vhost_work_init(&attach.work, vhost_attach_cgroups_work);
>          vhost_work_queue(dev, &attach.work);  // add cgroup attach work
>          vhost_work_dev_flush(dev);            // add dev flush work
>          return attach.ret;
>  }
>
> And the dev kworker will be woken up to handle the two works in
> vhost_worker():
>
>  node = llist_del_all(&dev->work_list);
>  node = llist_reverse_order(node);
>  llist_for_each_entry_safe {
>          work->fn(work);
>  }
>
> As the list is reversed before processing in vhost_worker(), it is
> possible that the dev flush work is processed before the cgroup attach
> work.

This sounds weird. It's an llist, not a list, so a new entry is added at
the head; that is exactly why we need llist_reverse_order() to recover
the original queueing order.

 Have you ever reproduced these issues?

Thanks

> If so, vhost_attach_cgroups() may return "attach.ret" before the cgroup
> attach work is handled, but "attach.ret" is then a random value, as it
> lives uninitialized on the stack.
>
> The possible fix maybe:
>
> static int vhost_attach_cgroups(struct vhost_dev *dev)
> {
>  struct vhost_attach_cgroups_struct attach;
>
>  attach.ret = 0;
>  attach.owner = current;
>  vhost_work_init(&attach.work, vhost_attach_cgroups_work);
>  vhost_work_queue(dev, &attach.work);
>  vhost_work_dev_flush(dev);
>  return attach.ret;
> }
>
>   So this fix just initializes attach.ret to 0. It may not be the final
>   fix; we just want you experts to know this issue exists, and we met it
>   recently in our test.
>
> And the issue exists in many stable branches.
>


Re: [PATCH net-next 0/5] virtio-net: support dynamic coalescing moderation

2023-10-12 Thread Jason Wang
On Thu, Oct 12, 2023 at 3:44 PM Heng Qi  wrote:
>
> Now, virtio-net already supports per-queue moderation parameter
> setting. Based on this, we use the netdim library of linux to support
> dynamic coalescing moderation for virtio-net.
>
> Due to hardware scheduling issues, we only tested rx dim.

Do you have PPS numbers? And TX numbers are also important as the
throughput could be misleading due to various reasons.

Thanks

>
> @Test env
> rxq0 has affinity to cpu0.
>
> @Test cmd
> client: taskset -c 0 sockperf tp -i ${IP} -t 30 --tcp -m ${msg_size}
> server: taskset -c 0 sockperf sr --tcp
>
> @Test res
> The second column is the ratio of the result returned by client
> when rx dim is enabled to the result returned by client when
> rx dim is disabled.
> ---------------------------------------
> | msg_size |  rx_dim=on / rx_dim=off  |
> ---------------------------------------
> |   14B    |  +3%                     |
> |   100B   |  +16%                    |
> |   500B   |  +25%                    |
> |   1400B  |  +28%                    |
> |   2048B  |  +22%                    |
> |   4096B  |  +5%                     |
> ---------------------------------------
>
> ---
> This patch set was part of the previous netdim patch set[1].
> [1] was split into a merged bugfix set[2] and the current set.
> The previous relevant commentators have been Cced.
>
> [1] 
> https://lore.kernel.org/all/20230811065512.22190-1-hen...@linux.alibaba.com/
> [2] https://lore.kernel.org/all/cover.1696745452.git.hen...@linux.alibaba.com/
>
> Heng Qi (5):
>   virtio-net: returns whether napi is complete
>   virtio-net: separate rx/tx coalescing moderation cmds
>   virtio-net: extract virtqueue coalescing cmd for reuse
>   virtio-net: support rx netdim
>   virtio-net: support tx netdim
>
>  drivers/net/virtio_net.c | 394 ---
>  1 file changed, 322 insertions(+), 72 deletions(-)
>
> --
> 2.19.1.6.gb485710b
>
>


Re: [PATCH vhost 00/22] virtio-net: support AF_XDP zero copy

2023-10-12 Thread Xuan Zhuo
On Thu, 12 Oct 2023 15:50:13 +0800, Jason Wang  wrote:
> On Thu, Oct 12, 2023 at 9:58 AM Xuan Zhuo  wrote:
> >
> > On Wed, 11 Oct 2023 10:00:57 -0700, Jakub Kicinski  wrote:
> > > On Wed, 11 Oct 2023 17:27:06 +0800 Xuan Zhuo wrote:
> > > > ## AF_XDP
> > > >
> > > > XDP socket (AF_XDP) is an excellent kernel-bypass network framework.
> > > > The zero-copy feature of xsk (XDP socket) needs driver support, and
> > > > its performance is very good. mlx5 and Intel ixgbe already support
> > > > this feature. This patch set allows virtio-net to support xsk's
> > > > zero-copy xmit feature.
> > >
> > > You're moving the driver and adding a major feature.
> > > This really needs to go via net or bpf.
> > > If you have dependencies in other trees please wait for
> > > after the merge window.
> >
> >
> > If so, I can remove the first two commits.
> >
> > Then, the sq uses the premapped mode by default.
> > And we can use the api virtqueue_dma_map_single_attrs to replace the
> > virtqueue_dma_map_page_attrs.
> >
> > And then I will fix that on the top.
> >
> > Hi Michael and Jason, is that ok for you?
>
> I would go with whatever looks easiest for you, but I think Jakub wants
> the series to go via net-next (this is what we did in the past for
> networking-specific features done in virtio-net). So we need
> to tweak the prefix to use net-next instead of vhost.

OK.

I will fix that in the next version.

Thanks.

>
> Thanks
>
>
> >
> > Thanks.
> >
>

Re: [PATCH vhost 21/22] virtio_net: update tx timeout record

2023-10-12 Thread Michael S. Tsirkin
On Wed, Oct 11, 2023 at 05:27:27PM +0800, Xuan Zhuo wrote:
> If the send queue sent some packets, we update the tx timeout
> record to prevent a spurious tx timeout.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio/xsk.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> index 7abd46bb0e3d..e605f860edb6 100644
> --- a/drivers/net/virtio/xsk.c
> +++ b/drivers/net/virtio/xsk.c
> @@ -274,6 +274,16 @@ bool virtnet_xsk_xmit(struct virtnet_sq *sq, struct xsk_buff_pool *pool,
>  
>   virtnet_xsk_check_queue(sq);
>  
> + if (stats.packets) {
> + struct netdev_queue *txq;
> + struct virtnet_info *vi;
> +
> + vi = sq->vq->vdev->priv;
> +
> + txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
> + txq_trans_cond_update(txq);
> + }
> +
>   u64_stats_update_begin(&sq->stats.syncp);
>   sq->stats.packets += stats.packets;
>   sq->stats.bytes += stats.bytes;

I don't get what this is doing. Is there some kind of race here you
are trying to address? And what introduced the race?

> -- 
> 2.32.0.3.g01195cf9f



Re: [PATCH vhost 08/22] virtio_net: virtnet_poll_tx support rescheduled

2023-10-12 Thread Michael S. Tsirkin
On Wed, Oct 11, 2023 at 05:27:14PM +0800, Xuan Zhuo wrote:
> virtnet_poll_tx() supports returning the budget when it is busy, so that
> it can be rescheduled.
> 
> When retval < budget, napi_poll() in dev.c will exit directly. And
> virtqueue_napi_complete() will be called to close napi.
> 
> When retval == budget, the napi_poll() in dev.c will re-add napi to the
> queue.
> 
> The purpose of this patch is to support xsk xmit in virtnet_poll_tx() for
> a subsequent patch.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio/main.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> index bcfd31a55076..f32cfa189972 100644
> --- a/drivers/net/virtio/main.c
> +++ b/drivers/net/virtio/main.c
> @@ -1976,6 +1976,7 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   struct virtnet_info *vi = sq->vq->vdev->priv;
>   unsigned int index = vq2txq(sq->vq);
>   struct netdev_queue *txq;
> + int busy = 0;
>   int opaque;
>   bool done;
>  
> @@ -1993,6 +1994,11 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   netif_tx_wake_queue(txq);
>  
> + if (busy) {
> + __netif_tx_unlock(txq);
> + return budget;
> + }
> +
>   opaque = virtqueue_enable_cb_prepare(sq->vq);
>  
>   done = napi_complete_done(napi, 0);

This just adds a bit of dead code.
Pls just squash into that patch.

> -- 
> 2.32.0.3.g01195cf9f



Re: [PATCH vhost 01/22] virtio_ring: virtqueue_set_dma_premapped support disable

2023-10-12 Thread Michael S. Tsirkin
On Wed, Oct 11, 2023 at 05:27:07PM +0800, Xuan Zhuo wrote:
> virtqueue_set_dma_premapped() adds a new parameter to disable the
> virtqueue premapped mode.
> 
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c |  2 +-
>  drivers/virtio/virtio_ring.c | 11 ---
>  include/linux/virtio.h   |  2 +-
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fe7f314d65c9..6b5f47ebf9b2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -737,7 +737,7 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
>   return;
>  
>   for (i = 0; i < vi->max_queue_pairs; i++) {
> - if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> + if (virtqueue_set_dma_premapped(vi->rq[i].vq, true))
>   continue;
>  
>   vi->rq[i].do_dma = true;
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 51d8f3299c10..b3ded56722f4 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2784,7 +2784,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
>   * 0: success.
>   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
>   */
> -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
>  {
>   struct vring_virtqueue *vq = to_vvq(_vq);
>   u32 num;
> @@ -2803,8 +2803,13 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
>   return -EINVAL;
>   }
>  
> - vq->premapped = true;
> - vq->do_unmap = false;
> + if (mode) {
> + vq->premapped = true;
> + vq->do_unmap = false;
> + } else {
> + vq->premapped = false;
> + vq->do_unmap = vq->use_dma_api;
> + }
>  
>   END_USE(vq);
>  
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 4cc614a38376..1cf7b004348b 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -81,7 +81,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>  
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>  
> -int virtqueue_set_dma_premapped(struct virtqueue *_vq);
> +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode);
>  
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);

Wait a sec, I thought we never change premapped. If you make this
dynamic, don't you need a bunch of locking?
Or maybe the queue is empty when you change this?
If yes, pls add a bunch of BUG_ON checks to make sure this is not misused.


> -- 
> 2.32.0.3.g01195cf9f



Re: [PATCH vhost 21/22] virtio_net: update tx timeout record

2023-10-12 Thread Xuan Zhuo
On Thu, 12 Oct 2023 05:10:55 -0400, "Michael S. Tsirkin"  
wrote:
> On Wed, Oct 11, 2023 at 05:27:27PM +0800, Xuan Zhuo wrote:
> > If send queue sent some packets, we update the tx timeout
> > record to prevent the tx timeout.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/net/virtio/xsk.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> > index 7abd46bb0e3d..e605f860edb6 100644
> > --- a/drivers/net/virtio/xsk.c
> > +++ b/drivers/net/virtio/xsk.c
> > @@ -274,6 +274,16 @@ bool virtnet_xsk_xmit(struct virtnet_sq *sq, struct 
> > xsk_buff_pool *pool,
> >
> > virtnet_xsk_check_queue(sq);
> >
> > +   if (stats.packets) {
> > +   struct netdev_queue *txq;
> > +   struct virtnet_info *vi;
> > +
> > +   vi = sq->vq->vdev->priv;
> > +
> > +   txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
> > +   txq_trans_cond_update(txq);
> > +   }
> > +
> > u64_stats_update_begin(&sq->stats.syncp);
> > sq->stats.packets += stats.packets;
> > sq->stats.bytes += stats.bytes;
>
> I don't get what this is doing. Is there some kind of race here you
> are trying to address? And what introduced the race?


Because the xsk xmit shares the send queue with the kernel xmit,
when I run benchmarks the xsk side always occupies the send queue,
so the kernel may get no chance to xmit; the tx watchdog then
thinks the send queue is hung and prints a tx timeout log.

So I call txq_trans_cond_update() to tell the tx watchdog
that the send queue is working.

Thanks.


>
> > --
> > 2.32.0.3.g01195cf9f
>
>


Re: [PATCH vhost 01/22] virtio_ring: virtqueue_set_dma_premapped support disable

2023-10-12 Thread Xuan Zhuo
On Thu, 12 Oct 2023 05:15:52 -0400, "Michael S. Tsirkin"  
wrote:
> On Wed, Oct 11, 2023 at 05:27:07PM +0800, Xuan Zhuo wrote:
> > virtqueue_set_dma_premapped() adds a new parameter to disable the
> > virtqueue premapped mode.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/net/virtio_net.c |  2 +-
> >  drivers/virtio/virtio_ring.c | 11 ---
> >  include/linux/virtio.h   |  2 +-
> >  3 files changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index fe7f314d65c9..6b5f47ebf9b2 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -737,7 +737,7 @@ static void virtnet_rq_set_premapped(struct 
> > virtnet_info *vi)
> > return;
> >
> > for (i = 0; i < vi->max_queue_pairs; i++) {
> > -   if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > +   if (virtqueue_set_dma_premapped(vi->rq[i].vq, true))
> > continue;
> >
> > vi->rq[i].do_dma = true;
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 51d8f3299c10..b3ded56722f4 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2784,7 +2784,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >   * 0: success.
> >   * -EINVAL: vring does not use the dma api, so we can not enable premapped 
> > mode.
> >   */
> > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
> >  {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > u32 num;
> > @@ -2803,8 +2803,13 @@ int virtqueue_set_dma_premapped(struct virtqueue 
> > *_vq)
> > return -EINVAL;
> > }
> >
> > -   vq->premapped = true;
> > -   vq->do_unmap = false;
> > +   if (mode) {
> > +   vq->premapped = true;
> > +   vq->do_unmap = false;
> > +   } else {
> > +   vq->premapped = false;
> > +   vq->do_unmap = vq->use_dma_api;
> > +   }
> >
> > END_USE(vq);
> >
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 4cc614a38376..1cf7b004348b 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -81,7 +81,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >
> >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >
> > -int virtqueue_set_dma_premapped(struct virtqueue *_vq);
> > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode);
> >
> >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>
> Wait a sec I thought we never change premapped. If you make this
> dynamic don't you need a bunch of locking?
> Or maybe queue is empty when you change this?
> If yes pls add a bunch of BUG_ON checks to make sure this is not misused.


Actually, this API is called immediately after vq init or vq reset.

We already have such a check.

Thanks.

/**
 * virtqueue_set_dma_premapped - set the vring premapped mode
 * @_vq: the struct virtqueue we're talking about.
 *
 * Enable the premapped mode of the vq.
 *
 * The vring in premapped mode does not do dma internally, so the driver must
 * do dma mapping in advance. The driver must pass the dma_address through
 * dma_address of scatterlist. When the driver got a used buffer from
 * the vring, it has to unmap the dma address.
 *
 * This function must be called immediately after creating the vq, or after vq
 * reset, and before adding any buffers to it.
 *
 * Caller must ensure we don't call this with other virtqueue operations
 * at the same time (except where noted).
 *
 * Returns zero or a negative error.
 * 0: success.
 * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
 */
int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
{
struct vring_virtqueue *vq = to_vvq(_vq);
u32 num;

START_USE(vq);

num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;

--> if (num != vq->vq.num_free) {
END_USE(vq);
return -EINVAL;
}

if (!vq->use_dma_api) {
END_USE(vq);
return -EINVAL;
}

if (mode) {
vq->premapped = true;
vq->do_unmap = false;
} else {
vq->premapped = false;
vq->do_unmap = vq->use_dma_api;
}

END_USE(vq);

return 0;
}
EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);


>
>
> > --
> > 2.32.0.3.g01195cf9f
>


Re: [PATCH vhost 21/22] virtio_net: update tx timeout record

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 05:12:33PM +0800, Xuan Zhuo wrote:
> On Thu, 12 Oct 2023 05:10:55 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Oct 11, 2023 at 05:27:27PM +0800, Xuan Zhuo wrote:
> > > If send queue sent some packets, we update the tx timeout
> > > record to prevent the tx timeout.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/net/virtio/xsk.c | 10 ++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> > > index 7abd46bb0e3d..e605f860edb6 100644
> > > --- a/drivers/net/virtio/xsk.c
> > > +++ b/drivers/net/virtio/xsk.c
> > > @@ -274,6 +274,16 @@ bool virtnet_xsk_xmit(struct virtnet_sq *sq, struct 
> > > xsk_buff_pool *pool,
> > >
> > >   virtnet_xsk_check_queue(sq);
> > >
> > > + if (stats.packets) {
> > > + struct netdev_queue *txq;
> > > + struct virtnet_info *vi;
> > > +
> > > + vi = sq->vq->vdev->priv;
> > > +
> > > + txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
> > > + txq_trans_cond_update(txq);
> > > + }
> > > +
> > >   u64_stats_update_begin(&sq->stats.syncp);
> > >   sq->stats.packets += stats.packets;
> > >   sq->stats.bytes += stats.bytes;
> >
> > I don't get what this is doing. Is there some kind of race here you
> > are trying to address? And what introduced the race?
> 
> 
> Because the xsk xmit shares the send queue with the kernel xmit,
> then when I do benchmark, the xsk will always use the send queue,
> so the kernel may have no chance to do xmit, the tx watchdog
> thinks that the send queue is hang and prints tx timeout log.
> 
> So I call the txq_trans_cond_update() to tell the tx watchdog
> that the send queue is working.
> 
> Thanks.

Don't like this hack.
So packets are stuck in the queue; that's not good, is it?
Is ours the only driver that shares queues like this?

> 
> >
> > > --
> > > 2.32.0.3.g01195cf9f
> >
> >



Re: [PATCH vhost 01/22] virtio_ring: virtqueue_set_dma_premapped support disable

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 05:18:54PM +0800, Xuan Zhuo wrote:
> On Thu, 12 Oct 2023 05:15:52 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Wed, Oct 11, 2023 at 05:27:07PM +0800, Xuan Zhuo wrote:
> > > virtqueue_set_dma_premapped() adds a new parameter to disable the
> > > virtqueue premapped mode.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/net/virtio_net.c |  2 +-
> > >  drivers/virtio/virtio_ring.c | 11 ---
> > >  include/linux/virtio.h   |  2 +-
> > >  3 files changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index fe7f314d65c9..6b5f47ebf9b2 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -737,7 +737,7 @@ static void virtnet_rq_set_premapped(struct 
> > > virtnet_info *vi)
> > >   return;
> > >
> > >   for (i = 0; i < vi->max_queue_pairs; i++) {
> > > - if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > + if (virtqueue_set_dma_premapped(vi->rq[i].vq, true))
> > >   continue;
> > >
> > >   vi->rq[i].do_dma = true;
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 51d8f3299c10..b3ded56722f4 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2784,7 +2784,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > >   * 0: success.
> > >   * -EINVAL: vring does not use the dma api, so we can not enable 
> > > premapped mode.
> > >   */
> > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
> > >  {
> > >   struct vring_virtqueue *vq = to_vvq(_vq);
> > >   u32 num;
> > > @@ -2803,8 +2803,13 @@ int virtqueue_set_dma_premapped(struct virtqueue 
> > > *_vq)
> > >   return -EINVAL;
> > >   }
> > >
> > > - vq->premapped = true;
> > > - vq->do_unmap = false;
> > > + if (mode) {
> > > + vq->premapped = true;
> > > + vq->do_unmap = false;
> > > + } else {
> > > + vq->premapped = false;
> > > + vq->do_unmap = vq->use_dma_api;
> > > + }
> > >
> > >   END_USE(vq);
> > >
> > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > index 4cc614a38376..1cf7b004348b 100644
> > > --- a/include/linux/virtio.h
> > > +++ b/include/linux/virtio.h
> > > @@ -81,7 +81,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> > >
> > >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> > >
> > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq);
> > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode);
> > >
> > >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >
> > Wait a sec I thought we never change premapped. If you make this
> > dynamic don't you need a bunch of locking?
> > Or maybe queue is empty when you change this?
> > If yes pls add a bunch of BUG_ON checks to make sure this is not misused.
> 
> 
> Actually, this api is called immediately after the vq init or vq reset.
> 
> We already have such a check.
> 
> Thanks.
> 
> /**
>  * virtqueue_set_dma_premapped - set the vring premapped mode
>  * @_vq: the struct virtqueue we're talking about.
>  *
>  * Enable the premapped mode of the vq.
>  *
>  * The vring in premapped mode does not do dma internally, so the driver must
>  * do dma mapping in advance. The driver must pass the dma_address through
>  * dma_address of scatterlist. When the driver got a used buffer from
>  * the vring, it has to unmap the dma address.
>  *
>  * This function must be called immediately after creating the vq, or after vq
>  * reset, and before adding any buffers to it.
>  *
>  * Caller must ensure we don't call this with other virtqueue operations
>  * at the same time (except where noted).
>  *
>  * Returns zero or a negative error.
>  * 0: success.
>  * -EINVAL: vring does not use the dma api, so we can not enable premapped 
> mode.
>  */
> int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
> {
>   struct vring_virtqueue *vq = to_vvq(_vq);
>   u32 num;
> 
>   START_USE(vq);
> 
>   num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
> 
> -->   if (num != vq->vq.num_free) {
>   END_USE(vq);
>   return -EINVAL;
>   }

But it turns out virtnet_rq_set_premapped() actually just ignores errors.
So returning -EINVAL here does nothing and the caller just proceeds?
And checking num_free without locks is never safe anyway.
I think the point is that this never triggers, so just BUG_ON().


> 
>   if (!vq->use_dma_api) {
>   END_USE(vq);
>   return -EINVAL;
>   }
> 
>   if (mode) {
>   vq->premapped = true;
>   vq->do_unmap = false;
>   } else {
>   vq->premapped = false;
>   vq->do_unmap = vq->use_dma_api;
>   }
> 
>   END_USE(vq);
> 
>   return 0;
> }
> EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
> 
> 

Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-12 Thread Zhu, Lingshan




On 10/11/2023 4:00 PM, Parav Pandit via Virtualization wrote:

Hi Christoph,


From: Christoph Hellwig 
Sent: Wednesday, October 11, 2023 12:29 PM

On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:

Btw, what is that intel thing everyone is talking about?  And why
would the virtio core support vendor specific behavior like that?

It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
that implemented vdpa support and so Zhu Lingshan from intel is
working on vdpa and has also proposed virtio spec extensions for migration.
intel's driver is called ifcvf.  vdpa composes all this stuff that is
added to vfio in userspace, so it's a different approach.

Well, so let's call it virtio live migration instead of intel.

And please work all together in the virtio committee so that you have one
way of communication between controlling and controlled functions.
If one extension does it one way and the other a different way, that's
just creating a giant mess.

We in virtio committee are working on VF device migration where:
VF = controlled function
PF = controlling function

The second proposal is what Michael mentioned from Intel, which somehow
combines the controlled and controlling functions as a single entity on
the VF.

The main reasons I find it weird are:
1. it must always do mediation to fake the device reset and FLR flows
2. dma cannot work, as you explained, for complex device state
3. it needs constant knowledge of every tiny detail of each virtio device type

Such a single entity appears very weird to me, but maybe it is just me.
Sorry for the late reply; we have discussed this for weeks on the virtio
mailing list.

I have proposed a live migration solution which is a config space solution.

We (me, Jason and Eugenio) have been working on this solution for more
than two years, and we are implementing the basic facilities for virtio
live migration.

The implementation is transport-specific; e.g., for PCI we implement new
registers, or extend existing ones, which work as other config space
registers do.

The reason we are arguing is that
I am not sure the admin-vq-based live migration solution is a good
choice, because:

1) it does not work for nested virtualization
2) it does not work for bare metal
3) QoS problems
4) security leaks.

Sorry to span the discussions here.

Thanks,
Zhu Lingshan





Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-12 Thread Zhu, Lingshan




On 10/11/2023 2:59 PM, Christoph Hellwig wrote:

On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:

Btw, what is that intel thing everyone is talking about?  And why
would the virtio core support vendor specific behavior like that?

It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
that implemented vdpa support and so Zhu Lingshan from intel is working
on vdpa and has also proposed virtio spec extensions for migration.
intel's driver is called ifcvf.  vdpa composes all this stuff that is
added to vfio in userspace, so it's a different approach.

Well, so let's call it virtio live migration instead of intel.

And please work all together in the virtio committee that you have
one way of communication between controlling and controlled functions.
If one extension does it one way and the other a different way that's
just creating a giant mess.

I hope so. Jason Wang has proposed a solution to cooperate, but sadly
it was rejected...






Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 03:45:36AM +, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin 
> > Sent: Tuesday, September 26, 2023 12:06 AM
> 
> > One can thinkably do that wait in hardware, though. Just defer completion 
> > until
> > read is done.
> >
> Once OASIS defines such a new interface, and if some hw vendor _actually_
> wants to build such complex hw, maybe the vfio driver can adapt to it.

The reset behaviour I describe is already in the spec. What else do you
want OASIS to standardize? Virtio currently is just a register map; it
does not yet include suggestions on how exactly PCI Express transactions
look. You feel we should add that?

-- 
MST



RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 12, 2023 4:23 PM
> 
> On Tue, Sep 26, 2023 at 03:45:36AM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Tuesday, September 26, 2023 12:06 AM
> >
> > > One can thinkably do that wait in hardware, though. Just defer
> > > completion until read is done.
> > >
> > Once OASIS does such new interface and if some hw vendor _actually_ wants
> to do such complex hw, may be vfio driver can adopt to it.
> 
> The reset behaviour I describe is already in the spec. What else do you want
> OASIS to standardize? Virtio currently is just a register map it does not yet
> include suggestions on how exactly do pci express transactions look. You feel 
> we
> should add that?

The reset behavior in the spec for modern devices, as listed in [1] and [2], is 
just fine.

What I meant is in the context of having MMIO-based legacy registers "defer 
completion until read is done".
I think you meant, "Just defer read completion until reset is done".
This means the hw needs to finish the device reset for thousands of devices 
within the read completion timeout of PCI.
So if OASIS does such standardization, someone can implement it.

What I recollect is that OASIS did not standardize such an anti-scale approach 
and took the admin command approach, which achieves better scale.
Hope I clarified.

I am not expecting OASIS to do anything extra for legacy registers.

[1] The device MUST reset when 0 is written to device_status, and present a 0 
in device_status once that is done.
[2] After writing 0 to device_status, the driver MUST wait for a read of 
device_status to return 0 before reinitializing
the device.
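The modern reset handshake quoted in [1] and [2] can be sketched with a toy
model (fake device registers and illustrative names only, not driver code; the
reset_latency field is an assumption standing in for a device that takes some
time to finish reset):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the modern virtio reset handshake quoted in [1]/[2].
 * fake_dev stands in for the device's status register; reset_latency
 * models a device that needs a few reads before reset completes. */
struct fake_dev {
    uint8_t device_status;
    int reset_latency;          /* reads remaining until reset is done */
};

void dev_write_status(struct fake_dev *d, uint8_t v)
{
    if (v == 0)
        d->reset_latency = 3;   /* reset starts; 0 not yet presented */
    else
        d->device_status = v;
}

uint8_t dev_read_status(struct fake_dev *d)
{
    if (d->reset_latency > 0 && --d->reset_latency == 0)
        d->device_status = 0;   /* present 0 once reset is done [1] */
    return d->device_status;
}

/* Modern driver: write 0, then wait for a read of 0 [2].
 * Returns the number of polls it had to do. */
int driver_reset(struct fake_dev *d)
{
    int polls = 0;

    dev_write_status(d, 0);
    while (dev_read_status(d) != 0)
        polls++;
    return polls;
}
```

A legacy driver simply skips the polling loop after writing 0, which is the
behaviour being discussed in this thread.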


Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 11:11:20AM +, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin 
> > Sent: Thursday, October 12, 2023 4:23 PM
> > 
> > On Tue, Sep 26, 2023 at 03:45:36AM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin 
> > > > Sent: Tuesday, September 26, 2023 12:06 AM
> > >
> > > > One can thinkably do that wait in hardware, though. Just defer
> > > > completion until read is done.
> > > >
> > > Once OASIS does such new interface and if some hw vendor _actually_ wants
> > to do such complex hw, may be vfio driver can adopt to it.
> > 
> > The reset behaviour I describe is already in the spec. What else do you want
> > OASIS to standardize? Virtio currently is just a register map it does not 
> > yet
> > include suggestions on how exactly do pci express transactions look. You 
> > feel we
> > should add that?
> 
> The reset behavior in the spec for modern as listed in [1] and [2] is just 
> fine.
> 
> What I meant is in context of having MMIO based legacy registers to "defer 
> completion until read is done".
> I think you meant, "Just differ read completion, until reset is done".

yes

> This means the hw needs to finish the device reset for thousands of devices 
> within the read completion timeout of the pci.

no, each device does its own reset.

> So when if OASIS does such standardization, someone can implement it.
> 
> What I recollect, is OASIS didn't not standardize such anti-scale approach 
> and took the admin command approach which achieve better scale.
> Hope I clarified.

You are talking about the extension for trap and emulate.
I am instead talking about devices that work with
existing legacy linux drivers with no traps.

> I am not expecting OASIS to do anything extra for legacy registers.
> 
> [1] The device MUST reset when 0 is written to device_status, and present a 0 
> in device_status once that is done.
> [2] After writing 0 to device_status, the driver MUST wait for a read of 
> device_status to return 0 before reinitializing
> the device.

We can add a note explaining that legacy drivers do not wait
after doing reset, that is not a problem.
If someone wants to make a device that works with existing
legacy linux drivers, they can do that.
Won't work with all drivers though, which is why OASIS did not
want to standardize this.

-- 
MST



RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 12, 2023 5:00 PM

> I am instead talking about devices that work with existing legacy linux 
> drivers
> with no traps.
> 
Yep, I understood.

> > I am not expecting OASIS to do anything extra for legacy registers.
> >
> > [1] The device MUST reset when 0 is written to device_status, and present a > > 0
> in device_status once that is done.
> > [2] After writing 0 to device_status, the driver MUST wait for a read
> > of device_status to return 0 before reinitializing the device.
> 
> We can add a note explaining that legacy drivers do not wait after doing 
> reset,
> that is not a problem.
> If someone wants to make a device that works with existing legacy linux 
> drivers,
> they can do that.
> Won't work with all drivers though, which is why oasis did not want to
> standardize this.

Ok. thanks.


Re: [PATCH vhost 01/22] virtio_ring: virtqueue_set_dma_premapped support disable

2023-10-12 Thread Xuan Zhuo
On Thu, 12 Oct 2023 05:40:38 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Oct 12, 2023 at 05:18:54PM +0800, Xuan Zhuo wrote:
> > On Thu, 12 Oct 2023 05:15:52 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Wed, Oct 11, 2023 at 05:27:07PM +0800, Xuan Zhuo wrote:
> > > > virtqueue_set_dma_premapped() adds a new parameter to disable the
> > > > virtqueue premapped mode.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio_net.c |  2 +-
> > > >  drivers/virtio/virtio_ring.c | 11 ---
> > > >  include/linux/virtio.h   |  2 +-
> > > >  3 files changed, 10 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index fe7f314d65c9..6b5f47ebf9b2 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -737,7 +737,7 @@ static void virtnet_rq_set_premapped(struct 
> > > > virtnet_info *vi)
> > > > return;
> > > >
> > > > for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > -   if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > > +   if (virtqueue_set_dma_premapped(vi->rq[i].vq, true))
> > > > continue;
> > > >
> > > > vi->rq[i].do_dma = true;
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 51d8f3299c10..b3ded56722f4 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2784,7 +2784,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > >   * 0: success.
> > > >   * -EINVAL: vring does not use the dma api, so we can not enable 
> > > > premapped mode.
> > > >   */
> > > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
> > > >  {
> > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > > u32 num;
> > > > @@ -2803,8 +2803,13 @@ int virtqueue_set_dma_premapped(struct virtqueue 
> > > > *_vq)
> > > > return -EINVAL;
> > > > }
> > > >
> > > > -   vq->premapped = true;
> > > > -   vq->do_unmap = false;
> > > > +   if (mode) {
> > > > +   vq->premapped = true;
> > > > +   vq->do_unmap = false;
> > > > +   } else {
> > > > +   vq->premapped = false;
> > > > +   vq->do_unmap = vq->use_dma_api;
> > > > +   }
> > > >
> > > > END_USE(vq);
> > > >
> > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > index 4cc614a38376..1cf7b004348b 100644
> > > > --- a/include/linux/virtio.h
> > > > +++ b/include/linux/virtio.h
> > > > @@ -81,7 +81,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> > > >
> > > >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> > > >
> > > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq);
> > > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode);
> > > >
> > > >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > >
> > > Wait a sec I thought we never change premapped. If you make this
> > > dynamic don't you need a bunch of locking?
> > > Or maybe queue is empty when you change this?
> > > If yes pls add a bunch of BUG_ON checks to make sure this is not misused.
> >
> >
> > Actually, this api is called immediately after the vq init or vq reset.
> >
> > We already have such a check.
> >
> > Thanks.
> >
> > /**
> >  * virtqueue_set_dma_premapped - set the vring premapped mode
> >  * @_vq: the struct virtqueue we're talking about.
> >  *
> >  * Enable the premapped mode of the vq.
> >  *
> >  * The vring in premapped mode does not do dma internally, so the driver 
> > must
> >  * do dma mapping in advance. The driver must pass the dma_address through
> >  * dma_address of scatterlist. When the driver got a used buffer from
> >  * the vring, it has to unmap the dma address.
> >  *
> >  * This function must be called immediately after creating the vq, or after 
> > vq
> >  * reset, and before adding any buffers to it.
> >  *
> >  * Caller must ensure we don't call this with other virtqueue operations
> >  * at the same time (except where noted).
> >  *
> >  * Returns zero or a negative error.
> >  * 0: success.
> >  * -EINVAL: vring does not use the dma api, so we can not enable premapped 
> > mode.
> >  */
> > int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool mode)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > u32 num;
> >
> > START_USE(vq);
> >
> > num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
> >
> > --> if (num != vq->vq.num_free) {
> > END_USE(vq);
> > return -EINVAL;
> > }
>
> But it turns out virtnet_rq_set_premapped actually just ignores errors.
> So returning EINVAL here does nothing, the caller just proceeds?
> And checking num_free without locks is never safe anyway.

The premise of all this is that this is called immediately after reset 
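The rule being discussed — the mode can only be flipped while the ring is
empty, i.e. immediately after init or reset — can be sketched with a toy model
(illustrative names, not the kernel API; -22 stands in for -EINVAL):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: premapped mode may only be toggled while the ring is
 * empty, so no descriptor is in flight with the wrong mapping mode. */
struct toy_vq {
    unsigned int num;        /* ring size */
    unsigned int num_free;   /* free descriptors */
    bool premapped;
    bool do_unmap;
    bool use_dma_api;
};

int toy_set_dma_premapped(struct toy_vq *vq, bool mode)
{
    if (vq->num != vq->num_free)
        return -22;          /* -EINVAL: buffers already added */

    if (mode) {
        vq->premapped = true;
        vq->do_unmap = false;
    } else {
        vq->premapped = false;
        vq->do_unmap = vq->use_dma_api;
    }
    return 0;
}
```

In the driver, a failed call just leaves the queue in its previous mode, which
matches the "caller ignores errors and proceeds" behaviour discussed above.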

Re: [PATCH vhost 21/22] virtio_net: update tx timeout record

2023-10-12 Thread Xuan Zhuo
On Thu, 12 Oct 2023 05:36:56 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Oct 12, 2023 at 05:12:33PM +0800, Xuan Zhuo wrote:
> > On Thu, 12 Oct 2023 05:10:55 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Wed, Oct 11, 2023 at 05:27:27PM +0800, Xuan Zhuo wrote:
> > > > If send queue sent some packets, we update the tx timeout
> > > > record to prevent the tx timeout.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio/xsk.c | 10 ++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> > > > index 7abd46bb0e3d..e605f860edb6 100644
> > > > --- a/drivers/net/virtio/xsk.c
> > > > +++ b/drivers/net/virtio/xsk.c
> > > > @@ -274,6 +274,16 @@ bool virtnet_xsk_xmit(struct virtnet_sq *sq, 
> > > > struct xsk_buff_pool *pool,
> > > >
> > > > virtnet_xsk_check_queue(sq);
> > > >
> > > > +   if (stats.packets) {
> > > > +   struct netdev_queue *txq;
> > > > +   struct virtnet_info *vi;
> > > > +
> > > > +   vi = sq->vq->vdev->priv;
> > > > +
> > > > +   txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
> > > > +   txq_trans_cond_update(txq);
> > > > +   }
> > > > +
> > > > u64_stats_update_begin(&sq->stats.syncp);
> > > > sq->stats.packets += stats.packets;
> > > > sq->stats.bytes += stats.bytes;
> > >
> > > I don't get what this is doing. Is there some kind of race here you
> > > are trying to address? And what introduced the race?
> >
> >
> > Because the xsk xmit shares the send queue with the kernel xmit,
> > then when I do benchmark, the xsk will always use the send queue,
> > so the kernel may have no chance to do xmit, the tx watchdog
> > thinks that the send queue is hang and prints tx timeout log.
> >
> > So I call the txq_trans_cond_update() to tell the tx watchdog
> > that the send queue is working.
> >
> > Thanks.
>
> Don't like this hack.
> So packets are stuck in queue - that's not good is it?
> Is ours the only driver that shares queues like this?

No.

And txq_trans_cond_update() is called by many net drivers for a similar 
reason.

Thanks
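The watchdog interaction described here can be modeled with a small sketch
(toy names only, not the kernel API; toy_trans_update() plays the role of
txq_trans_cond_update()):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the tx watchdog: the queue's last-activity stamp must be
 * refreshed by *any* path that makes progress on the shared send queue
 * (kernel xmit or xsk xmit), otherwise the watchdog declares a bogus
 * hang even though the queue is busy. */
struct toy_txq {
    unsigned long trans_start;   /* last time the queue made progress */
};

void toy_trans_update(struct toy_txq *txq, unsigned long now)
{
    txq->trans_start = now;
}

/* Returns true if the watchdog would report a tx timeout. */
bool toy_watchdog_fired(const struct toy_txq *txq, unsigned long now,
                        unsigned long timeout)
{
    return now - txq->trans_start > timeout;
}

/* xsk xmit path: if it sent packets, refresh the stamp so the watchdog
 * knows the queue is alive even without kernel xmit activity. */
void toy_xsk_xmit(struct toy_txq *txq, unsigned int sent, unsigned long now)
{
    if (sent)
        toy_trans_update(txq, now);
}
```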


>
> >
> > >
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
> > >
>


Re: [PATCH vhost 21/22] virtio_net: update tx timeout record

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 07:54:02PM +0800, Xuan Zhuo wrote:
> On Thu, 12 Oct 2023 05:36:56 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Thu, Oct 12, 2023 at 05:12:33PM +0800, Xuan Zhuo wrote:
> > > On Thu, 12 Oct 2023 05:10:55 -0400, "Michael S. Tsirkin" 
> > >  wrote:
> > > > On Wed, Oct 11, 2023 at 05:27:27PM +0800, Xuan Zhuo wrote:
> > > > > If send queue sent some packets, we update the tx timeout
> > > > > record to prevent the tx timeout.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > ---
> > > > >  drivers/net/virtio/xsk.c | 10 ++
> > > > >  1 file changed, 10 insertions(+)
> > > > >
> > > > > diff --git a/drivers/net/virtio/xsk.c b/drivers/net/virtio/xsk.c
> > > > > index 7abd46bb0e3d..e605f860edb6 100644
> > > > > --- a/drivers/net/virtio/xsk.c
> > > > > +++ b/drivers/net/virtio/xsk.c
> > > > > @@ -274,6 +274,16 @@ bool virtnet_xsk_xmit(struct virtnet_sq *sq, 
> > > > > struct xsk_buff_pool *pool,
> > > > >
> > > > >   virtnet_xsk_check_queue(sq);
> > > > >
> > > > > + if (stats.packets) {
> > > > > + struct netdev_queue *txq;
> > > > > + struct virtnet_info *vi;
> > > > > +
> > > > > + vi = sq->vq->vdev->priv;
> > > > > +
> > > > > + txq = netdev_get_tx_queue(vi->dev, sq - vi->sq);
> > > > > + txq_trans_cond_update(txq);
> > > > > + }
> > > > > +
> > > > >   u64_stats_update_begin(&sq->stats.syncp);
> > > > >   sq->stats.packets += stats.packets;
> > > > >   sq->stats.bytes += stats.bytes;
> > > >
> > > > I don't get what this is doing. Is there some kind of race here you
> > > > are trying to address? And what introduced the race?
> > >
> > >
> > > Because the xsk xmit shares the send queue with the kernel xmit,
> > > then when I do benchmark, the xsk will always use the send queue,
> > > so the kernel may have no chance to do xmit, the tx watchdog
> > > thinks that the send queue is hang and prints tx timeout log.
> > >
> > > So I call the txq_trans_cond_update() to tell the tx watchdog
> > > that the send queue is working.
> > >
> > > Thanks.
> >
> > Don't like this hack.
> > So packets are stuck in queue - that's not good is it?
> > Is ours the only driver that shares queues like this?
> 
> NO.
> 
> And txq_trans_cond_update() is called by many net drivers for the similar 
> reason.
> 
> Thanks

Hmm it seems you are right. OK, sorry about the noise.

> 
> >
> > >
> > > >
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> > > >
> >



Re: [PATCH vhost 00/22] virtio-net: support AF_XDP zero copy

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 04:32:40PM +0800, Xuan Zhuo wrote:
> On Thu, 12 Oct 2023 15:50:13 +0800, Jason Wang  wrote:
> > On Thu, Oct 12, 2023 at 9:58 AM Xuan Zhuo  
> > wrote:
> > >
> > > On Wed, 11 Oct 2023 10:00:57 -0700, Jakub Kicinski  
> > > wrote:
> > > > On Wed, 11 Oct 2023 17:27:06 +0800 Xuan Zhuo wrote:
> > > > > ## AF_XDP
> > > > >
> > > > > XDP socket(AF_XDP) is an excellent bypass kernel network framework. 
> > > > > The zero
> > > > > copy feature of xsk (XDP socket) needs to be supported by the driver. 
> > > > > The
> > > > > performance of zero copy is very good. mlx5 and intel ixgbe already 
> > > > > support
> > > > > this feature, This patch set allows virtio-net to support xsk's 
> > > > > zerocopy xmit
> > > > > feature.
> > > >
> > > > You're moving the driver and adding a major feature.
> > > > This really needs to go via net or bpf.
> > > > If you have dependencies in other trees please wait for
> > > > after the merge window.
> > >
> > >
> > > If so, I can remove the first two commits.
> > >
> > > Then, the sq uses the premapped mode by default.
> > > And we can use the api virtqueue_dma_map_single_attrs to replace the
> > > virtqueue_dma_map_page_attrs.
> > >
> > > And then I will fix that on the top.
> > >
> > > Hi Michael and Jason, is that ok for you?
> >
> > I would go with what looks easy for you but I think Jakub wants the
> > series to go with next-next (this is what we did in the past for
> > networking specific features that is done in virtio-net). So we need
> > to tweak the prefix to use net-next instead of vhost.
> 
> OK.
> 
> I will fix that in next version.
> 
> Thanks.

Scaling the scope back as far as possible is generally a good idea.
I am not sure how this will work, though. Let's see.

> >
> > Thanks
> >
> >
> > >
> > > Thanks.
> > >
> >


[RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-12 Thread Matias Ezequiel Vara Larsen
This commit replaces the mmap mechanism with the copy() and
fill_silence() callbacks for both capturing and playback for the
virtio-sound driver. This change is required to prevent the updating of
the content of a buffer that is already in the available ring.

The current mechanism splits a dma buffer into descriptors that are
exposed to the device. This dma buffer is shared with the user
application. When the device consumes a buffer, the driver moves the
request from the used ring back to the available ring.

The driver exposes the buffer to the device without knowing if the
content has been updated by the user. Section 2.8.21.1 of the
virtio spec states that: "The device MAY access the descriptor chains
the driver created and the memory they refer to immediately". If the
device picks up buffers from the available ring just after it is
notified, it can happen that the content is stale.

By providing the copy() callback, the driver first updates the content
of the buffer, and then exposes the buffer to the device by enqueuing
it in the available ring. Thus, the device always picks up a buffer
that is up to date.

For capturing, the driver starts by exposing all the available buffers
to the device. After the device updates the content of a buffer, it
enqueues it in the used ring. It is only after the copy() for capturing
is issued that the driver re-enqueues the buffer in the available ring.

Note that the copy() function assumes that the user is always writing a
full period. Testing shows that this is true, but I may be wrong. This RFC
aims at clarifying this.
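The ordering problem and the fix can be illustrated with a toy model
(illustrative names and a simulated device, not the driver code):

```c
#include <assert.h>
#include <string.h>

/* Toy model of the ordering fix: the device may read a buffer as soon
 * as it sits on the available ring (spec 2.8.21.1), so the driver must
 * finish copying user data *before* enqueuing it. */
enum { PERIOD = 4 };

struct toy_msg {
    char data[PERIOD];
    int on_avail_ring;
};

/* Device side: snapshots whatever is in the buffer at pickup time. */
void toy_device_consume(struct toy_msg *msg, char *snapshot)
{
    if (msg->on_avail_ring)
        memcpy(snapshot, msg->data, PERIOD);
}

/* copy() style: update the content first, then expose to the device. */
void toy_copy_then_enqueue(struct toy_msg *msg, const char *user)
{
    memcpy(msg->data, user, PERIOD);
    msg->on_avail_ring = 1;
}
```

With the mmap scheme the enqueue happens first, so the snapshot could be the
old content; with copy() the device can only ever see the updated data.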

Signed-off-by: Matias Ezequiel Vara Larsen 
---
 sound/virtio/virtio_pcm.c | 11 ++--
 sound/virtio/virtio_pcm.h |  9 +++-
 sound/virtio/virtio_pcm_msg.c | 50 ---
 sound/virtio/virtio_pcm_ops.c | 94 +++
 4 files changed, 137 insertions(+), 27 deletions(-)

diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
index c10d91fff2fb..bfe982952303 100644
--- a/sound/virtio/virtio_pcm.c
+++ b/sound/virtio/virtio_pcm.c
@@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct virtio_pcm_substream 
*vss,
 * only message-based transport.
 */
vss->hw.info =
-   SNDRV_PCM_INFO_MMAP |
-   SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_BATCH |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_INTERLEAVED |
@@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
for (kss = ks->substream; kss; kss = kss->next)
vs->substreams[kss->number]->substream = kss;
 
-   snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);
+   if (i == SNDRV_PCM_STREAM_CAPTURE)
+   snd_pcm_set_ops(vpcm->pcm, i, 
&virtsnd_pcm_capture_ops);
+   else
+   snd_pcm_set_ops(vpcm->pcm, i, 
&virtsnd_pcm_playback_ops);
}
-
-   snd_pcm_set_managed_buffer_all(vpcm->pcm,
-  SNDRV_DMA_TYPE_VMALLOC, NULL,
-  0, 0);
}
 
return 0;
diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
index 062eb8e8f2cf..1c1106ec971f 100644
--- a/sound/virtio/virtio_pcm.h
+++ b/sound/virtio/virtio_pcm.h
@@ -50,6 +50,8 @@ struct virtio_pcm_substream {
struct work_struct elapsed_period;
spinlock_t lock;
size_t buffer_bytes;
+   u8 *buffer;
+   size_t buffer_sz;
size_t hw_ptr;
bool xfer_enabled;
bool xfer_xrun;
@@ -90,7 +92,8 @@ struct virtio_pcm {
struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
 };
 
-extern const struct snd_pcm_ops virtsnd_pcm_ops;
+extern const struct snd_pcm_ops virtsnd_pcm_playback_ops;
+extern const struct snd_pcm_ops virtsnd_pcm_capture_ops;
 
 int virtsnd_pcm_validate(struct virtio_device *vdev);
 
@@ -117,7 +120,9 @@ int virtsnd_pcm_msg_alloc(struct virtio_pcm_substream *vss,
 
 void virtsnd_pcm_msg_free(struct virtio_pcm_substream *vss);
 
-int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss);
+int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss, bool single);
+
+int virtsnd_pcm_msg_send_locked(struct virtio_pcm_substream *vss, bool single);
 
 unsigned int virtsnd_pcm_msg_pending_num(struct virtio_pcm_substream *vss);
 
diff --git a/sound/virtio/virtio_pcm_msg.c b/sound/virtio/virtio_pcm_msg.c
index aca2dc1989ba..9a5f9814cb62 100644
--- a/sound/virtio/virtio_pcm_msg.c
+++ b/sound/virtio/virtio_pcm_msg.c
@@ -132,7 +132,6 @@ static void virtsnd_pcm_sg_from(struct scatterlist *sgs, 
int nsgs, u8 *data,
 int virtsnd_pcm_msg_alloc(struct virtio_pcm_substream *vss,
  unsigned int periods, unsigned int period_bytes)
 {
-   struct snd_pcm_runtime *runtime = vss->substream->runtime;
unsigned int i;
 
vss->msgs = kcalloc(periods

Re: [RFC PATCH] ALSA: virtio: use copy and fill_silence callbacks

2023-10-12 Thread Michael S. Tsirkin
On Thu, Oct 12, 2023 at 05:10:50PM +0200, Matias Ezequiel Vara Larsen wrote:
> This commit replaces the mmap mechanism with the copy() and
> fill_silence() callbacks for both capturing and playback for the
> virtio-sound driver. This change is required to prevent the updating of
> the content of a buffer that is already in the available ring.
> 
> The current mechanism splits a dma buffer into descriptors that are
> exposed to the device. This dma buffer is shared with the user
> application. When the device consumes a buffer, the driver moves the
> request from the used ring to available ring.
> 
> The driver exposes the buffer to the device without knowing if the
> content has been updated from the user. The section 2.8.21.1 of the
> virtio spec states that: "The device MAY access the descriptor chains
> the driver created and the memory they refer to immediately". If the
> device picks up buffers from the available ring just after it is
> notified, it happens that the content may be old.
> 
> By providing the copy() callback, the driver first updates the content
> of the buffer, and then, exposes the buffer to the device by enqueuing
> it in the available ring. Thus, device always picks up a buffer that is
> updated.
> 
> For capturing, the driver starts by exposing all the available buffers
> to device. After device updates the content of a buffer, it enqueues it
> in the used ring. It is only after the copy() for capturing is issued
> that the driver re-enqueues the buffer in the available ring.
> 
> Note that the copy() function assumes that user is always writing a
> period. Testing shows that this is true but I may be wrong. This RFC
> aims at clarifying this.
> 
> Signed-off-by: Matias Ezequiel Vara Larsen 


Thank you for working on this!

> ---
>  sound/virtio/virtio_pcm.c | 11 ++--
>  sound/virtio/virtio_pcm.h |  9 +++-
>  sound/virtio/virtio_pcm_msg.c | 50 ---
>  sound/virtio/virtio_pcm_ops.c | 94 +++
>  4 files changed, 137 insertions(+), 27 deletions(-)
> 
> diff --git a/sound/virtio/virtio_pcm.c b/sound/virtio/virtio_pcm.c
> index c10d91fff2fb..bfe982952303 100644
> --- a/sound/virtio/virtio_pcm.c
> +++ b/sound/virtio/virtio_pcm.c
> @@ -104,8 +104,6 @@ static int virtsnd_pcm_build_hw(struct 
> virtio_pcm_substream *vss,
>* only message-based transport.
>*/
>   vss->hw.info =
> - SNDRV_PCM_INFO_MMAP |
> - SNDRV_PCM_INFO_MMAP_VALID |
>   SNDRV_PCM_INFO_BATCH |
>   SNDRV_PCM_INFO_BLOCK_TRANSFER |
>   SNDRV_PCM_INFO_INTERLEAVED |
> @@ -471,12 +469,11 @@ int virtsnd_pcm_build_devs(struct virtio_snd *snd)
>   for (kss = ks->substream; kss; kss = kss->next)
>   vs->substreams[kss->number]->substream = kss;
>  
> - snd_pcm_set_ops(vpcm->pcm, i, &virtsnd_pcm_ops);
> + if (i == SNDRV_PCM_STREAM_CAPTURE)
> + snd_pcm_set_ops(vpcm->pcm, i, 
> &virtsnd_pcm_capture_ops);
> + else
> + snd_pcm_set_ops(vpcm->pcm, i, 
> &virtsnd_pcm_playback_ops);
>   }
> -
> - snd_pcm_set_managed_buffer_all(vpcm->pcm,
> -SNDRV_DMA_TYPE_VMALLOC, NULL,
> -0, 0);
>   }
>  
>   return 0;
> diff --git a/sound/virtio/virtio_pcm.h b/sound/virtio/virtio_pcm.h
> index 062eb8e8f2cf..1c1106ec971f 100644
> --- a/sound/virtio/virtio_pcm.h
> +++ b/sound/virtio/virtio_pcm.h
> @@ -50,6 +50,8 @@ struct virtio_pcm_substream {
>   struct work_struct elapsed_period;
>   spinlock_t lock;
>   size_t buffer_bytes;
> + u8 *buffer;
> + size_t buffer_sz;
>   size_t hw_ptr;
>   bool xfer_enabled;
>   bool xfer_xrun;
> @@ -90,7 +92,8 @@ struct virtio_pcm {
>   struct virtio_pcm_stream streams[SNDRV_PCM_STREAM_LAST + 1];
>  };
>  
> -extern const struct snd_pcm_ops virtsnd_pcm_ops;
> +extern const struct snd_pcm_ops virtsnd_pcm_playback_ops;
> +extern const struct snd_pcm_ops virtsnd_pcm_capture_ops;
>  
>  int virtsnd_pcm_validate(struct virtio_device *vdev);
>  
> @@ -117,7 +120,9 @@ int virtsnd_pcm_msg_alloc(struct virtio_pcm_substream 
> *vss,
>  
>  void virtsnd_pcm_msg_free(struct virtio_pcm_substream *vss);
>  
> -int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss);
> +int virtsnd_pcm_msg_send(struct virtio_pcm_substream *vss, bool single);
> +
> +int virtsnd_pcm_msg_send_locked(struct virtio_pcm_substream *vss, bool 
> single);
>  
>  unsigned int virtsnd_pcm_msg_pending_num(struct virtio_pcm_substream *vss);
>  
> diff --git a/sound/virtio/virtio_pcm_msg.c b/sound/virtio/virtio_pcm_msg.c
> index aca2dc1989ba..9a5f9814cb62 100644
> --- a/sound/virtio/virtio_pcm_msg.c
> +++ b/sound/virtio/virtio_pcm_msg.c
> @@ -132,7 +132,6 @@ static void virtsnd_pcm_sg_from(struct scatterlist *sgs, 
> int ns

Re: Report a possible vhost bug in stable branches

2023-10-12 Thread Jason Wang
On Thu, Oct 12, 2023 at 6:44 PM Xianting Tian
 wrote:
>
>
> 在 2023/10/12 下午3:55, Jason Wang 写道:
> > On Thu, Oct 12, 2023 at 9:43 AM Xianting Tian
> >  wrote:
> >> cgroup attach work and dev flush work will both be added to dev work
> >> list in vhost_attach_cgroups() when set dev owner:
> >>   static int vhost_attach_cgroups(struct vhost_dev *dev)
> >>   {
> >>   struct vhost_attach_cgroups_struct attach;
> >>
> >>   attach.owner = current;
> >>   vhost_work_init(&attach.work,
> >>  vhost_attach_cgroups_work);
> >>   vhost_work_queue(dev, &attach.work); // add cgroup
> >> attach work
> >>   vhost_work_dev_flush(dev);   // add dev
> >> flush work
> >>   return attach.ret;
> >>   }
> >>
> >> And dev kworker will be waken up to handle the two works in
> >> vhost_worker():
> >>   node = llist_del_all(&dev->work_list);
> >>   node = llist_reverse_order(node);
> >>   llist_for_each_entry_safe{
> >>   work->fn(work);
> >>   }
> >>
> >> As the list is reversed before processing in vhost_worker(), so it is
> >> possible
> >> that dev flush work is processed before cgroup attach work.
> > This sounds weird. It's an llist, not a list, so a new entry is added
> > at the head; that's why we need llist_reverse_order() to recover the
> > order.
> >
> >   Have you ever reproduced these issues?
>
> Sorry for the disturbance, no issue now.
>
> It was caused by our internal changes.

If it's an optimization or a feature, you are welcome to post it.

Developing new features upstream has a lot of benefits.

Thanks


>
> >
> > Thanks
> >
> >> If so,
> >> vhost_attach_cgroups
> >> may return "attach.ret" before cgroup attach work is handled, but
> >> "attach.ret" is random
> >> value as it is in stack.
> >>
> >> The possible fix maybe:
> >>
> >> static int vhost_attach_cgroups(struct vhost_dev *dev)
> >> {
> >>   struct vhost_attach_cgroups_struct attach;
> >>
> >>   attach.ret = 0;
> >>   attach.owner = current;
> >>   vhost_work_init(&attach.work, vhost_attach_cgroups_work);
> >>   vhost_work_queue(dev, &attach.work);
> >>   vhost_work_dev_flush(dev);
> >>   return attach.ret;
> >> }
> >>
> >>So this fix is just to initialize the attach.ret to 0, this fix may
> >> not the final fix,
> >>We just want you experts know this issue exists, and we met it
> >> recently in our test.
> >>
> >> And the issue exists in many stable branches.
> >>
>
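For reference, the head-insertion plus reverse that Jason refers to can be
sketched in plain C (toy types, not the kernel's llist implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the llist behaviour: vhost_work_queue() style producers
 * push at the head, so the consumer must reverse the chain to process
 * works in submission order (attach before flush). */
struct toy_node {
    int id;
    struct toy_node *next;
};

/* llist_add() style: the new entry goes to the head. */
void toy_push(struct toy_node **head, struct toy_node *n)
{
    n->next = *head;
    *head = n;
}

/* llist_reverse_order() style: recover FIFO order before processing. */
struct toy_node *toy_reverse(struct toy_node *head)
{
    struct toy_node *rev = NULL;

    while (head) {
        struct toy_node *next = head->next;
        head->next = rev;
        rev = head;
        head = next;
    }
    return rev;
}
```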


Re: [PATCH net-next 0/5] virtio-net: support dynamic coalescing moderation

2023-10-12 Thread Jason Wang
On Thu, Oct 12, 2023 at 4:29 PM Jason Wang  wrote:
>
> On Thu, Oct 12, 2023 at 3:44 PM Heng Qi  wrote:
> >
> > Now, virtio-net already supports per-queue moderation parameter
> > setting. Based on this, we use the netdim library of linux to support
> > dynamic coalescing moderation for virtio-net.
> >
> > Due to hardware scheduling issues, we only tested rx dim.
>
> Do you have PPS numbers? And TX numbers are also important as the
> throughput could be misleading due to various reasons.

Another consideration:

We currently have two TX modes, NAPI by default, and orphan. A simple
pktgen test shows NAPI can only achieve 30% of orphan's throughput. So
we need to make sure:

1) dim helps for NAPI, and if NAPI can compete with orphan, we can
drop the orphan code completely, which would be a great relief and
simplification of the code. And it means we can have BQL, and other
good stuff on top easily.
2) dim doesn't cause a regression for orphan

Thanks
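As a rough illustration, the per-window decision that dim-style moderation
makes can be sketched like this (a simplified model of the idea, not the
kernel's dim library; profile steps and thresholds are assumptions):

```c
#include <assert.h>

/* Simplified dim-style decision loop: compare the packet rate of the
 * current sample window with the previous one and step the moderation
 * profile up or down accordingly. */
enum { TOY_PROFILES = 5 };

struct toy_dim {
    int profile;            /* 0 = least coalescing, 4 = most */
    unsigned long prev_pps;
};

/* Returns the profile chosen for the next window. */
int toy_dim_update(struct toy_dim *dim, unsigned long pps)
{
    if (pps > dim->prev_pps && dim->profile < TOY_PROFILES - 1)
        dim->profile++;     /* load rising: coalesce more */
    else if (pps < dim->prev_pps && dim->profile > 0)
        dim->profile--;     /* load falling: favour latency */
    dim->prev_pps = pps;
    return dim->profile;
}
```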

>
> Thanks
>
> >
> > @Test env
> > rxq0 has affinity to cpu0.
> >
> > @Test cmd
> > client: taskset -c 0 sockperf tp -i ${IP} -t 30 --tcp -m ${msg_size}
> > server: taskset -c 0 sockperf sr --tcp
> >
> > @Test res
> > The second column is the ratio of the result returned by client
> > when rx dim is enabled to the result returned by client when
> > rx dim is disabled.
> > --
> > | msg_size |  rx_dim=on / rx_dim=off |
> > --
> > |   14B| + 3%|
> > --
> > |   100B   | + 16%   |
> > --
> > |   500B   | + 25%   |
> > --
> > |   1400B  | + 28%   |
> > --
> > |   2048B  | + 22%   |
> > --
> > |   4096B  | + 5%|
> > --
> >
> > ---
> > This patch set was part of the previous netdim patch set[1].
> > [1] was split into a merged bugfix set[2] and the current set.
> > The previous relevant commentators have been Cced.
> >
> > [1] 
> > https://lore.kernel.org/all/20230811065512.22190-1-hen...@linux.alibaba.com/
> > [2] 
> > https://lore.kernel.org/all/cover.1696745452.git.hen...@linux.alibaba.com/
> >
> > Heng Qi (5):
> >   virtio-net: returns whether napi is complete
> >   virtio-net: separate rx/tx coalescing moderation cmds
> >   virtio-net: extract virtqueue coalescig cmd for reuse
> >   virtio-net: support rx netdim
> >   virtio-net: support tx netdim
> >
> >  drivers/net/virtio_net.c | 394 ---
> >  1 file changed, 322 insertions(+), 72 deletions(-)
> >
> > --
> > 2.19.1.6.gb485710b
> >
> >


Re: [PATCH 1/4] vdpa: introduce .reset_map operation callback

2023-10-12 Thread Jason Wang
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu  wrote:
>
> Device specific IOMMU parent driver who wishes to see mapping to be
> decoupled from virtio or vdpa device life cycle (device reset) can use
> it to restore memory mapping in the device IOMMU to the initial or
> default state. The reset of mapping is done per address space basis.
>
> The reason why a separate .reset_map op is introduced is because this
> allows a simple on-chip IOMMU model without exposing too much device
> implementation details to the upper vdpa layer. The .dma_map/unmap or
> .set_map driver API is meant to be used to manipulate the IOTLB mappings,
> and has been abstracted in a way similar to how a real IOMMU device maps
> or unmaps pages for certain memory ranges. However, apart from this there
> also exist other mapping needs, in which case 1:1 passthrough mapping
> has to be used by other users (read virtio-vdpa). To ease parent/vendor
> driver implementation and to avoid abusing DMA ops in an unexpected way,
> these on-chip IOMMU devices can start with 1:1 passthrough mapping mode
> initially at the time of creation. Then the .reset_map op can be used
> to switch iotlb back to this initial state without having to expose a
> complex two-dimensional IOMMU device model.
>
> Signed-off-by: Si-Wei Liu 
> ---
>  include/linux/vdpa.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index d376309..26ae6ae 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -327,6 +327,15 @@ struct vdpa_map_file {
>   * @iova: iova to be unmapped
>   * @size: size of the area
>   * Returns integer: success (0) or error (< 0)
> + * @reset_map: Reset device memory mapping to the default
> + * state (optional)

I think we need to mention that this is a must for parents that use set_map()?

Other than this:

Acked-by: Jason Wang 

Thanks

> + * Needed for devices that are using device
> + * specific DMA translation and prefer mapping
> + * to be decoupled from the virtio life cycle,
> + * i.e. device .reset op does not reset mapping
> + * @vdev: vdpa device
> + * @asid: address space identifier
> + * Returns integer: success (0) or error (< 0)
>   * @get_vq_dma_dev:Get the dma device for a specific
>   * virtqueue (optional)
>   * @vdev: vdpa device
> @@ -405,6 +414,7 @@ struct vdpa_config_ops {
>u64 iova, u64 size, u64 pa, u32 perm, void *opaque);
> int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid,
>  u64 iova, u64 size);
> +   int (*reset_map)(struct vdpa_device *vdev, unsigned int asid);
> int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
>   unsigned int asid);
> struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
> --
> 1.8.3.1
>


Re: [PATCH 2/4] vhost-vdpa: reset vendor specific mapping to initial state in .release

2023-10-12 Thread Jason Wang
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu  wrote:
>
> Devices with on-chip IOMMU or vendor specific IOTLB implementation
> may need to restore iotlb mapping to the initial or default state
> using the .reset_map op, as it's desirable for some parent devices
> to solely manipulate mappings by its own, independent of virtio device
> state. For instance, device reset does not cause the mapping to go away
> on such an IOTLB model in need of persistent mapping. Before vhost-vdpa
> goes away, give these devices a chance to reset the iotlb back to the
> initial state in vhost_vdpa_cleanup().
>
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vhost/vdpa.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 851535f..a3f8160 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as 
> *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
> return vhost_vdpa_alloc_as(v, asid);
>  }
>
> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
> +{
> +   struct vdpa_device *vdpa = v->vdpa;
> +   const struct vdpa_config_ops *ops = vdpa->config;
> +
> +   if (ops->reset_map)
> +           ops->reset_map(vdpa, asid);
> +}
> +
>  static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
>  {
> struct vhost_vdpa_as *as = asid_to_as(v, asid);
> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, 
> u32 asid)
>
> hlist_del(&as->hash_link);
> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
> +   /*
> +    * Devices with vendor specific IOMMU may need to restore
> +    * iotlb to the initial or default state which is not done
> +    * through device reset, as the IOTLB mapping manipulation
> +    * could be decoupled from the virtio device life cycle.
> +    */

Should we do this according to whether IOTLB_PERSIST is set? Otherwise
we may break old userspace.

Thanks

> +   vhost_vdpa_reset_map(v, asid);
> kfree(as);
>
> return 0;
> --
> 1.8.3.1
>


Re: [PATCH 4/4] vdpa/mlx5: implement .reset_map driver op

2023-10-12 Thread Jason Wang
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu  wrote:
>
> Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
> virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device
> creation time. This 1:1 DMA MR will be implicitly destroyed while
> the first .set_map call is invoked, in which case callers like
> vhost-vdpa will start to set up custom mappings. When the .reset
> callback is invoked, the custom mappings will be cleared and the 1:1
> DMA MR will be re-created.
>
> In order to reduce excessive memory mapping cost in live migration,
> it is desirable to decouple the vhost-vdpa IOTLB abstraction from
> the virtio device life cycle, i.e. mappings can be kept around intact
> across virtio device reset. Leverage the .reset_map callback, which
> is meant to destroy the regular MR on the given ASID and recreate the
> initial DMA mapping. That way, the device .reset op can run free from
> having to maintain and clean up memory mappings by itself.
>
> The cvq mapping also needs to be cleared if it is in the given ASID.
>
> Co-developed-by: Dragos Tatulea 
> Signed-off-by: Dragos Tatulea 
> Signed-off-by: Si-Wei Liu 

I wonder if the simulator suffers from the exact same issue. If yes,
let's fix the simulator as well?

Thanks
