Re: [PATCH] vduse: Use proper spinlock for IRQ injection

2023-07-25 Thread Jason Wang
On Wed, Jul 5, 2023 at 7:45 PM Maxime Coquelin  wrote:
>
> The IRQ injection work used spin_lock_irq() to protect the
> scheduling of the softirq, but spin_lock_bh() should be
> used.
>
> With spin_lock_irq(), we noticed a delay of more than 6
> seconds between the time a NAPI polling work is scheduled
> and the time it is executed.
>
> Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
> Cc: xieyon...@bytedance.com
>
> Suggested-by: Jason Wang 
> Signed-off-by: Maxime Coquelin 

Acked-by: Jason Wang 

Thanks


> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index dc38ed21319d..df7869537ef1 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -935,10 +935,10 @@ static void vduse_dev_irq_inject(struct work_struct *work)
>  {
> struct vduse_dev *dev = container_of(work, struct vduse_dev, inject);
>
> -   spin_lock_irq(&dev->irq_lock);
> +   spin_lock_bh(&dev->irq_lock);
> if (dev->config_cb.callback)
> dev->config_cb.callback(dev->config_cb.private);
> -   spin_unlock_irq(&dev->irq_lock);
> +   spin_unlock_bh(&dev->irq_lock);
>  }
>
>  static void vduse_vq_irq_inject(struct work_struct *work)
> @@ -946,10 +946,10 @@ static void vduse_vq_irq_inject(struct work_struct *work)
> struct vduse_virtqueue *vq = container_of(work,
> struct vduse_virtqueue, inject);
>
> -   spin_lock_irq(&vq->irq_lock);
> +   spin_lock_bh(&vq->irq_lock);
> if (vq->ready && vq->cb.callback)
> vq->cb.callback(vq->cb.private);
> -   spin_unlock_irq(&vq->irq_lock);
> +   spin_unlock_bh(&vq->irq_lock);
>  }
>
>  static bool vduse_vq_signal_irqfd(struct vduse_virtqueue *vq)
> --
> 2.41.0
>
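As an aside, a minimal sketch of the locking pattern at stake (generic
names, not the actual VDUSE code): a lock shared between process context
and softirq context only needs to mask bottom halves. spin_unlock_bh()
also processes any softirq raised inside the critical section right
away, whereas after spin_unlock_irq() a softirq raised from process
context may have to wait for ksoftirqd to be scheduled, which helps
explain the delay described above.

#include <linux/spinlock.h>
#include <linux/workqueue.h>

static DEFINE_SPINLOCK(demo_irq_lock);

/* Runs in process context (workqueue). */
static void demo_irq_inject(struct work_struct *work)
{
	/* Disable bottom halves only; the hardirq state is untouched.
	 * On unlock, pending softirqs (e.g. NAPI scheduled by the
	 * callback) run immediately.
	 */
	spin_lock_bh(&demo_irq_lock);
	/* ... invoke the registered interrupt callback ... */
	spin_unlock_bh(&demo_irq_lock);
}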


Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-25 Thread Jason Wang
On Tue, Jul 25, 2023 at 3:36 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
> > On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > > > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin  wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > > > >>
> > > > > >>
> > > > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > > 
> > > > > 
> > > > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > > >>>
> > > > > >>> Adding cond_resched() to the command waiting loop for better
> > > > > >>> co-operation with the scheduler. This gives the CPU a chance
> > > > > >>> to run other tasks (e.g. workqueues) instead of busy looping
> > > > > >>> when preemption is not allowed on a device whose CVQ might be
> > > > > >>> slow.
> > > > > >>>
> > > > > >>> Signed-off-by: Jason Wang 
> > > > > >>
> > > > > >> This still leaves hung processes, but at least it doesn't pin
> > > > > >> the CPU any more.  Thanks.
> > > > > >> Reviewed-by: Shannon Nelson 
> > > > > >>
> > > > > >
> > > > > > I'd like to see a full solution
> > > > > > 1- block until interrupt
> > > >
> > > > I remember in previous versions, you worried about the extra MSI
> > > > vector. (Maybe I was wrong).
> > > >
> > > > > 
> > > > >  Would it make sense to also have a timeout?
> > > > >  And when timeout expires, set FAILED bit in device status?
> > > > > >>>
> > > > > >>> virtio spec does not set any limits on the timing of vq
> > > > > >>> processing.
> > > > > >>
> > > > > >> Indeed, but I thought the driver could decide it is too long
> > > > > >> for it.
> > > > > >>
> > > > > >> The issue is that we keep waiting with rtnl locked; it can
> > > > > >> quickly make the system unusable.
> > > > > >
> > > > > > if this is a problem we should find a way not to keep rtnl
> > > > > > locked indefinitely.
> > > >
> > > > Any ideas on this direction? Simply dropping rtnl during the busy loop
> > > > will result in a lot of races. This seems to require non-trivial
> > > > changes in the networking core.
> > > >
> > > > >
> > > > > From the tests I have done, I think it is. With OVS, a
> > > > > reconfiguration is performed when the VDUSE device is added, and
> > > > > when an MLX5 device is in the same bridge, it ends up doing an
> > > > > ioctl() that tries to take the rtnl lock. In this configuration,
> > > > > it is not possible to kill OVS because it is stuck trying to
> > > > > acquire the rtnl lock for mlx5, which is held by virtio-net.
> > > >
> > > > Yeah, basically, any RTNL users would be blocked forever.
> > > >
> > > > And the infinite loop has other side effects, like preventing the
> > > > freezer from working.
> > > >
> > > > To summarize, there are three issues
> > > >
> > > > 1) busy polling
> > > > 2) breaks freezer
> > > > 3) hold RTNL during the loop
> > > >
> > > > Solving 3) may also help with 2), e.g. some PM routine (wireguard
> > > > or even virtnet_restore() itself) may try to hold the lock.
> > >
> > > Yep. So my feeling currently is, the only real fix is to actually
> > > queue up the work in software.
> >
> > Do you mean something like:
> >
> > rtnl_lock();
> > queue up the work
> > rtnl_unlock();
> > return success;
> >
> > ?
>
> yes

We will lose the error reporting; is that a real problem or not?

>
>
> > > It's mostly trivial to limit memory consumption; vids are the only
> > > case where it would make sense to have more than one command of a
> > > given type outstanding.
> >
> > And rx mode, so this implies we will fail any command if the previous
> > work is not finished.
>
> don't fail it, store it.

Ok.

Thanks

>
> > > Have a tree or a bitmap with vids to add/remove?
> >
> > Probably.
> >
> > Thanks
> >
> > >
> > >
> > >
> > > > >
> > > > > >
> > > > > > 2- still handle surprise removal correctly by waking in that case
> > > >
> > > > This is basically what version 1 did?
> > > >
> > > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368...@redhat.com/t/
> > > >
> > > > Thanks
> > >
> > > Yes - except the timeout part.
> > >
> > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >>> ---
> > > > > >>>  drivers/net/virtio_net.c | 4 +++-
> > > > > >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >>>
> > > > > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > >>> index
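The change under discussion is small; a hedged sketch of its shape,
assuming the virtnet_send_command() polling loop (the quoted diff breaks
off above, so this is a reconstruction, not a verbatim copy):

	unsigned int tmp;

	/* Busy-wait until the device consumes the control command, but
	 * yield the CPU on each iteration so other tasks (e.g.
	 * workqueues) can run even when preemption is not allowed.
	 */
	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq))
		cond_resched();

This only mitigates issue 1) above; issues 2) and 3) would still need
the deferred-work approach discussed here (queue the command under RTNL
and return immediately, trading away synchronous error reporting).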

Re: [PATCH net] virtio-net: fix race between set queues and probe

2023-07-25 Thread Jason Wang
On Tue, Jul 25, 2023 at 3:53 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote:
> > A race was found where set_channels could be called after registering
> > but before virtnet_set_queues() in virtnet_probe(). Fix this by
> > moving the virtnet_set_queues() call before netdevice registration.
> > While at it, use _virtnet_set_queues() to avoid holding rtnl, as the
> > device is not even registered at that time.
> >
> > Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> > Signed-off-by: Jason Wang 
>
>
> Acked-by: Michael S. Tsirkin 
>
> stable material I guess?

Yes it is.

Thanks

>
> > ---
> >  drivers/net/virtio_net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 0db14f6b87d3..1270c8d23463 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
> >   if (vi->has_rss || vi->has_rss_hash_report)
> >   virtnet_init_default_rss(vi);
> >
> > + _virtnet_set_queues(vi, vi->curr_queue_pairs);
> > +
> >   /* serialize netdev register + virtio_device_ready() with ndo_open() */
> >   rtnl_lock();
> >
> > @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
> >   goto free_unregister_netdev;
> >   }
> >
> > - virtnet_set_queues(vi, vi->curr_queue_pairs);
> > -
> >   /* Assume link up if device can't report link status,
> >  otherwise get link status from config. */
> >   netif_carrier_off(dev);
> > --
> > 2.39.3
>
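To make the race concrete, a conceptual sketch of the problematic
interleaving, reconstructed from the commit message (illustration only):

	/*
	 *   probe path                     userspace
	 *   ----------                     ---------
	 *   register_netdev(dev);
	 *                                  ethtool set_channels()
	 *                                    -> virtnet_set_queues(vi, n);
	 *   virtnet_set_queues(vi,
	 *                      vi->curr_queue_pairs);  // clobbers n
	 *
	 * Configuring the queues before registration closes the window,
	 * and rtnl is not needed there because the netdev is not yet
	 * visible to userspace.
	 */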


Re: [RFC PATCH v2 4/4] vsock/test: MSG_PEEK test for SOCK_SEQPACKET

2023-07-25 Thread Stefano Garzarella

On Wed, Jul 19, 2023 at 10:27:08PM +0300, Arseniy Krasnov wrote:

This adds a MSG_PEEK test for SOCK_SEQPACKET. It works in the same way as the
SOCK_STREAM test, except it also tests the MSG_TRUNC flag.

Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/vsock_test.c | 58 +---
1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 444a3ff0681f..2ca2cbfa9808 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -257,14 +257,19 @@ static void test_stream_multiconn_server(const struct test_opts *opts)

#define MSG_PEEK_BUF_LEN 64

-static void test_stream_msg_peek_client(const struct test_opts *opts)
+static void __test_msg_peek_client(const struct test_opts *opts,


Let's stay with just test_msg_peek_client(), WDYT?


+  bool seqpacket)
{
unsigned char buf[MSG_PEEK_BUF_LEN];
ssize_t send_size;
int fd;
int i;

-   fd = vsock_stream_connect(opts->peer_cid, 1234);
+   if (seqpacket)
+   fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
+   else
+   fd = vsock_stream_connect(opts->peer_cid, 1234);
+
if (fd < 0) {
perror("connect");
exit(EXIT_FAILURE);
@@ -290,7 +295,8 @@ static void test_stream_msg_peek_client(const struct test_opts *opts)
close(fd);
}

-static void test_stream_msg_peek_server(const struct test_opts *opts)
+static void __test_msg_peek_server(const struct test_opts *opts,


Same here.

The rest LGTM!

Also the whole series should be ready for net-next, right?

Stefano
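For readers unfamiliar with the MSG_TRUNC part mentioned in the commit
message, a hedged sketch of the kind of check the SEQPACKET variant adds
(buffer names are assumed from the reworked SOCK_STREAM test in patch
3/4; this is not the exact patch code):

	/* A truncated peek must report the full message length while
	 * leaving the message queued (because of MSG_PEEK).
	 */
	res = recv(fd, buf_half, sizeof(buf_half), MSG_PEEK | MSG_TRUNC);
	if (res != sizeof(buf_normal)) {
		fprintf(stderr, "recv(2) + MSG_TRUNC, expected %zu, got %zi\n",
			sizeof(buf_normal), res);
		exit(EXIT_FAILURE);
	}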



Re: [RFC PATCH v2 3/4] vsock/test: rework MSG_PEEK test for SOCK_STREAM

2023-07-25 Thread Stefano Garzarella

On Wed, Jul 19, 2023 at 10:27:07PM +0300, Arseniy Krasnov wrote:

This new version makes the test more thorough by adding an empty read, a
partial read, and data comparisons between MSG_PEEK and normal reads.

Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/vsock_test.c | 78 ++--
1 file changed, 75 insertions(+), 3 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index ac1bd3ac1533..444a3ff0681f 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -255,9 +255,14 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
close(fds[i]);
}

+#define MSG_PEEK_BUF_LEN 64
+
static void test_stream_msg_peek_client(const struct test_opts *opts)
{
+   unsigned char buf[MSG_PEEK_BUF_LEN];
+   ssize_t send_size;
int fd;
+   int i;

fd = vsock_stream_connect(opts->peer_cid, 1234);
if (fd < 0) {
@@ -265,12 +270,32 @@ static void test_stream_msg_peek_client(const struct test_opts *opts)
exit(EXIT_FAILURE);
}

-   send_byte(fd, 1, 0);
+   for (i = 0; i < sizeof(buf); i++)
+   buf[i] = rand() & 0xFF;
+
+   control_expectln("SRVREADY");
+
+   send_size = send(fd, buf, sizeof(buf), 0);
+
+   if (send_size < 0) {
+   perror("send");
+   exit(EXIT_FAILURE);
+   }
+
+   if (send_size != sizeof(buf)) {
+   fprintf(stderr, "Invalid send size %zi\n", send_size);
+   exit(EXIT_FAILURE);
+   }
+
close(fd);
}

static void test_stream_msg_peek_server(const struct test_opts *opts)
{
+   unsigned char buf_half[MSG_PEEK_BUF_LEN / 2];
+   unsigned char buf_normal[MSG_PEEK_BUF_LEN];
+   unsigned char buf_peek[MSG_PEEK_BUF_LEN];
+   ssize_t res;
int fd;

fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
@@ -279,8 +304,55 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
exit(EXIT_FAILURE);
}

-   recv_byte(fd, 1, MSG_PEEK);
-   recv_byte(fd, 1, 0);
+   /* Peek from empty socket. */
+   res = recv(fd, buf_peek, sizeof(buf_peek), MSG_PEEK | MSG_DONTWAIT);
+   if (res != -1) {
+   fprintf(stderr, "expected recv(2) failure, got %zi\n", res);
+   exit(EXIT_FAILURE);
+   }
+
+   if (errno != EAGAIN) {
+   perror("EAGAIN expected");
+   exit(EXIT_FAILURE);
+   }
+
+   control_writeln("SRVREADY");
+
+   /* Peek part of data. */
+   res = recv(fd, buf_half, sizeof(buf_half), MSG_PEEK);
+   if (res != sizeof(buf_half)) {
+   fprintf(stderr, "recv(2) + MSG_PEEK, expected %zu, got %zi\n",
+   sizeof(buf_half), res);
+   exit(EXIT_FAILURE);
+   }
+
+   /* Peek whole data. */
+   res = recv(fd, buf_peek, sizeof(buf_peek), MSG_PEEK);
+   if (res != sizeof(buf_peek)) {
+   fprintf(stderr, "recv(2) + MSG_PEEK, expected %zu, got %zi\n",
+   sizeof(buf_peek), res);
+   exit(EXIT_FAILURE);
+   }
+
+   /* Compare partial and full peek. */
+   if (memcmp(buf_half, buf_peek, sizeof(buf_half))) {
+   fprintf(stderr, "Partial peek data mismatch\n");
+   exit(EXIT_FAILURE);
+   }
+
+   res = recv(fd, buf_normal, sizeof(buf_normal), 0);
+   if (res != sizeof(buf_normal)) {
+   fprintf(stderr, "recv(2), expected %zu, got %zi\n",
+   sizeof(buf_normal), res);
+   exit(EXIT_FAILURE);
+   }
+
+   /* Compare full peek and normal read. */
+   if (memcmp(buf_peek, buf_normal, sizeof(buf_peek))) {
+   fprintf(stderr, "Full peek data mismatch\n");
+   exit(EXIT_FAILURE);
+   }
+
close(fd);
}

--
2.25.1





Re: [RFC PATCH v2 2/4] virtio/vsock: support MSG_PEEK for SOCK_SEQPACKET

2023-07-25 Thread Stefano Garzarella

On Wed, Jul 19, 2023 at 10:27:06PM +0300, Arseniy Krasnov wrote:

This adds support for the MSG_PEEK flag for the SOCK_SEQPACKET socket type.
The difference from SOCK_STREAM is that this callback returns either the
length of the message or an error.

Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/virtio_transport_common.c | 63 +++--
1 file changed, 60 insertions(+), 3 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 2ee40574c339..352d042b130b 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -460,6 +460,63 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
return err;
}

+static ssize_t
+virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk,
+  struct msghdr *msg)
+{
+   struct virtio_vsock_sock *vvs = vsk->trans;
+   struct sk_buff *skb;
+   size_t total, len;
+
+   spin_lock_bh(&vvs->rx_lock);
+
+   if (!vvs->msg_count) {
+   spin_unlock_bh(&vvs->rx_lock);
+   return 0;
+   }
+
+   total = 0;
+   len = msg_data_left(msg);
+
+   skb_queue_walk(&vvs->rx_queue, skb) {
+   struct virtio_vsock_hdr *hdr;
+
+   if (total < len) {
+   size_t bytes;
+   int err;
+
+   bytes = len - total;
+   if (bytes > skb->len)
+   bytes = skb->len;
+
+   spin_unlock_bh(&vvs->rx_lock);
+
+   /* sk_lock is held by caller so no one else can dequeue.
+* Unlock rx_lock since memcpy_to_msg() may sleep.
+*/
+   err = memcpy_to_msg(msg, skb->data, bytes);
+   if (err)
+   return err;
+
+   spin_lock_bh(&vvs->rx_lock);
+   }
+
+   total += skb->len;
+   hdr = virtio_vsock_hdr(skb);
+
+   if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM) {
+   if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOR)
+   msg->msg_flags |= MSG_EOR;
+
+   break;
+   }
+   }
+
+   spin_unlock_bh(&vvs->rx_lock);
+
+   return total;
+}
+
static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 struct msghdr *msg,
 int flags)
@@ -554,9 +611,9 @@ virtio_transport_seqpacket_dequeue(struct vsock_sock *vsk,
   int flags)
{
if (flags & MSG_PEEK)
-   return -EOPNOTSUPP;
-
-   return virtio_transport_seqpacket_do_dequeue(vsk, msg, flags);
+   return virtio_transport_seqpacket_do_peek(vsk, msg);
+   else
+   return virtio_transport_seqpacket_do_dequeue(vsk, msg, flags);
}
EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_dequeue);

--
2.25.1
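As a usage note, a hedged userspace sketch of what this enables ('fd' is
assumed to be a connected vsock SOCK_SEQPACKET socket; with MSG_TRUNC
the af_vsock receive path returns the full message length reported by
this callback):

	#include <sys/socket.h>

	/* Learn the size of the next queued message without dequeuing
	 * it: MSG_PEEK keeps it queued, MSG_TRUNC makes recv() return
	 * the full message length even though 'probe' is one byte.
	 */
	char probe;
	ssize_t msg_len = recv(fd, &probe, sizeof(probe),
			       MSG_PEEK | MSG_TRUNC);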





Re: [RFC PATCH v2 1/4] virtio/vsock: rework MSG_PEEK for SOCK_STREAM

2023-07-25 Thread Stefano Garzarella

On Wed, Jul 19, 2023 at 10:27:05PM +0300, Arseniy Krasnov wrote:

This reworks the current implementation of the MSG_PEEK logic:
1) Replaces 'skb_queue_walk_safe()' with 'skb_queue_walk()'. The former
  is not needed, as no skb is removed from the queue inside the loop.
2) Removes the nested while loop - the MSG_PEEK logic can be implemented
  without it: just iterate over the skbs without removing them and copy
  data from each one until the destination buffer is full.

Signed-off-by: Arseniy Krasnov 
Reviewed-by: Bobby Eshleman 
---
net/vmw_vsock/virtio_transport_common.c | 41 -
1 file changed, 19 insertions(+), 22 deletions(-)


Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index b769fc258931..2ee40574c339 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -348,37 +348,34 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
size_t len)
{
struct virtio_vsock_sock *vvs = vsk->trans;
-   size_t bytes, total = 0, off;
-   struct sk_buff *skb, *tmp;
-   int err = -EFAULT;
+   struct sk_buff *skb;
+   size_t total = 0;
+   int err;

spin_lock_bh(&vvs->rx_lock);

-   skb_queue_walk_safe(&vvs->rx_queue, skb,  tmp) {
-   off = 0;
+   skb_queue_walk(&vvs->rx_queue, skb) {
+   size_t bytes;

-   if (total == len)
-   break;
+   bytes = len - total;
+   if (bytes > skb->len)
+   bytes = skb->len;

-   while (total < len && off < skb->len) {
-   bytes = len - total;
-   if (bytes > skb->len - off)
-   bytes = skb->len - off;
+   spin_unlock_bh(&vvs->rx_lock);

-   /* sk_lock is held by caller so no one else can dequeue.
-* Unlock rx_lock since memcpy_to_msg() may sleep.
-*/
-   spin_unlock_bh(&vvs->rx_lock);
+   /* sk_lock is held by caller so no one else can dequeue.
+* Unlock rx_lock since memcpy_to_msg() may sleep.
+*/
+   err = memcpy_to_msg(msg, skb->data, bytes);
+   if (err)
+   goto out;

-   err = memcpy_to_msg(msg, skb->data + off, bytes);
-   if (err)
-   goto out;
+   total += bytes;

-   spin_lock_bh(&vvs->rx_lock);
+   spin_lock_bh(&vvs->rx_lock);

-   total += bytes;
-   off += bytes;
-   }
+   if (total == len)
+   break;
}

spin_unlock_bh(&vvs->rx_lock);
--
2.25.1





Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 04:28:14PM +0300, Arseniy Krasnov wrote:
> [...]

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 04:10:40PM +0300, Arseniy Krasnov wrote:
> [...]

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 04:04:13PM +0300, Arseniy Krasnov wrote:
> [...]

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Stefano Garzarella

On Tue, Jul 25, 2023 at 09:06:02AM -0400, Michael S. Tsirkin wrote:

[...]

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 02:53:39PM +0200, Stefano Garzarella wrote:
> On Tue, Jul 25, 2023 at 07:50:53AM -0400, Michael S. Tsirkin wrote:
> > [...]

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Stefano Garzarella

On Tue, Jul 25, 2023 at 07:50:53AM -0400, Michael S. Tsirkin wrote:

On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:



On 21.07.2023 00:42, Arseniy Krasnov wrote:
> This adds handling of the MSG_ZEROCOPY flag on the transmission path: if
> this flag is set and zerocopy transmission is possible (enabled in the
> socket options and the transport allows zerocopy), then a non-linear skb
> is created and filled with the pages of the user's buffer. The pages of
> the user's buffer are locked in memory by 'get_user_pages()'. The second
> thing this patch does is change the type of skb owning: instead of
> calling 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. The
> reason for this change is that '__zerocopy_sg_from_iter()' increments
> 'sk_wmem_alloc' of the socket, so to decrease this field correctly a
> proper skb destructor is needed: 'sock_wfree()'. This destructor is set
> by 'skb_set_owner_w()'.
>
> Signed-off-by: Arseniy Krasnov 
> ---
>  Changelog:
>  v5(big patchset) -> v1:
>   * Refactorings of 'if' conditions.
>   * Remove extra blank line.
>   * Remove 'frag_off' field unneeded init.
>   * Add function 'virtio_transport_fill_skb()' which fills both linear
> and non-linear skb with provided data.
>  v1 -> v2:
>   * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
>  v2 -> v3:
>   * Add a new transport callback: 'msgzerocopy_check_iov'. It checks
> whether the provided 'iov_iter' with data can be sent in zerocopy mode.
> If this callback is not set in the transport, the transport allows
> sending any 'iov_iter' in zerocopy mode; otherwise, zerocopy is allowed
> only if the callback returns 'true'. The reason for this callback is
> that in the case of G2H transmission we insert the whole skb into the
> tx virtio queue, and such an skb must fit the size of the virtio queue
> to be sent in a single iteration (maybe the tx logic in
> 'virtio_transport.c' could be reworked as in vhost to support partial
> send of the current skb). This callback will be enabled only for the
> G2H path. For details please see the comment 'Check that tx queue...'
> below.
>
>  include/net/af_vsock.h  |   3 +
>  net/vmw_vsock/virtio_transport.c|  39 
>  net/vmw_vsock/virtio_transport_common.c | 257 ++--
>  3 files changed, 241 insertions(+), 58 deletions(-)
>
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 0e7504a42925..a6b346eeeb8e 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -177,6 +177,9 @@ struct vsock_transport {
>
>/* Read a single skb */
>int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
> +
> +  /* Zero-copy. */
> +  bool (*msgzerocopy_check_iov)(const struct iov_iter *);
>  };
>
>  / CORE /
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index 7bbcc8093e51..23cb8ed638c4 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>  }
>
> +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> +{
> +  struct virtio_vsock *vsock;
> +  bool res = false;
> +
> +  rcu_read_lock();
> +
> +  vsock = rcu_dereference(the_virtio_vsock);
> +  if (vsock) {
> +  struct virtqueue *vq;
> +  int iov_pages;
> +
> +  vq = vsock->vqs[VSOCK_VQ_TX];
> +
> +  iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
> +
> +  /* Check that tx queue is large enough to keep whole
> +   * data to send. This is needed, because when there is
> +   * not enough free space in the queue, current skb to
> +   * send will be reinserted to the head of tx list of
> +   * the socket to retry transmission later, so if skb
> +   * is bigger than whole queue, it will be reinserted
> +   * again and again, thus blocking other skbs to be sent.
> +   * Each page of the user provided buffer will be added
> +   * as a single buffer to the tx virtqueue, so compare
> +   * number of pages against maximum capacity of the queue.
> +   * +1 means buffer for the packet header.
> +   */
> +  if (iov_pages + 1 <= vq->num_max)

I think this check is actually only needed for the case where we don't
have the indirect buffer feature. With indirect mode, the whole data to
send will be packed into one indirect buffer.

Thanks, Arseniy


Actually the reverse. With indirect you are limited to num_max.
Without you are limited to whatever space is left in the
queue (which you did not check here, so you should).



> +  res = true;
> +  }
> +
> +  rcu_read_unlock();


Just curious: is the point of all this RCU dance to allow vsock to
change from under us? Then why is it OK to have it change?
virtio_transport_msgzerocopy_check_iov() will then refer to the old
vsock ...


IIRC we introduced the RCU to
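
Following up on the indirect-descriptor point above, a hedged sketch of
how the capacity check might distinguish the two cases (illustrative
only, not the final code; 'vq', 'iov_pages' and 'res' are the names from
the quoted patch, and the helpers are the standard virtio core ones):

	/* With INDIRECT_DESC the skb occupies a single ring slot and the
	 * indirect table is bounded by the queue size; without it, each
	 * page needs its own descriptor, so the descriptors currently
	 * free are the limit.
	 */
	if (virtio_has_feature(vsock->vdev, VIRTIO_RING_F_INDIRECT_DESC))
		res = iov_pages + 1 <= vq->num_max;
	else
		res = iov_pages + 1 <= vq->num_free;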

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Stefano Garzarella

On Tue, Jul 25, 2023 at 08:39:17AM -0400, Michael S. Tsirkin wrote:

On Tue, Jul 25, 2023 at 02:28:02PM +0200, Stefano Garzarella wrote:

On Tue, Jul 25, 2023 at 12:16:11PM +0300, Arseniy Krasnov wrote:
>
>
> On 25.07.2023 11:46, Arseniy Krasnov wrote:
> >
> >
> > On 25.07.2023 11:43, Stefano Garzarella wrote:
> > > On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:
> > > >
> > > >
> > > > On 21.07.2023 00:42, Arseniy Krasnov wrote:
> > > > > [...]
> > > > > +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> > > > > +{
> > > > > +    struct virtio_vsock *vsock;
> > > > > +    bool res = false;
> > > > > +
> > > > > +    rcu_read_lock();
> > > > > +
> > > > > +    vsock = rcu_dereference(the_virtio_vsock);
> > > > > +    if (vsock) {

Just noted, what about the following to reduce the indentation?

if (!vsock) {
goto out;
}


no {} pls


ooops, true, too much QEMU code today, but luckily checkpatch would have
spotted it ;-)




...
...
out:
rcu_read_unlock();
return res;


indentation is quite modest here. Not sure goto is worth it.


It's a pattern we follow a lot in this file, and I find the early
return/goto more readable.
Anyway, I don't have a strong opinion; it's fine the way it is now.
Actually, we don't have too many indentation levels in this function for now.

Thanks,
Stefano


Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 02:28:02PM +0200, Stefano Garzarella wrote:
> On Tue, Jul 25, 2023 at 12:16:11PM +0300, Arseniy Krasnov wrote:
> > 
> > 
> > On 25.07.2023 11:46, Arseniy Krasnov wrote:
> > > 
> > > 
> > > On 25.07.2023 11:43, Stefano Garzarella wrote:
> > > > On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:
> > > > > 
> > > > > 
> > > > > On 21.07.2023 00:42, Arseniy Krasnov wrote:
> > > > > > [...]
> > > > > > +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> > > > > > +{
> > > > > > +    struct virtio_vsock *vsock;
> > > > > > +    bool res = false;
> > > > > > +
> > > > > > +    rcu_read_lock();
> > > > > > +
> > > > > > +    vsock = rcu_dereference(the_virtio_vsock);
> > > > > > +    if (vsock) {
> 
> Just noted, what about the following to reduce the indentation?
> 
> if (!vsock) {
> goto out;
> }

no {} pls

> ...
> ...
> out:
> rcu_read_unlock();
> return res;

indentation is quite modest here. Not sure goto is worth it.



Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Stefano Garzarella

On Tue, Jul 25, 2023 at 12:16:11PM +0300, Arseniy Krasnov wrote:



On 25.07.2023 11:46, Arseniy Krasnov wrote:



On 25.07.2023 11:43, Stefano Garzarella wrote:

On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:



On 21.07.2023 00:42, Arseniy Krasnov wrote:

This adds handling of the MSG_ZEROCOPY flag on the transmission path: if
this flag is set and zerocopy transmission is possible (enabled in socket
options and the transport allows zerocopy), then a non-linear skb will be
created and filled with the pages of the user's buffer. Pages of the user's
buffer are locked in memory by 'get_user_pages()'. The second thing this
patch does is change how the skb is owned: instead of calling
'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. The reason for this
change is that '__zerocopy_sg_from_iter()' increments the socket's
'sk_wmem_alloc', so to decrease this field correctly a proper skb
destructor is needed: 'sock_wfree()'. This destructor is set by
'skb_set_owner_w()'.

Signed-off-by: Arseniy Krasnov 
---
 Changelog:
 v5(big patchset) -> v1:
  * Refactorings of 'if' conditions.
  * Remove extra blank line.
  * Remove 'frag_off' field unneeded init.
  * Add function 'virtio_transport_fill_skb()' which fills both linear
    and non-linear skb with provided data.
 v1 -> v2:
  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
 v2 -> v3:
  * Add new transport callback: 'msgzerocopy_check_iov'. It checks whether
    the provided 'iov_iter' with data could be sent in zerocopy mode. If
    this callback is not set in the transport, the transport allows sending
    any 'iov_iter' in zerocopy mode; otherwise, zerocopy is allowed only if
    the callback returns 'true'. The reason for this callback is that in
    the case of G2H transmission we insert the whole skb into the tx virtio
    queue, and such an skb must fit within the size of the virtio queue to
    be sent in a single iteration (maybe the tx logic in
    'virtio_transport.c' could be reworked as in vhost to support partial
    send of the current skb). This callback will be enabled only for the
    G2H path. For details please see the comment 'Check that tx queue...'
    below.

 include/net/af_vsock.h  |   3 +
 net/vmw_vsock/virtio_transport.c    |  39 
 net/vmw_vsock/virtio_transport_common.c | 257 ++--
 3 files changed, 241 insertions(+), 58 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..a6b346eeeb8e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -177,6 +177,9 @@ struct vsock_transport {

 /* Read a single skb */
 int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
+
+    /* Zero-copy. */
+    bool (*msgzerocopy_check_iov)(const struct iov_iter *);
 };

 / CORE /
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 7bbcc8093e51..23cb8ed638c4 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
 queue_work(virtio_vsock_workqueue, &vsock->rx_work);
 }

+static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
+{
+    struct virtio_vsock *vsock;
+    bool res = false;
+
+    rcu_read_lock();
+
+    vsock = rcu_dereference(the_virtio_vsock);
+    if (vsock) {


Just noted, what about the following to reduce the indentation?

if (!vsock) {
goto out;
}
...
...
out:
rcu_read_unlock();
return res;


+    struct virtqueue *vq;
+    int iov_pages;
+
+    vq = vsock->vqs[VSOCK_VQ_TX];
+
+    iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
+
+    /* Check that tx queue is large enough to keep whole
+ * data to send. This is needed, because when there is
+ * not enough free space in the queue, current skb to
+ * send will be reinserted to the head of tx list of
+ * the socket to retry transmission later, so if skb
+ * is bigger than whole queue, it will be reinserted
+ * again and again, thus blocking other skbs to be sent.
+ * Each page of the user provided buffer will be added
+ * as a single buffer to the tx virtqueue, so compare
+ * number of pages against maximum capacity of the queue.
+ * +1 means buffer for the packet header.
+ */
+    if (iov_pages + 1 <= vq->num_max)


I think this check is relevant only for the case when we don't have the
indirect buffer feature. With indirect mode, the whole data to send will be
packed into one indirect buffer.


I think so.
So, should we also check that here?



Thanks, Arseniy


+    res = true;
+    }
+
+    rcu_read_unlock();
+
+    return res;
+}
+
 static bool virtio_transport_seqpacket_allow(u32 remote_cid);

 static struct virtio_transport virtio_transport = {
@@ -475,6 +512,8 @@ static struct virtio_transport virtio_transport = {
 .seqpacket_allow  = virtio_transport_seqpacket_allow,

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 11:39:22AM +0300, Arseniy Krasnov wrote:
> 
> 
> On 25.07.2023 11:25, Michael S. Tsirkin wrote:
> > On Fri, Jul 21, 2023 at 12:42:45AM +0300, Arseniy Krasnov wrote:
> >> This adds handling of the MSG_ZEROCOPY flag on the transmission path:
> >> if this flag is set and zerocopy transmission is possible (enabled in
> >> socket options and the transport allows zerocopy), then a non-linear
> >> skb will be created and filled with the pages of the user's buffer.
> >> Pages of the user's buffer are locked in memory by 'get_user_pages()'.
> >> The second thing this patch does is change how the skb is owned:
> >> instead of calling 'skb_set_owner_sk_safe()' it calls
> >> 'skb_set_owner_w()'. The reason for this change is that
> >> '__zerocopy_sg_from_iter()' increments the socket's 'sk_wmem_alloc',
> >> so to decrease this field correctly a proper skb destructor is needed:
> >> 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
> >>
> >> Signed-off-by: Arseniy Krasnov 
> >> ---
> >>  Changelog:
> >>  v5(big patchset) -> v1:
> >>   * Refactorings of 'if' conditions.
> >>   * Remove extra blank line.
> >>   * Remove 'frag_off' field unneeded init.
> >>   * Add function 'virtio_transport_fill_skb()' which fills both linear
> >> and non-linear skb with provided data.
> >>  v1 -> v2:
> >>   * Use original order of last four arguments in 
> >> 'virtio_transport_alloc_skb()'.
> >>  v2 -> v3:
> >>   * Add new transport callback: 'msgzerocopy_check_iov'. It checks
> >> whether the provided 'iov_iter' with data could be sent in zerocopy
> >> mode. If this callback is not set in the transport, the transport
> >> allows sending any 'iov_iter' in zerocopy mode; otherwise, zerocopy is
> >> allowed only if the callback returns 'true'. The reason for this
> >> callback is that in the case of G2H transmission we insert the whole
> >> skb into the tx virtio queue, and such an skb must fit within the size
> >> of the virtio queue to be sent in a single iteration (maybe the tx
> >> logic in 'virtio_transport.c' could be reworked as in vhost to support
> >> partial send of the current skb). This callback will be enabled only
> >> for the G2H path. For details please see the comment 'Check that tx
> >> queue...' below.
> >>
> >>  include/net/af_vsock.h  |   3 +
> >>  net/vmw_vsock/virtio_transport.c|  39 
> >>  net/vmw_vsock/virtio_transport_common.c | 257 ++--
> >>  3 files changed, 241 insertions(+), 58 deletions(-)
> >>
> >> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> >> index 0e7504a42925..a6b346eeeb8e 100644
> >> --- a/include/net/af_vsock.h
> >> +++ b/include/net/af_vsock.h
> >> @@ -177,6 +177,9 @@ struct vsock_transport {
> >>  
> >>/* Read a single skb */
> >>int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
> >> +
> >> +  /* Zero-copy. */
> >> +  bool (*msgzerocopy_check_iov)(const struct iov_iter *);
> >>  };
> >>  
> >>  / CORE /
> >> diff --git a/net/vmw_vsock/virtio_transport.c 
> >> b/net/vmw_vsock/virtio_transport.c
> >> index 7bbcc8093e51..23cb8ed638c4 100644
> >> --- a/net/vmw_vsock/virtio_transport.c
> >> +++ b/net/vmw_vsock/virtio_transport.c
> >> @@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> >>queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> >>  }
> >>  
> >> +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> >> +{
> >> +  struct virtio_vsock *vsock;
> >> +  bool res = false;
> >> +
> >> +  rcu_read_lock();
> >> +
> >> +  vsock = rcu_dereference(the_virtio_vsock);
> >> +  if (vsock) {
> >> +  struct virtqueue *vq;
> >> +  int iov_pages;
> >> +
> >> +  vq = vsock->vqs[VSOCK_VQ_TX];
> >> +
> >> +  iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
> >> +
> >> +  /* Check that tx queue is large enough to keep whole
> >> +   * data to send. This is needed, because when there is
> >> +   * not enough free space in the queue, current skb to
> >> +   * send will be reinserted to the head of tx list of
> >> +   * the socket to retry transmission later, so if skb
> >> +   * is bigger than whole queue, it will be reinserted
> >> +   * again and again, thus blocking other skbs to be sent.
> >> +   * Each page of the user provided buffer will be added
> >> +   * as a single buffer to the tx virtqueue, so compare
> >> +   * number of pages against maximum capacity of the queue.
> >> +   * +1 means buffer for the packet header.
> >> +   */
> >> +  if (iov_pages + 1 <= vq->num_max)
> >> +  res = true;
> > 
> > 
> > Yes but can't there already be buffers in the queue?
> > Then you can't stick num_max there.
> 
> I think that it is not critical, because the vhost part always tries to
> process all incoming buffers (yes, 'vhost_exceeds_weight()' breaks at some
> moment, but it will reschedule the tx kick ('vhost_vsock_handle_tx_kick()'

Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:
> 
> 
> On 21.07.2023 00:42, Arseniy Krasnov wrote:
> > This adds handling of the MSG_ZEROCOPY flag on the transmission path:
> > if this flag is set and zerocopy transmission is possible (enabled in
> > socket options and the transport allows zerocopy), then a non-linear
> > skb will be created and filled with the pages of the user's buffer.
> > Pages of the user's buffer are locked in memory by 'get_user_pages()'.
> > The second thing this patch does is change how the skb is owned:
> > instead of calling 'skb_set_owner_sk_safe()' it calls
> > 'skb_set_owner_w()'. The reason for this change is that
> > '__zerocopy_sg_from_iter()' increments the socket's 'sk_wmem_alloc', so
> > to decrease this field correctly a proper skb destructor is needed:
> > 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'.
> > 
> > Signed-off-by: Arseniy Krasnov 
> > ---
> >  Changelog:
> >  v5(big patchset) -> v1:
> >   * Refactorings of 'if' conditions.
> >   * Remove extra blank line.
> >   * Remove 'frag_off' field unneeded init.
> >   * Add function 'virtio_transport_fill_skb()' which fills both linear
> > and non-linear skb with provided data.
> >  v1 -> v2:
> >   * Use original order of last four arguments in 
> > 'virtio_transport_alloc_skb()'.
> >  v2 -> v3:
> >   * Add new transport callback: 'msgzerocopy_check_iov'. It checks
> > whether the provided 'iov_iter' with data could be sent in zerocopy
> > mode. If this callback is not set in the transport, the transport
> > allows sending any 'iov_iter' in zerocopy mode; otherwise, zerocopy is
> > allowed only if the callback returns 'true'. The reason for this
> > callback is that in the case of G2H transmission we insert the whole
> > skb into the tx virtio queue, and such an skb must fit within the size
> > of the virtio queue to be sent in a single iteration (maybe the tx
> > logic in 'virtio_transport.c' could be reworked as in vhost to support
> > partial send of the current skb). This callback will be enabled only
> > for the G2H path. For details please see the comment 'Check that tx
> > queue...' below.
> > 
> >  include/net/af_vsock.h  |   3 +
> >  net/vmw_vsock/virtio_transport.c|  39 
> >  net/vmw_vsock/virtio_transport_common.c | 257 ++--
> >  3 files changed, 241 insertions(+), 58 deletions(-)
> > 
> > diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> > index 0e7504a42925..a6b346eeeb8e 100644
> > --- a/include/net/af_vsock.h
> > +++ b/include/net/af_vsock.h
> > @@ -177,6 +177,9 @@ struct vsock_transport {
> >  
> > /* Read a single skb */
> > int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
> > +
> > +   /* Zero-copy. */
> > +   bool (*msgzerocopy_check_iov)(const struct iov_iter *);
> >  };
> >  
> >  / CORE /
> > diff --git a/net/vmw_vsock/virtio_transport.c 
> > b/net/vmw_vsock/virtio_transport.c
> > index 7bbcc8093e51..23cb8ed638c4 100644
> > --- a/net/vmw_vsock/virtio_transport.c
> > +++ b/net/vmw_vsock/virtio_transport.c
> > @@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
> > queue_work(virtio_vsock_workqueue, &vsock->rx_work);
> >  }
> >  
> > +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> > +{
> > +   struct virtio_vsock *vsock;
> > +   bool res = false;
> > +
> > +   rcu_read_lock();
> > +
> > +   vsock = rcu_dereference(the_virtio_vsock);
> > +   if (vsock) {
> > +   struct virtqueue *vq;
> > +   int iov_pages;
> > +
> > +   vq = vsock->vqs[VSOCK_VQ_TX];
> > +
> > +   iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
> > +
> > +   /* Check that tx queue is large enough to keep whole
> > +* data to send. This is needed, because when there is
> > +* not enough free space in the queue, current skb to
> > +* send will be reinserted to the head of tx list of
> > +* the socket to retry transmission later, so if skb
> > +* is bigger than whole queue, it will be reinserted
> > +* again and again, thus blocking other skbs to be sent.
> > +* Each page of the user provided buffer will be added
> > +* as a single buffer to the tx virtqueue, so compare
> > +* number of pages against maximum capacity of the queue.
> > +* +1 means buffer for the packet header.
> > +*/
> > +   if (iov_pages + 1 <= vq->num_max)
> 
> I think this check is relevant only for the case when we don't have the
> indirect buffer feature. With indirect mode, the whole data to send will
> be packed into one indirect buffer.
> 
> Thanks, Arseniy

Actually the reverse. With indirect you are limited to num_max.
Without you are limited to whatever space is left in the
queue (which you did not check here, so you should).
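
Putting the two cases together, a more complete check might look roughly
like this (a sketch only; the helper name is made up, and it assumes
vq->num_free is an acceptable snapshot of the descriptors currently left):

/* Hypothetical helper, not part of the patch: with INDIRECT_DESC a chain
 * consumes one descriptor but its table may not exceed the queue size;
 * without it, every page needs its own free descriptor right now.
 */
static bool virtio_vsock_iov_fits_tx(struct virtqueue *vq,
                                     const struct iov_iter *iov)
{
        size_t pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
        size_t needed = pages + 1;      /* +1 for the packet header */

        if (virtio_has_feature(vq->vdev, VIRTIO_RING_F_INDIRECT_DESC))
                return needed <= vq->num_max;

        return needed <= vq->num_free;
}

Note that num_free only bounds the space at the time of the check; a skb
can still fail to fit later, so the requeue-and-retry path stays needed.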


> > +   res = true;
> > +   }
> > +
> > +   rcu_read_unlock();

Just curious:
is the point of all this RCU dance to

Re: [PATCH net] virtio-net: fix race between set queues and probe

2023-07-25 Thread Xuan Zhuo
On Tue, 25 Jul 2023 03:20:49 -0400, Jason Wang  wrote:
> A race was found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fix this by
> moving virtnet_set_queues() before netdevice registration. While at
> it, use _virtnet_set_queues() to avoid holding rtnl, as the device is
> not even registered at that time.
>
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang 

Reviewed-by: Xuan Zhuo 

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>   if (vi->has_rss || vi->has_rss_hash_report)
>   virtnet_init_default_rss(vi);
>
> + _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
>   /* serialize netdev register + virtio_device_ready() with ndo_open() */
>   rtnl_lock();
>
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>   goto free_unregister_netdev;
>   }
>
> - virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
>   /* Assume link up if device can't report link status,
>  otherwise get link status from config. */
>   netif_carrier_off(dev);
> --
> 2.39.3
>


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-25 Thread Xuan Zhuo
On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig  
> > wrote:
> > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > There are NOP for non-dma so passing the dma device is harmless.
> > >
> > > Yes, please.
> >
> >
> > I am not sure I got this fully.
> >
> > Are you mean this:
> > https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> > https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> >
> > Then the driver must do DMA operations (map and sync) via these
> > virtio_dma_* APIs, no matter whether the device is a non-DMA device or a
> > DMA device.
>
> yes
>
> > Then AF_XDP must use these virtio_dma_* APIs for virtio devices.
>
> We'll worry about AF_XDP when the patch is posted.

YES.

We discussed it. They voted 'no'.

http://lore.kernel.org/all/20230424082856.15c1e...@kernel.org

Thanks.


>
> --
> MST
>


Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Stefano Garzarella

On Fri, Jul 21, 2023 at 08:09:03AM +0300, Arseniy Krasnov wrote:



On 21.07.2023 00:42, Arseniy Krasnov wrote:

This adds handling of the MSG_ZEROCOPY flag on the transmission path: if
this flag is set and zerocopy transmission is possible (enabled in socket
options and the transport allows zerocopy), then a non-linear skb will be
created and filled with the pages of the user's buffer. Pages of the user's
buffer are locked in memory by 'get_user_pages()'. The second thing this
patch does is change how the skb is owned: instead of calling
'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. The reason for this
change is that '__zerocopy_sg_from_iter()' increments the socket's
'sk_wmem_alloc', so to decrease this field correctly a proper skb
destructor is needed: 'sock_wfree()'. This destructor is set by
'skb_set_owner_w()'.

Signed-off-by: Arseniy Krasnov 
---
 Changelog:
 v5(big patchset) -> v1:
  * Refactorings of 'if' conditions.
  * Remove extra blank line.
  * Remove 'frag_off' field unneeded init.
  * Add function 'virtio_transport_fill_skb()' which fills both linear
and non-linear skb with provided data.
 v1 -> v2:
  * Use original order of last four arguments in 'virtio_transport_alloc_skb()'.
 v2 -> v3:
  * Add new transport callback: 'msgzerocopy_check_iov'. It checks whether
the provided 'iov_iter' with data could be sent in zerocopy mode. If this
callback is not set in the transport, the transport allows sending any
'iov_iter' in zerocopy mode; otherwise, zerocopy is allowed only if the
callback returns 'true'. The reason for this callback is that in the case
of G2H transmission we insert the whole skb into the tx virtio queue, and
such an skb must fit within the size of the virtio queue to be sent in a
single iteration (maybe the tx logic in 'virtio_transport.c' could be
reworked as in vhost to support partial send of the current skb). This
callback will be enabled only for the G2H path. For details please see the
comment 'Check that tx queue...' below.

 include/net/af_vsock.h  |   3 +
 net/vmw_vsock/virtio_transport.c|  39 
 net/vmw_vsock/virtio_transport_common.c | 257 ++--
 3 files changed, 241 insertions(+), 58 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..a6b346eeeb8e 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -177,6 +177,9 @@ struct vsock_transport {

/* Read a single skb */
int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
+
+   /* Zero-copy. */
+   bool (*msgzerocopy_check_iov)(const struct iov_iter *);
 };

 / CORE /
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 7bbcc8093e51..23cb8ed638c4 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
queue_work(virtio_vsock_workqueue, &vsock->rx_work);
 }

+static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
+{
+   struct virtio_vsock *vsock;
+   bool res = false;
+
+   rcu_read_lock();
+
+   vsock = rcu_dereference(the_virtio_vsock);
+   if (vsock) {
+   struct virtqueue *vq;
+   int iov_pages;
+
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
+
+   /* Check that tx queue is large enough to keep whole
+* data to send. This is needed, because when there is
+* not enough free space in the queue, current skb to
+* send will be reinserted to the head of tx list of
+* the socket to retry transmission later, so if skb
+* is bigger than whole queue, it will be reinserted
+* again and again, thus blocking other skbs to be sent.
+* Each page of the user provided buffer will be added
+* as a single buffer to the tx virtqueue, so compare
+* number of pages against maximum capacity of the queue.
+* +1 means buffer for the packet header.
+*/
+   if (iov_pages + 1 <= vq->num_max)


I think this check is relevant only for the case when we don't have the
indirect buffer feature. With indirect mode, the whole data to send will be
packed into one indirect buffer.


I think so.
So, should we also check that here?



Thanks, Arseniy


+   res = true;
+   }
+
+   rcu_read_unlock();
+
+   return res;
+}
+
 static bool virtio_transport_seqpacket_allow(u32 remote_cid);

 static struct virtio_transport virtio_transport = {
@@ -475,6 +512,8 @@ static struct virtio_transport virtio_transport = {
.seqpacket_allow  = virtio_transport_seqpacket_allow,
.seqpacket_has_data   = virtio_transport_seqpacket_has_data,

+   .msgzerocopy_check_iov = virtio_transport_msgzerocopy_check_iov,
+
 

Re: [PATCH v1] vdpa: Complement vdpa_nl_policy for nlattr length check

2023-07-25 Thread Dragos Tatulea via Virtualization
On Mon, 2023-07-24 at 16:08 -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 24, 2023 at 11:42:42AM +, Dragos Tatulea wrote:
> > On Mon, 2023-07-24 at 05:16 -0400, Michael S. Tsirkin wrote:
> > > On Mon, Jul 24, 2023 at 08:38:04AM +, Dragos Tatulea wrote:
> > > > 
> > > > On Mon, 2023-07-24 at 15:11 +0800, Jason Wang wrote:
> > > > > On Sun, Jul 23, 2023 at 6:02 PM Michael S. Tsirkin 
> > > > > wrote:
> > > > > > 
> > > > > > On Sun, Jul 23, 2023 at 05:48:46PM +0800, Lin Ma wrote:
> > > > > > > 
> > > > > > > > Sure, that is another undergoing task I'm working on. If the
> > > > > > > > nlattr
> > > > > > > > is
> > > > > > > > parsed with
> > > > > > > > NL_VALIDATE_UNSPEC, any forgotten nlattr will be rejected,
> > > > > > > > therefore
> > > > > > > > (which is the default
> > > > > > > > for modern nla_parse).
> > > > > > > 
> > > > > > > For the general netlink interface, the deciding flag should be
> > > > > > > genl_ops.validate defined in
> > > > > > > each ops. The default validate flag is strict, while the developer
> > > > > > > can
> > > > > > > overwrite the flag
> > > > > > > with GENL_DONT_VALIDATE_STRICT to ease the validation. That is to
> > > > > > > say,
> > > > > > > safer code should
> > > > > > > enforce NL_VALIDATE_STRICT by not overwriting the validate flag.
> > > > > > > 
> > > > > > > Regrads
> > > > > > > Lin
> > > > > > 
> > > > > > 
> > > > > > Oh I see.
> > > > > > 
> > > > > > It started here:
> > > > > > 
> > > > > > commit 33b347503f014ebf76257327cbc7001c6b721956
> > > > > > Author: Parav Pandit 
> > > > > > Date:   Tue Jan 5 12:32:00 2021 +0200
> > > > > > 
> > > > > >     vdpa: Define vdpa mgmt device, ops and a netlink interface
> > > > > > 
> > > > > > which did:
> > > > > > 
> > > > > > +   .validate = GENL_DONT_VALIDATE_STRICT |
> > > > > > GENL_DONT_VALIDATE_DUMP,
> > > > > > 
> > > > > > 
> > > > > > which was most likely just a copy paste from somewhere, right Parav?
> > > > > > 
> > > > > > and then everyone kept copying this around.
> > > > > > 
> > > > > > Parav, Eli can we drop these? There's a tiny chance of breaking
> > > > > > something
> > > > > > but I feel there aren't that many users outside mlx5 yet, so if you
> > > > > > guys can test on mlx5 and confirm no breakage, I think we are good.
> > > > > 
> > > > > Adding Dragos.
> > > > > 
> > > > I will check. Just to make sure I understand correctly: you want me to
> > > > drop the .validate flags altogether in all vdpa ops and check, right?
> > > > 
> > > > Thanks,
> > > > Dragos
> > > 
> > > yes - I suspect you will then need this patch to make things work.
> > > 
> > Yep. Adding the patch and removing the .validate config on the vdpa_nl_ops
> > seems to work just fine.
> > 
> > Thanks,
> > Dragos
> 
> OK, post a patch?
> 
Sure, but how do I make it depend on this patch? Otherwise it will break things.

Thanks,
Dragos
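
The change being discussed would leave the ops table looking roughly like
this (a sketch following the drivers/vdpa/vdpa.c naming, shown for one
command; the exact entries may differ):

static const struct genl_ops vdpa_nl_ops[] = {
        {
                .cmd = VDPA_CMD_MGMTDEV_GET,
                /* No .validate override: strict validation is the genetlink
                 * default, so once vdpa_nl_policy describes every attribute,
                 * unknown attributes and bad lengths are rejected.
                 */
                .doit = vdpa_nl_cmd_mgmtdev_get_doit,
                .dumpit = vdpa_nl_cmd_mgmtdev_get_dumpit,
        },
        /* ...the remaining VDPA_CMD_* entries drop their .validate lines
         * the same way...
         */
};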


Re: [PATCH net-next v3 4/4] vsock/virtio: MSG_ZEROCOPY flag support

2023-07-25 Thread Michael S. Tsirkin
On Fri, Jul 21, 2023 at 12:42:45AM +0300, Arseniy Krasnov wrote:
> This adds handling of the MSG_ZEROCOPY flag on the transmission path: if
> this flag is set and zerocopy transmission is possible (enabled in socket
> options and the transport allows zerocopy), then a non-linear skb will be
> created and filled with the pages of the user's buffer. Pages of the
> user's buffer are locked in memory by 'get_user_pages()'. The second
> thing this patch does is change how the skb is owned: instead of calling
> 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. The reason for
> this change is that '__zerocopy_sg_from_iter()' increments the socket's
> 'sk_wmem_alloc', so to decrease this field correctly a proper skb
> destructor is needed: 'sock_wfree()'. It is set by 'skb_set_owner_w()'.
> 
> Signed-off-by: Arseniy Krasnov 
> ---
>  Changelog:
>  v5(big patchset) -> v1:
>   * Refactorings of 'if' conditions.
>   * Remove extra blank line.
>   * Remove 'frag_off' field unneeded init.
>   * Add function 'virtio_transport_fill_skb()' which fills both linear
> and non-linear skb with provided data.
>  v1 -> v2:
>   * Use original order of last four arguments in 
> 'virtio_transport_alloc_skb()'.
>  v2 -> v3:
>   * Add new transport callback: 'msgzerocopy_check_iov'. It checks
> whether the provided 'iov_iter' with data could be sent in zerocopy mode.
> If this callback is not set in the transport, the transport allows
> sending any 'iov_iter' in zerocopy mode; otherwise, zerocopy is allowed
> only if the callback returns 'true'. The reason for this callback is that
> in the case of G2H transmission we insert the whole skb into the tx
> virtio queue, and such an skb must fit within the size of the virtio
> queue to be sent in a single iteration (maybe the tx logic in
> 'virtio_transport.c' could be reworked as in vhost to support partial
> send of the current skb). This callback will be enabled only for the G2H
> path. For details please see the comment 'Check that tx queue...' below.
> 
>  include/net/af_vsock.h  |   3 +
>  net/vmw_vsock/virtio_transport.c|  39 
>  net/vmw_vsock/virtio_transport_common.c | 257 ++--
>  3 files changed, 241 insertions(+), 58 deletions(-)
> 
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index 0e7504a42925..a6b346eeeb8e 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -177,6 +177,9 @@ struct vsock_transport {
>  
>   /* Read a single skb */
>   int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
> +
> + /* Zero-copy. */
> + bool (*msgzerocopy_check_iov)(const struct iov_iter *);
>  };
>  
>  / CORE /
> diff --git a/net/vmw_vsock/virtio_transport.c 
> b/net/vmw_vsock/virtio_transport.c
> index 7bbcc8093e51..23cb8ed638c4 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -442,6 +442,43 @@ static void virtio_vsock_rx_done(struct virtqueue *vq)
>   queue_work(virtio_vsock_workqueue, &vsock->rx_work);
>  }
>  
> +static bool virtio_transport_msgzerocopy_check_iov(const struct iov_iter *iov)
> +{
> + struct virtio_vsock *vsock;
> + bool res = false;
> +
> + rcu_read_lock();
> +
> + vsock = rcu_dereference(the_virtio_vsock);
> + if (vsock) {
> + struct virtqueue *vq;
> + int iov_pages;
> +
> + vq = vsock->vqs[VSOCK_VQ_TX];
> +
> + iov_pages = round_up(iov->count, PAGE_SIZE) / PAGE_SIZE;
> +
> + /* Check that tx queue is large enough to keep whole
> +  * data to send. This is needed, because when there is
> +  * not enough free space in the queue, current skb to
> +  * send will be reinserted to the head of tx list of
> +  * the socket to retry transmission later, so if skb
> +  * is bigger than whole queue, it will be reinserted
> +  * again and again, thus blocking other skbs to be sent.
> +  * Each page of the user provided buffer will be added
> +  * as a single buffer to the tx virtqueue, so compare
> +  * number of pages against maximum capacity of the queue.
> +  * +1 means buffer for the packet header.
> +  */
> + if (iov_pages + 1 <= vq->num_max)
> + res = true;


Yes but can't there already be buffers in the queue?
Then you can't stick num_max there.


> + }
> +
> + rcu_read_unlock();
> +
> + return res;
> +}
> +
>  static bool virtio_transport_seqpacket_allow(u32 remote_cid);
>  
>  static struct virtio_transport virtio_transport = {
> @@ -475,6 +512,8 @@ static struct virtio_transport virtio_transport = {
>   .seqpacket_allow  = virtio_transport_seqpacket_allow,
>   .seqpacket_has_data   = virtio_transport_seqpacket_has_data,
>  
> + .msgzerocopy_check_iov = virtio_transport_msgzerocopy_check_iov,
> +
>   .notify_poll_in   = virtio_transport_notify_poll_in,

Re: [PATCH net-next v3 2/4] vsock/virtio: support to send non-linear skb

2023-07-25 Thread Stefano Garzarella

On Fri, Jul 21, 2023 at 12:42:43AM +0300, Arseniy Krasnov wrote:

For non-linear skb use its pages from fragment array as buffers in
virtio tx queue. These pages are already pinned by 'get_user_pages()'
during such skb creation.

Signed-off-by: Arseniy Krasnov 
Reviewed-by: Stefano Garzarella 
---
Changelog:
v2 -> v3:
 * Comment about 'page_to_virt()' is updated. I don't remove R-b,
   as this change is quite small I guess.


Ack!

Thanks,
Stefano



net/vmw_vsock/virtio_transport.c | 41 +++-
1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e95df847176b..7bbcc8093e51 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -100,7 +100,9 @@ virtio_transport_send_pkt_work(struct work_struct *work)
vq = vsock->vqs[VSOCK_VQ_TX];

for (;;) {
-   struct scatterlist hdr, buf, *sgs[2];
+   /* +1 is for packet header. */
+   struct scatterlist *sgs[MAX_SKB_FRAGS + 1];
+   struct scatterlist bufs[MAX_SKB_FRAGS + 1];
int ret, in_sg = 0, out_sg = 0;
struct sk_buff *skb;
bool reply;
@@ -111,12 +113,39 @@ virtio_transport_send_pkt_work(struct work_struct *work)

virtio_transport_deliver_tap_pkt(skb);
reply = virtio_vsock_skb_reply(skb);
+   sg_init_one(&bufs[out_sg], virtio_vsock_hdr(skb),
+   sizeof(*virtio_vsock_hdr(skb)));
+   sgs[out_sg] = &bufs[out_sg];
+   out_sg++;
+
+   if (!skb_is_nonlinear(skb)) {
+   if (skb->len > 0) {
+   sg_init_one(&bufs[out_sg], skb->data, skb->len);
+   sgs[out_sg] = &bufs[out_sg];
+   out_sg++;
+   }
+   } else {
+   struct skb_shared_info *si;
+   int i;
+
+   si = skb_shinfo(skb);
+
+   for (i = 0; i < si->nr_frags; i++) {
+   skb_frag_t *skb_frag = &si->frags[i];
+   void *va;

-   sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb)));
-   sgs[out_sg++] = &hdr;
-   if (skb->len > 0) {
-   sg_init_one(&buf, skb->data, skb->len);
-   sgs[out_sg++] = &buf;
+   /* We will use 'page_to_virt()' for the userspace page
+* here, because virtio or dma-mapping layers will call
+* 'virt_to_phys()' later to fill the buffer descriptor.
+* We don't touch memory at "virtual" address of this page.
+*/
+   va = page_to_virt(skb_frag->bv_page);
+   sg_init_one(&bufs[out_sg],
+   va + skb_frag->bv_offset,
+   skb_frag->bv_len);
+   sgs[out_sg] = &bufs[out_sg];
+   out_sg++;
+   }
}

ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, 
GFP_KERNEL);
--
2.25.1





Re: [PATCH net] virtio-net: fix race between set queues and probe

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 03:20:49AM -0400, Jason Wang wrote:
> A race was found where set_channels could be called after registering
> but before virtnet_set_queues() in virtnet_probe(). Fix this by
> moving virtnet_set_queues() before netdevice registration. While at
> it, use _virtnet_set_queues() to avoid holding rtnl, as the device is
> not even registered at that time.
> 
> Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
> Signed-off-by: Jason Wang 


Acked-by: Michael S. Tsirkin 

stable material I guess?

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0db14f6b87d3..1270c8d23463 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>   if (vi->has_rss || vi->has_rss_hash_report)
>   virtnet_init_default_rss(vi);
>  
> + _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +
>   /* serialize netdev register + virtio_device_ready() with ndo_open() */
>   rtnl_lock();
>  
> @@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
>   goto free_unregister_netdev;
>   }
>  
> - virtnet_set_queues(vi, vi->curr_queue_pairs);
> -
>   /* Assume link up if device can't report link status,
>  otherwise get link status from config. */
>   netif_carrier_off(dev);
> -- 
> 2.39.3



Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
> On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
> > > On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 7/21/23 17:10, Michael S. Tsirkin wrote:
> > > > > On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
> > > > >>
> > > > >>
> > > > >> On 7/21/23 16:45, Michael S. Tsirkin wrote:
> > > > >>> On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
> > > > 
> > > > 
> > > >  On 7/20/23 23:02, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
> > > > >> On 7/20/23 1:38 AM, Jason Wang wrote:
> > > > >>>
> > > > >>> Adding cond_resched() to the command waiting loop for better
> > > > >>> co-operation with the scheduler. This gives the CPU a breather
> > > > >>> to run other tasks (workqueues) instead of busy looping when
> > > > >>> preemption is not allowed on a device whose CVQ might be slow.
> > > > >>>
> > > > >>> Signed-off-by: Jason Wang 
> > > > >>
> > > > >> This still leaves hung processes, but at least it doesn't pin 
> > > > >> the CPU any
> > > > >> more.  Thanks.
> > > > >> Reviewed-by: Shannon Nelson 
> > > > >>
> > > > >
> > > > > I'd like to see a full solution
> > > > > 1- block until interrupt
> > >
> > > I remember in previous versions, you worried about the extra MSI
> > > vector. (Maybe I was wrong).
> > >
> > > > 
> > > >  Would it make sense to also have a timeout?
> > > >  And when timeout expires, set FAILED bit in device status?
> > > > >>>
> > > > >>> virtio spec does not set any limits on the timing of vq
> > > > >>> processing.
> > > > >>
> > > > >> Indeed, but I thought the driver could decide it is too long for it.
> > > > >>
> > > > >> The issue is we keep waiting with rtnl locked, it can quickly make 
> > > > >> the
> > > > >> system unusable.
> > > > >
> > > > > if this is a problem we should find a way not to keep rtnl
> > > > > locked indefinitely.
> > >
> > > Any ideas on this direction? Simply dropping rtnl during the busy loop
> > > will result in a lot of races. This seems to require non-trivial
> > > changes in the networking core.
> > >
> > > >
> > > >  From the tests I have done, I think it is. With OVS, a reconfiguration
> > > > is performed when the VDUSE device is added, and when a MLX5 device is
> > > > in the same bridge, it ends up doing an ioctl() that tries to take the
> > > > rtnl lock. In this configuration, it is not possible to kill OVS because
> > > > it is stuck trying to acquire rtnl lock for mlx5 that is held by virtio-
> > > > net.
> > >
> > > Yeah, basically, any RTNL users would be blocked forever.
> > >
> > > And the infinite loop has other side effects, like blocking the
> > > freezer from working.
> > >
> > > To summarize, there are three issues
> > >
> > > 1) busy polling
> > > 2) breaks freezer
> > > 3) hold RTNL during the loop
> > >
> > > Solving 3 may help somewhat with 2: e.g. some pm routine (e.g.
> > > wireguard, or even virtnet_restore() itself) may try to hold the lock.
> >
> > Yep. So my feeling currently is, the only real fix is to actually
> > queue up the work in software.
> 
> Do you mean something like:
> 
> rtnl_lock();
> queue up the work
> rtnl_unlock();
> return success;
> 
> ?

yes


> > It's mostly trivial to limit memory consumption; vids are the only case
> > where it would make sense to have more than one command of a given type
> > outstanding.
> 
> And rx mode, so this implies we will fail any command if the previous
> work is not finished.

don't fail it, store it.

> > have a tree
> > or a bitmap with vids to add/remove?
> 
> Probably.
> 
> Thanks
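
A minimal sketch of that direction (all names hypothetical, not the
eventual implementation): the ndo path records the latest request under a
lock and schedules a work item, and the work item talks to the CVQ later
without rtnl held:

struct virtnet_cvq_state {
        spinlock_t lock;                        /* protects the fields below */
        bool promisc, allmulti;                 /* latest requested rx mode */
        DECLARE_BITMAP(pending_vids, VLAN_N_VID); /* vids still to sync */
        struct work_struct work;
};

/* ndo side: store the request and return immediately. */
static void virtnet_queue_rx_mode(struct virtnet_cvq_state *st,
                                  bool promisc, bool allmulti)
{
        spin_lock_bh(&st->lock);
        st->promisc = promisc;
        st->allmulti = allmulti;
        spin_unlock_bh(&st->lock);

        schedule_work(&st->work);
}

static void virtnet_cvq_work(struct work_struct *work)
{
        struct virtnet_cvq_state *st =
                container_of(work, struct virtnet_cvq_state, work);
        bool promisc, allmulti;

        spin_lock_bh(&st->lock);
        promisc = st->promisc;
        allmulti = st->allmulti;
        spin_unlock_bh(&st->lock);

        /* Issue VIRTIO_NET_CTRL_RX_PROMISC/_ALLMULTI via the CVQ here;
         * waiting, or even sleeping, is fine in process context with rtnl
         * not held.
         */
}

Memory use stays bounded because only the latest rx-mode request matters
and each vid needs at most one bit, which is the tree-or-bitmap point
above.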
> 
> >
> >
> >
> > > >
> > > > >
> > > > > 2- still handle surprise removal correctly by waking in that case
> > >
> > > This is basically what version 1 did?
> > >
> > > https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368...@redhat.com/t/
> > >
> > > Thanks
> >
> > Yes - except the timeout part.
> >
> >
> > > > >
> > > > >
> > > > >
> > > > >>> ---
> > > > >>>  drivers/net/virtio_net.c | 4 +++-
> > > > >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >>>
> > > > >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > >>> index 9f3b1d6ac33d..e7533f29b219 100644
> > > > >>> --- a/drivers/net/virtio_net.c
> > > > >>> +++ b/drivers/net/virtio_net.c
> > > > >>> @@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> > > > >>>  * into the hypervisor, so the request should be handled immediately.
> > > > >>>  */
> > > > >>> while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > >
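
(The quoted diff is cut off above; reconstructed on the assumption that the
patch only adds cond_resched() to the existing loop body, it reads:)

        while (!virtqueue_get_buf(vi->cvq, &tmp) &&
               !virtqueue_is_broken(vi->cvq)) {
                cond_resched();
                cpu_relax();
        }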

Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-25 Thread Michael S. Tsirkin
On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig  
> wrote:
> > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > There are NOP for non-dma so passing the dma device is harmless.
> >
> > Yes, please.
> 
> 
> I am not sure I got this fully.
> 
> Are you mean this:
> https://lore.kernel.org/all/20230214072704.126660-8-xuanz...@linux.alibaba.com/
> https://lore.kernel.org/all/20230214072704.126660-9-xuanz...@linux.alibaba.com/
> 
> Then the driver must do DMA operations (map and sync) via these
> virtio_dma_* APIs, no matter whether the device is a non-DMA device or a
> DMA device.

yes

> Then AF_XDP must use these virtio_dma_* APIs for virtio devices.

We'll worry about AF_XDP when the patch is posted.

-- 
MST
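
The wrapper shape being suggested might look roughly like this (a sketch
only; the real decision in virtio_ring.c is vring_use_dma_api(), which
considers more than the feature bit):

/* Hypothetical wrapper: a NOP when the device bypasses the DMA API, so a
 * driver (e.g. for AF_XDP buffers) can call it unconditionally.
 */
static inline void
virtio_dma_sync_single_for_cpu(struct virtio_device *vdev, dma_addr_t addr,
                               size_t size, enum dma_data_direction dir)
{
        if (!virtio_has_feature(vdev, VIRTIO_F_ACCESS_PLATFORM))
                return; /* the device uses physical addresses directly */

        dma_sync_single_for_cpu(vdev->dev.parent, addr, size, dir);
}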



[PATCH net] virtio-net: fix race between set queues and probe

2023-07-25 Thread Jason Wang
A race was found where set_channels could be called after registering
but before virtnet_set_queues() in virtnet_probe(). Fix this by
moving virtnet_set_queues() before netdevice registration. While at
it, use _virtnet_set_queues() to avoid holding rtnl, as the device is
not even registered at that time.

Fixes: a220871be66f ("virtio-net: correctly enable multiqueue")
Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0db14f6b87d3..1270c8d23463 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -4219,6 +4219,8 @@ static int virtnet_probe(struct virtio_device *vdev)
if (vi->has_rss || vi->has_rss_hash_report)
virtnet_init_default_rss(vi);
 
+   _virtnet_set_queues(vi, vi->curr_queue_pairs);
+
/* serialize netdev register + virtio_device_ready() with ndo_open() */
rtnl_lock();
 
@@ -4257,8 +4259,6 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_unregister_netdev;
}
 
-   virtnet_set_queues(vi, vi->curr_queue_pairs);
-
/* Assume link up if device can't report link status,
   otherwise get link status from config. */
netif_carrier_off(dev);
-- 
2.39.3
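
A sketch of the window being closed, assuming the sequence described in the
commit message:

/*
 *   CPU0: virtnet_probe()            CPU1: ethtool
 *   ---------------------            -------------
 *   register_netdev(dev)
 *                                    set_channels()
 *                                      -> virtnet_set_queues(vi, n)
 *   virtnet_set_queues(vi,
 *                      curr_queue_pairs)  <- overwrites the user's choice
 *
 * With the patch, _virtnet_set_queues() runs before register_netdev(), so
 * no user request can interleave, and since the device is not yet visible,
 * rtnl does not need to be held for it.
 */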
