Re: [RFC v3 6/7] virtio: in order support for virtio_ring

2022-09-06 Thread Jason Wang


On 2022/9/1 13:54, Guo Zhi wrote:

If the in order feature is negotiated, we can skip the used ring and get
buffers' desc ids sequentially.  For skipped buffers in the batch, the
used ring doesn't contain the buffer length; there is actually no need
to get the skipped buffers' lengths, as they are tx buffers.

Signed-off-by: Guo Zhi 
---
  drivers/virtio/virtio_ring.c | 74 +++-
  1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 00aa4b7a49c2..d52624179b43 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -103,6 +103,9 @@ struct vring_virtqueue {
/* Host supports indirect buffers */
bool indirect;
  
+	/* Host supports in order feature */
+	bool in_order;
+
/* Host publishes avail event idx */
bool event;
  
@@ -144,6 +147,19 @@ struct vring_virtqueue {

/* DMA address and size information */
dma_addr_t queue_dma_addr;
size_t queue_size_in_bytes;
+
+	/* If in_order feature is negotiated, here is the next head to consume */
+	u16 next_desc_begin;
+	/*
+	 * If in_order feature is negotiated,
+	 * here is the last descriptor's id in the batch
+	 */
+	u16 last_desc_in_batch;
+	/*
+	 * If in_order feature is negotiated,
+	 * buffers except the last buffer in the batch are skipped buffers
+	 */
+	bool is_skipped_buffer;
} split;
  
  		/* Available for packed ring */

@@ -584,8 +600,6 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 total_sg * sizeof(struct vring_desc),
 VRING_DESC_F_INDIRECT,
 false);
-   vq->split.desc_extra[head & (vq->split.vring.num - 1)].flags &=
-   ~VRING_DESC_F_NEXT;



This seems irrelevant.



}
  
  	/* We're using some buffers from the free list. */

@@ -701,8 +715,16 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
}
  
  	vring_unmap_one_split(vq, i);

-   vq->split.desc_extra[i].next = vq->free_head;
-   vq->free_head = head;
+	/*
+	 * If in_order feature is negotiated,
+	 * the descriptors are made available in order.
+	 * Since the free_head is already a circular list,
+	 * it must be consumed sequentially.
+	 */
+	if (!vq->in_order) {
+		vq->split.desc_extra[i].next = vq->free_head;
+		vq->free_head = head;
+	}
  
  	/* Plus final descriptor */

vq->vq.num_free++;
@@ -744,7 +766,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
  {
struct vring_virtqueue *vq = to_vvq(_vq);
void *ret;
-   unsigned int i;
+   unsigned int i, j;
u16 last_used;
  
  	START_USE(vq);

@@ -763,11 +785,38 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
/* Only get used array entries after they have been exposed by host. */
virtio_rmb(vq->weak_barriers);
  
-	last_used = (vq->last_used_idx & (vq->split.vring.num - 1));

-   i = virtio32_to_cpu(_vq->vdev,
-   vq->split.vring.used->ring[last_used].id);
-   *len = virtio32_to_cpu(_vq->vdev,
-   vq->split.vring.used->ring[last_used].len);
+   if (vq->in_order) {
+   last_used = (vq->last_used_idx & (vq->split.vring.num - 1));



Let's move this beyond the in_order check.



+		if (!vq->split.is_skipped_buffer) {
+			vq->split.last_desc_in_batch =
+				virtio32_to_cpu(_vq->vdev,
+						vq->split.vring.used->ring[last_used].id);
+			vq->split.is_skipped_buffer = true;
+		}
+		/* For skipped buffers in batch, we can ignore the len info, simply set len as 0 */



This seems to break the caller that depends on a correct len.



+		if (vq->split.next_desc_begin != vq->split.last_desc_in_batch) {
+			*len = 0;
+		} else {
+			*len = virtio32_to_cpu(_vq->vdev,
+					       vq->split.vring.used->ring[last_used].len);
+			vq->split.is_skipped_buffer = false;
+		}
+		i = vq->split.next_desc_begin;
+		j = i;
+		/* Indirect only takes one descriptor in descriptor table */
+		while (!vq->indirect && (vq->split.desc_extra[j].flags & VRING_DESC_F_NEXT))
+			j = (j + 1) &
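
To summarize the consumption model the patch implements, here is a tiny
userspace illustration (not an excerpt from the patch): with
VIRTIO_F_IN_ORDER the device may publish one used entry whose id is the
head of the last buffer in a batch, and the driver completes every buffer
from its next expected head up to that id, reporting len 0 for the
skipped tx buffers.

    #include <stdio.h>

    #define QSZ 8    /* ring size, chosen only for the illustration */

    int main(void)
    {
            int next = 2;      /* driver's next expected head */
            int last_id = 5;   /* id taken from the single used entry */

            /* Everything before the last id is a "skipped" tx buffer */
            while (next != last_id) {
                    printf("complete buffer %d, len = 0 (skipped)\n", next);
                    next = (next + 1) % QSZ;
            }
            printf("complete buffer %d, len from used entry\n", last_id);
            return 0;
    }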

Re: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-06 Thread Michael S. Tsirkin
On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote:
> Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> packets even when GUEST_* offloads are not present on the device.
> However, if guest GSO is not supported, it would be sufficient to
> allocate segments to cover just up to the MTU size and no further.
> Allocating the maximum amount of segments results in a large waste of
> buffer space in the queue, which limits the number of packets that can
> be buffered and can result in reduced performance.
> 
> Therefore, if guest GSO is not supported, use the MTU to calculate the
> optimal amount of segments required.
> 
> When guest offload is enabled at runtime, the RQ already holds packets of
> sizes less than 64K. So when a packet of 64KB arrives, all packets of such
> size will be dropped, and the RQ is then not usable.
> 
> This means that during the set_guest_offloads() phase, RQs have to be
> destroyed and recreated, which requires almost a full driver reload.
> 
> If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should
> always treat them as GSO enabled.
> 
> Accordingly, for now the assumption is that if guest GSO has been
> negotiated then it has been enabled, even if it's actually been disabled
> at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> 
> Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
> 1 VQ, queue size 1024, before and after the change, with the iperf
> server running over the virtio-net interface.
> 
> MTU (Bytes)   Bandwidth (Gbit/s)
>               Before   After
> 1500          22.5     22.4
> 9000          12.8     25.9
> 
> Signed-off-by: Gavin Li 
> Reviewed-by: Gavi Teitz 
> Reviewed-by: Parav Pandit 
> Reviewed-by: Xuan Zhuo 
> Reviewed-by: Si-Wei Liu 


Which configurations were tested?
Did you test devices without VIRTIO_NET_F_MTU?

> ---
> changelog:
> v4->v5
> - Addressed comments from Michael S. Tsirkin
> - Improve commit message
> v3->v4
> - Addressed comments from Si-Wei
> - Rename big_packets_sg_num to big_packets_num_skbfrags
> v2->v3
> - Addressed comments from Si-Wei
> - Simplify the condition check to enable the optimization
> v1->v2
> - Addressed comments from Jason, Michael, Si-Wei.
> - Remove the flag of guest GSO support, set sg_num for big packets and
>   use it directly
> - Recalculate sg_num for big packets in virtnet_set_guest_offloads
> - Replace the round up algorithm with DIV_ROUND_UP
> ---
>  drivers/net/virtio_net.c | 37 -
>  1 file changed, 24 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f831a0290998..dbffd5f56fb8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -225,6 +225,9 @@ struct virtnet_info {
>   /* I like... big packets and I cannot lie! */
>   bool big_packets;
>  
> + /* number of sg entries allocated for big packets */
> + unsigned int big_packets_num_skbfrags;
> +
>   /* Host will merge rx buffers for big packets (shake it! shake it!) */
>   bool mergeable_rx_bufs;
>  
> @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>   char *p;
>   int i, err, offset;
>  
> - sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> + sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
>  
> - /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> - for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> + /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */
> + for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) {
>   first = get_a_page(rq, gfp);
>   if (!first) {
>   if (list)
> @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  
>   /* chain first in list head */
>   first->private = (unsigned long)list;
> - err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> + err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> first, gfp);
>   if (err < 0)
>   give_pages(rq, first);
> @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi)
>   virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO);
>  }
>  
> +static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu)


I'd rename this to just virtnet_set_big_packets.


> +{
> + bool guest_gso = virtnet_check_guest_gso(vi);
> +
> + /* If device can receive ANY guest GSO packets, regardless of mtu,
> +  * allocate packets of maximum size, otherwise limit it to only
> +  * mtu size worth of segments.
> +  */
> + if (mtu > ETH_DATA_LEN || guest_gso) {
> + vi->big_packets = true;
> + vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
> + }
> +}
> +
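
For a concrete sense of the saving, a minimal userspace sketch of the
segment-count arithmetic above (illustration only; 4 KiB pages assumed,
the figures are not from the patch):

    #include <stdio.h>

    /* Same rounding helper as the kernel's DIV_ROUND_UP() */
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
            unsigned int page_size = 4096;  /* assumption: 4 KiB pages */

            /* Without guest GSO, allocate just enough segments for one MTU */
            printf("MTU 9000 -> %u segments\n",
                   DIV_ROUND_UP(9000, page_size));  /* prints 3 */
            /* With guest GSO the driver keeps allocating MAX_SKB_FRAGS
             * (typically 17) segments per big packet. */
            return 0;
    }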
>  static int 

Re: [RFC v3 3/7] vsock: batch buffers in tx

2022-09-06 Thread Jason Wang


On 2022/9/1 13:54, Guo Zhi wrote:

Vsock uses buffers in order, and for tx the driver doesn't have to
know the length of the buffer. So we can batch for vsock if in order
is negotiated, writing only one used ring entry for a batch of buffers.

Signed-off-by: Guo Zhi 
---
  drivers/vhost/vsock.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 368330417bde..e08fbbb5439e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -497,7 +497,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
 dev);
struct virtio_vsock_pkt *pkt;
-   int head, pkts = 0, total_len = 0;
+   int head, pkts = 0, total_len = 0, add = 0;
unsigned int out, in;
bool added = false;
  
@@ -551,10 +551,18 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)

else
virtio_transport_free_pkt(pkt);
  
-		vhost_add_used(vq, head, 0);
+		if (!vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
+			vhost_add_used(vq, head, 0);



I'd do this step by step.

1) switch to use vhost_add_used_n() for vsock, less copy_to_user() 
better performance

2) do in-order on top



+		} else {
+			vq->heads[add].id = head;
+			vq->heads[add++].len = 0;



How can we guarantee that we are in the boundary of the heads array?

Btw in the case of in-order we don't need to record the heads, instead 
we just need to know the head of the last buffer, it reduces the stress 
on the cache.


Thanks
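
A minimal sketch of the alternative described here (illustration only,
not from the series; vhost_add_used_in_order() is the hypothetical helper
floated in the review of patch 1/7 below):

    /* Track only the last head of the in-order batch instead of filling
     * vq->heads[], which also avoids the array-bounds question above. */
    struct vring_used_elem last = { 0 };
    unsigned int batch = 0;

    do {
            /* ... dequeue and handle pkt exactly as in the patch above ... */
            if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
                    last.id = cpu_to_vhost32(vq, head);
                    last.len = 0;
                    batch++;
            } else {
                    vhost_add_used(vq, head, 0);
            }
            added = true;
    } while (likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));

    if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER) && batch)
            vhost_add_used_in_order(vq, &last, batch);  /* hypothetical helper */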



+   }
added = true;
} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
  
+	/* If in order feature negotiated, we can do a batch to increase performance */

+   if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER) && added)
+   vhost_add_used_n(vq, vq->heads, add);
  no_more_replies:
if (added)
		vhost_signal(&vsock->dev, vq);



Re: [RFC v3 1/7] vhost: expose used buffers

2022-09-06 Thread Jason Wang


On 2022/9/1 13:54, Guo Zhi wrote:

Following the VIRTIO 1.1 spec, only write out a single used ring entry
for a batch of descriptors.

Signed-off-by: Guo Zhi 
---
  drivers/vhost/vhost.c | 16 +---
  1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 40097826cff0..26862c8bf751 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2376,10 +2376,20 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
vring_used_elem_t __user *used;
u16 old, new;
int start;
+   int copy_n = count;
  
+	/**
+	 * If in order feature negotiated, devices can notify the use of a
+	 * batch of buffers to the driver by only writing out a single used
+	 * ring entry with the id corresponding to the head entry of the
+	 * descriptor chain describing the last buffer in the batch.
+	 */
+	if (vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {
+		copy_n = 1;
+		heads = &heads[count - 1];
+	}



Would it be better to have a dedicated helper like vhost_add_used_in_order() here?
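
Sketched for illustration, one possible shape for such a helper (modeled
on the surrounding __vhost_add_used_n(); the write barrier and used-ring
logging it would also need are elided):

    /* Publish one used entry for the last buffer of an in-order batch,
     * then advance last_used_idx by the whole batch size ("count"). */
    static int vhost_add_used_in_order(struct vhost_virtqueue *vq,
                                       struct vring_used_elem *last,
                                       unsigned int count)
    {
            int start = vq->last_used_idx & (vq->num - 1);

            if (vhost_put_used(vq, last, start, 1)) {
                    vq_err(vq, "Failed to write used");
                    return -EFAULT;
            }
            /* smp_wmb() and log_used() as in __vhost_add_used_n() omitted */
            vq->last_used_idx += count;
            return 0;
    }

Since only a single slot is ever written, this would also sidestep the
wrap-around split questioned in the hunk below.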




start = vq->last_used_idx & (vq->num - 1);
used = vq->used->ring + start;
-   if (vhost_put_used(vq, heads, start, count)) {
+   if (vhost_put_used(vq, heads, start, copy_n)) {
vq_err(vq, "Failed to write used");
return -EFAULT;
}
@@ -2388,7 +2398,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
smp_wmb();
/* Log used ring entry write. */
log_used(vq, ((void __user *)used - (void __user *)vq->used),
-count * sizeof *used);
+copy_n * sizeof(*used));
}
old = vq->last_used_idx;
new = (vq->last_used_idx += count);
@@ -2410,7 +2420,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
  
  	start = vq->last_used_idx & (vq->num - 1);

n = vq->num - start;
-   if (n < count) {
+   if (n < count && !vhost_has_feature(vq, VIRTIO_F_IN_ORDER)) {



This seems strange, any reason for this? (Actually if we support
in-order we only need one used slot, which fits the case here.)


Thanks



r = __vhost_add_used_n(vq, heads, n);
if (r < 0)
return r;



Re: [RFC v3 0/7] In order support for virtio_ring, vhost and vsock.

2022-09-06 Thread Jason Wang


On 2022/9/1 13:54, Guo Zhi wrote:

In virtio-spec 1.1, a new feature bit VIRTIO_F_IN_ORDER was introduced.
When this feature has been negotiated, the virtio driver will use
descriptors in ring order: starting from offset 0 in the table, and
wrapping around at the end of the table. Vhost devices will always use
descriptors in the same order in which they have been made available.
This can reduce virtio accesses to the used ring.

Based on the updated virtio-spec, this series implements an IN_ORDER
prototype in the virtio driver and vhost. The devices currently supporting
the IN_ORDER feature are *vhost_test* and *vsock* in vhost, and virtio-net
in QEMU. The IN_ORDER feature works well combined with the INDIRECT
feature in this patch series.



As stated in the previous versions, I'd like to see performance numbers.
We need to prove that the feature actually helps performance.


And it would be even better if we do the in-order in this order (vhost 
side):


1) enable in-order but without batching used
2) enable in-order with batching used

Then we can see how:

1) in-order helps
2) batching helps

Thanks




Virtio driver in_order support for packed vq is not yet included in this
patch series.

Guo Zhi (7):
   vhost: expose used buffers
   vhost_test: batch used buffer
   vsock: batch buffers in tx
   vsock: announce VIRTIO_F_IN_ORDER in vsock
   virtio: unmask F_NEXT flag in desc_extra
   virtio: in order support for virtio_ring
   virtio: announce VIRTIO_F_IN_ORDER support

  drivers/vhost/test.c | 16 ++--
  drivers/vhost/vhost.c| 16 ++--
  drivers/vhost/vsock.c| 13 +-
  drivers/virtio/virtio_ring.c | 79 +++-
  4 files changed, 104 insertions(+), 20 deletions(-)




Re: [virtio-dev] [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-06 Thread Jason Wang


On 2022/9/1 10:10, Gavin Li wrote:

Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
packets even when GUEST_* offloads are not present on the device.
However, if guest GSO is not supported, it would be sufficient to
allocate segments to cover just up to the MTU size and no further.
Allocating the maximum amount of segments results in a large waste of
buffer space in the queue, which limits the number of packets that can
be buffered and can result in reduced performance.

Therefore, if guest GSO is not supported, use the MTU to calculate the
optimal amount of segments required.

When guest offload is enabled at runtime, the RQ already holds packets of
sizes less than 64K. So when a packet of 64KB arrives, all packets of such
size will be dropped, and the RQ is then not usable.

This means that during the set_guest_offloads() phase, RQs have to be
destroyed and recreated, which requires almost a full driver reload.

If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it should
always treat them as GSO enabled.

Accordingly, for now the assumption is that if guest GSO has been
negotiated then it has been enabled, even if it's actually been disabled
at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.



Nit: Actually, it's not an assumption but the behavior of the code
itself. Since we don't try to change guest offloading in probe, it's
ok to check GSO via negotiated features?


Thanks




Below is the iperf TCP test results over a Mellanox NIC, using vDPA for
1 VQ, queue size 1024, before and after the change, with the iperf
server running over the virtio-net interface.

MTU (Bytes)   Bandwidth (Gbit/s)
              Before   After
1500          22.5     22.4
9000          12.8     25.9

Signed-off-by: Gavin Li 
Reviewed-by: Gavi Teitz 
Reviewed-by: Parav Pandit 
Reviewed-by: Xuan Zhuo 
Reviewed-by: Si-Wei Liu 
---
changelog:
v4->v5
- Addressed comments from Michael S. Tsirkin
- Improve commit message
v3->v4
- Addressed comments from Si-Wei
- Rename big_packets_sg_num to big_packets_num_skbfrags
v2->v3
- Addressed comments from Si-Wei
- Simplify the condition check to enable the optimization
v1->v2
- Addressed comments from Jason, Michael, Si-Wei.
- Remove the flag of guest GSO support, set sg_num for big packets and
   use it directly
- Recalculate sg_num for big packets in virtnet_set_guest_offloads
- Replace the round up algorithm with DIV_ROUND_UP
---
  drivers/net/virtio_net.c | 37 -
  1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f831a0290998..dbffd5f56fb8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -225,6 +225,9 @@ struct virtnet_info {
/* I like... big packets and I cannot lie! */
bool big_packets;
  
+	/* number of sg entries allocated for big packets */

+   unsigned int big_packets_num_skbfrags;
+
/* Host will merge rx buffers for big packets (shake it! shake it!) */
bool mergeable_rx_bufs;
  
@@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,

char *p;
int i, err, offset;
  
-	sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
+	sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
  
-	/* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
-	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
+	/* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */
+	for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) {
first = get_a_page(rq, gfp);
if (!first) {
if (list)
@@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
  
  	/* chain first in list head */

first->private = (unsigned long)list;
-	err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
+	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
  first, gfp);
if (err < 0)
give_pages(rq, first);
@@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const struct virtnet_info *vi)
virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO);
  }
  
+static void virtnet_set_big_packets_fields(struct virtnet_info *vi, const int mtu)

+{
+   bool guest_gso = virtnet_check_guest_gso(vi);
+
+	/* If device can receive ANY guest GSO packets, regardless of mtu,
+	 * allocate packets of maximum size, otherwise limit it to only
+	 * mtu size worth of segments.
+	 */
+   if (mtu > ETH_DATA_LEN || guest_gso) {
+   vi->big_packets = true;
+		vi->big_packets_num_skbfrags = guest_gso ? MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
+   }
+}
+
  static int virtnet_probe(struct virtio_device *vdev)
  {
int i, err = -ENOMEM;
struct net_device *dev;
struct virtnet_info *vi;
u16 

Re: [PATCH net] virtio-net: add cond_resched() to the command waiting loop

2022-09-06 Thread Jason Wang
On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni  wrote:
>
> On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > Adding cond_resched() to the command waiting loop for a better
> > > > co-operation with the scheduler. This allows the CPU a chance to
> > > > run other tasks (workqueues) instead of busy looping when preemption is
> > > > not allowed.
> > > >
> > > > What's more important. This is a must for some vDPA parent to work
> > > > since control virtqueue is emulated via a workqueue for those parents.
> > > >
> > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > >
> > > That's a weird commit to fix. So it fixes the simulator?
> >
> > Yes, since the simulator is using a workqueue to handle the control virtqueue.
>
> Uhmm... touching a driver for a simulator's sake looks a little weird.

The simulator is not the only one that is using a workqueue (but it
should be the first).

I can see that the mlx5 vDPA driver is using a workqueue as well (see
mlx5_vdpa_kick_vq()).

And in the case of VDUSE, it needs to wait for the response from the
userspace, this means cond_resched() is probably a must for the case
like UP.

>
> Additionally, if the bug is vdpasim, I think it's better to try to
> solve it there, if possible.
>
> Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> neither needs a process context, so perhaps you could rework it to run
> the work_fn() directly from vdpasim_kick_vq(), at least for the control
> virtqueue?

It's possible (but requires some rework of the simulator core). But
considering we have other similar use cases, it looks better to solve
it in the virtio-net driver.

Additionally, this may have better behaviour when used with buggy
hardware (e.g. a control virtqueue that takes too long to respond). We may
consider switching to interrupt/sleep in the future (but that is not
suitable for -net).

Thanks
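
For reference, the loop under discussion is the control-command poll in
virtnet_send_command(); a sketch of the proposed shape (reconstructed for
illustration, details may differ from the posted patch):

    /* Poll for the device's reply to a control-vq command. Without
     * cond_resched(), a vDPA parent that emulates the cvq in a
     * workqueue may never get CPU time on a busy or UP system.
     * vi->cvq is the control virtqueue; tmp receives the returned len. */
    while (!virtqueue_get_buf(vi->cvq, &tmp) &&
           !virtqueue_is_broken(vi->cvq))
            cond_resched();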

>
> Thanks!
>
> Paolo
>



Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-09-06 Thread Daniel Vetter
On Tue, Sep 06, 2022 at 10:01:47PM +0200, Daniel Vetter wrote:
> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > Higher order pages allocated using alloc_pages() aren't refcounted, and they
> > > need to be refcounted, otherwise it's impossible for KVM to map them. This
> > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > faults.
> > > 
> > > Without this change guest virgl driver can't map host buffers into guest
> > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > mappings are also needed for enabling the Venus driver using host GPU
> > > drivers that are utilizing TTM.
> > > 
> > > Based on a patch proposed by Trigger Huang.
> > 
> > Well I can't count how often I have repeated this: This is an absolutely
> > clear NAK!
> > 
> > TTM pages are not reference counted in the first place and because of this
> > giving them to virgl is illegal.
> > 
> > Please immediately stop this completely broken approach. We have discussed
> > this multiple times now.
> 
> Yeah we need to get this stuff closed for real by tagging them all with
> VM_IO or VM_PFNMAP asap.

For a bit more context: Anything mapping a bo should be VM_SPECIAL. And I
think we should add the checks to the gem and dma-buf mmap functions to
validate for that, and fix all the fallout.

Otherwise this dragon keeps resurrecting ...

VM_SPECIAL _will_ block get_user_pages, which will block everyone from
even trying to refcount this stuff.

Minimally we need to fix this for all ttm drivers, and it sounds like
that's still not yet the case :-( Iirc last time around some funky amdkfd
userspace was the hold-up because of regressions?
-Daniel
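
A minimal sketch of the kind of check being suggested for the gem/dma-buf
mmap paths (illustration only, not from a posted patch):

    /* Insist BO mappings are "special" so get_user_pages() can never
     * take page references on them (it bails out on VM_IO/VM_PFNMAP). */
    if (WARN_ON_ONCE(!(vma->vm_flags & (VM_IO | VM_PFNMAP))))
            return -EINVAL;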

> 
> It seems to be a recurring amount of fun that people try to mmap dma-buf
> and then call get_user_pages on them.
> 
> Which just doesn't work. I guess this is also why Rob Clark sent out that
> dma-buf patch to expose mapping information (i.e. wc vs wb vs uncached).
> 
> There seems to be some serious bonghits going on :-/
> -Daniel
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Cc: sta...@vger.kernel.org
> > > Cc: Trigger Huang 
> > > Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > > Tested-by: Dmitry Osipenko  # AMDGPU (Qemu and crosvm)
> > > Signed-off-by: Dmitry Osipenko 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 25 -
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index 21b61631f73a..11e92bb149c9 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >   struct ttm_pool_dma *dma;
> > >   struct page *p;
> > > + unsigned int i;
> > >   void *vaddr;
> > >   /* Don't set the __GFP_COMP flag for higher order allocations.
> > > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   if (!pool->use_dma_alloc) {
> > >   p = alloc_pages(gfp_flags, order);
> > > - if (p)
> > > + if (p) {
> > >   p->private = order;
> > > + goto ref_tail_pages;
> > > + }
> > >   return p;
> > >   }
> > > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   dma->vaddr = (unsigned long)vaddr | order;
> > >   p->private = (unsigned long)dma;
> > > +
> > > +ref_tail_pages:
> > > + /*
> > > +  * KVM requires mapped tail pages to be refcounted because put_page()
> > > +  * is invoked on them in the end of the page fault handling, and thus,
> > > +  * tail pages need to be protected from the premature releasing.
> > > +  * In fact, KVM page fault handler refuses to map tail pages to guest
> > > +  * if they aren't refcounted because hva_to_pfn_remapped() checks the
> > > +  * refcount specifically for this case.
> > > +  *
> > > +  * In particular, unreferenced tail pages result in a KVM "Bad address"
> > > +  * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > > +  * accesses mapped host TTM buffer that contains tail pages.
> > > +  */
> > > + for (i = 1; i < 1 << order; i++)
> > > + page_ref_inc(p + i);
> > > +
> > >   return p;
> > >   error_free:
> > > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > >   {
> > >   unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >   struct ttm_pool_dma *dma;
> > > + unsigned int i;
> > >   void *vaddr;
> > >   #ifdef 

Re: [PATCH v1] drm/ttm: Refcount allocated tail pages

2022-09-06 Thread Daniel Vetter
On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > Higher order pages allocated using alloc_pages() aren't refcounted, and they
> > need to be refcounted, otherwise it's impossible for KVM to map them. This
> > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > faults.
> > 
> > Without this change guest virgl driver can't map host buffers into guest
> > and can't provide OpenGL 4.5 profile support to the guest. The host
> > mappings are also needed for enabling the Venus driver using host GPU
> > drivers that are utilizing TTM.
> > 
> > Based on a patch proposed by Trigger Huang.
> 
> Well I can't count how often I have repeated this: This is an absolutely
> clear NAK!
> 
> TTM pages are not reference counted in the first place and because of this
> giving them to virgl is illegal.
> 
> Please immediately stop this completely broken approach. We have discussed
> this multiple times now.

Yeah we need to get this stuff closed for real by tagging them all with
VM_IO or VM_PFNMAP asap.

It seems to be a recurring amount of fun that people try to mmap dma-buf
and then call get_user_pages on them.

Which just doesn't work. I guess this is also why Rob Clark sent out that
dma-buf patch to expose mapping information (i.e. wc vs wb vs uncached).

There seems to be some serious bonghits going on :-/
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > Cc: sta...@vger.kernel.org
> > Cc: Trigger Huang 
> > Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > Tested-by: Dmitry Osipenko  # AMDGPU (Qemu and crosvm)
> > Signed-off-by: Dmitry Osipenko 
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 25 -
> >   1 file changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > index 21b61631f73a..11e92bb149c9 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > struct ttm_pool_dma *dma;
> > struct page *p;
> > +   unsigned int i;
> > void *vaddr;
> > /* Don't set the __GFP_COMP flag for higher order allocations.
> > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > if (!pool->use_dma_alloc) {
> > p = alloc_pages(gfp_flags, order);
> > -   if (p)
> > +   if (p) {
> > p->private = order;
> > +   goto ref_tail_pages;
> > +   }
> > return p;
> > }
> > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > dma->vaddr = (unsigned long)vaddr | order;
> > p->private = (unsigned long)dma;
> > +
> > +ref_tail_pages:
> > +   /*
> > +* KVM requires mapped tail pages to be refcounted because put_page()
> > +* is invoked on them in the end of the page fault handling, and thus,
> > +* tail pages need to be protected from the premature releasing.
> > +* In fact, KVM page fault handler refuses to map tail pages to guest
> > +* if they aren't refcounted because hva_to_pfn_remapped() checks the
> > +* refcount specifically for this case.
> > +*
> > +* In particular, unreferenced tail pages result in a KVM "Bad address"
> > +* failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > +* accesses mapped host TTM buffer that contains tail pages.
> > +*/
> > +   for (i = 1; i < 1 << order; i++)
> > +   page_ref_inc(p + i);
> > +
> > return p;
> >   error_free:
> > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> >   {
> > unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > struct ttm_pool_dma *dma;
> > +   unsigned int i;
> > void *vaddr;
> >   #ifdef CONFIG_X86
> > @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > if (caching != ttm_cached && !PageHighMem(p))
> > set_pages_wb(p, 1 << order);
> >   #endif
> > +   for (i = 1; i < 1 << order; i++)
> > +   page_ref_dec(p + i);
> > if (!pool || !pool->use_dma_alloc) {
> > __free_pages(p, order);
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 2/3] MAINTAINERS: Change status of some VMware drivers

2022-09-06 Thread Nadav Amit via Virtualization
On Sep 6, 2022, at 10:27 AM, Vishnu Dasa  wrote:

> From: Vishnu Dasa 
> 
> Change the status from 'Maintained' to 'Supported' for VMWARE
> BALLOON DRIVER, VMWARE PVRDMA DRIVER, VMWARE PVSCSI driver,
> VMWARE VMCI DRIVER, VMWARE VMMOUSE SUBDRIVER and VMWARE VMXNET3
> ETHERNET DRIVER.
> 
> This needs to be done to conform to the guidelines in [1].
> Maintainers for these drivers are VMware employees.
> 
> [1] https://docs.kernel.org/process/maintainers.html
> 
> Signed-off-by: Vishnu Dasa 
> ---
> MAINTAINERS | 12 ++--
> 1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index b75eb23a099b..5a634b5d6f6c 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -21812,7 +21812,7 @@ VMWARE BALLOON DRIVER
> M:Nadav Amit 
> R:VMware PV-Drivers Reviewers 
> L:linux-ker...@vger.kernel.org
> -S:   Maintained
> +S:   Supported
> F:drivers/misc/vmw_balloon.c

Acked-by: Nadav Amit 



[PATCH 3/3] MAINTAINERS: Add a new entry for VMWARE VSOCK VMCI TRANSPORT DRIVER

2022-09-06 Thread vdasa--- via Virtualization
From: Vishnu Dasa 

Add a new entry for VMWARE VSOCK VMCI TRANSPORT DRIVER in the
MAINTAINERS file.

Signed-off-by: Vishnu Dasa 
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5a634b5d6f6c..0e52ee3521c3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21873,6 +21873,14 @@ L: net...@vger.kernel.org
 S: Supported
 F: drivers/net/vmxnet3/
 
+VMWARE VSOCK VMCI TRANSPORT DRIVER
+M: Bryan Tan 
+M: Vishnu Dasa 
+R: VMware PV-Drivers Reviewers 
+L: linux-ker...@vger.kernel.org
+S: Supported
+F: net/vmw_vsock/vmci_transport*
+
 VOCORE VOCORE2 BOARD
 M: Harvey Hunt 
 L: linux-m...@vger.kernel.org
-- 
2.35.1



[PATCH 2/3] MAINTAINERS: Change status of some VMware drivers

2022-09-06 Thread vdasa--- via Virtualization
From: Vishnu Dasa 

Change the status from 'Maintained' to 'Supported' for VMWARE
BALLOON DRIVER, VMWARE PVRDMA DRIVER, VMWARE PVSCSI driver,
VMWARE VMCI DRIVER, VMWARE VMMOUSE SUBDRIVER and VMWARE VMXNET3
ETHERNET DRIVER.

This needs to be done to conform to the guidelines in [1].
Maintainers for these drivers are VMware employees.

[1] https://docs.kernel.org/process/maintainers.html

Signed-off-by: Vishnu Dasa 
---
 MAINTAINERS | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b75eb23a099b..5a634b5d6f6c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21812,7 +21812,7 @@ VMWARE BALLOON DRIVER
 M: Nadav Amit 
 R: VMware PV-Drivers Reviewers 
 L: linux-ker...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
@@ -21831,14 +21831,14 @@ M:Bryan Tan 
 M: Vishnu Dasa 
 R: VMware PV-Drivers Reviewers 
 L: linux-r...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/infiniband/hw/vmw_pvrdma/
 
 VMWARE PVSCSI DRIVER
 M: Vishal Bhakta 
 R: VMware PV-Drivers Reviewers 
 L: linux-s...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/scsi/vmw_pvscsi.c
 F: drivers/scsi/vmw_pvscsi.h
 
@@ -21854,7 +21854,7 @@ M:  Bryan Tan 
 M: Vishnu Dasa 
 R: VMware PV-Drivers Reviewers 
 L: linux-ker...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/misc/vmw_vmci/
 
 VMWARE VMMOUSE SUBDRIVER
@@ -21862,7 +21862,7 @@ M:  Zack Rusin 
 R: VMware Graphics Reviewers 
 R: VMware PV-Drivers Reviewers 
 L: linux-in...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/input/mouse/vmmouse.c
 F: drivers/input/mouse/vmmouse.h
 
@@ -21870,7 +21870,7 @@ VMWARE VMXNET3 ETHERNET DRIVER
 M: Ronak Doshi 
 R: VMware PV-Drivers Reviewers 
 L: net...@vger.kernel.org
-S: Maintained
+S: Supported
 F: drivers/net/vmxnet3/
 
 VOCORE VOCORE2 BOARD
-- 
2.35.1



[PATCH 1/3] MAINTAINERS: Change VMware PVSCSI driver entry to upper case

2022-09-06 Thread vdasa--- via Virtualization
From: Vishnu Dasa 

Change 'VMware PVSCSI driver' entry to upper case.

This is a trivial change being done for uniformity.

Signed-off-by: Vishnu Dasa 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38f9ef4b53ec..b75eb23a099b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21834,7 +21834,7 @@ L:  linux-r...@vger.kernel.org
 S: Maintained
 F: drivers/infiniband/hw/vmw_pvrdma/
 
-VMware PVSCSI driver
+VMWARE PVSCSI DRIVER
 M: Vishal Bhakta 
 R: VMware PV-Drivers Reviewers 
 L: linux-s...@vger.kernel.org
-- 
2.35.1



[PATCH 0/3] MAINTAINERS: Update entries for some VMware drivers

2022-09-06 Thread vdasa--- via Virtualization
From: Vishnu Dasa 

This series updates a few existing maintainer entries for VMware
supported drivers and adds a new entry for vsock vmci transport
driver.

Vishnu Dasa (3):
  MAINTAINERS: Change VMware PVSCSI driver entry to upper case
  MAINTAINERS: Change status of some VMware drivers
  MAINTAINERS: Add a new entry for VMWARE VSOCK VMCI TRANSPORT DRIVER

 MAINTAINERS | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

-- 
2.35.1



[PATCH] drm/bochs: fix blanking

2022-09-06 Thread Gerd Hoffmann
VGA_IS1_RC is the color mode register (VGA_IS1_RM is the one for monochrome
mode; note the C vs. M at the end). So when using VGA_IS1_RC, make sure the
VGA device is actually in color mode by setting the corresponding bit in the
misc register.

Reproducible when booting VMs in UEFI mode with some edk2 versions (edk2
fix is on the way too).  Doesn't happen in BIOS mode because in that
case the vgabios already flips the bit.

Fixes: 250e743915d4 ("drm/bochs: Add screen blanking support")
Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/tiny/bochs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c
index 08de13774862..a51262289aef 100644
--- a/drivers/gpu/drm/tiny/bochs.c
+++ b/drivers/gpu/drm/tiny/bochs.c
@@ -309,6 +309,8 @@ static void bochs_hw_fini(struct drm_device *dev)
 static void bochs_hw_blank(struct bochs_device *bochs, bool blank)
 {
DRM_DEBUG_DRIVER("hw_blank %d\n", blank);
+   /* enable color bit (so VGA_IS1_RC access works) */
+   bochs_vga_writeb(bochs, VGA_MIS_W, VGA_MIS_COLOR);
/* discard ar_flip_flop */
(void)bochs_vga_readb(bochs, VGA_IS1_RC);
/* blank or unblank; we need only update index and set 0x20 */
-- 
2.37.3



Re: [PATCH] virtiofs: Drop unnecessary initialization in send_forget_request and virtio_fs_get_tree

2022-09-06 Thread Stefan Hajnoczi
On Tue, Sep 06, 2022 at 01:38:48AM -0400, Deming Wang wrote:
> The variable is initialized but it is only used after its assignment.
> 
> Signed-off-by: Deming Wang 
> ---
>  fs/fuse/virtio_fs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 4d8d4f16c..bffe74d44 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -414,7 +414,7 @@ static int send_forget_request(struct virtio_fs_vq *fsvq,
>  {
>   struct scatterlist sg;
>   struct virtqueue *vq;
> - int ret = 0;
> + int ret;
>   bool notify;
>   struct virtio_fs_forget_req *req = &forget->req;
>  

That causes an uninitialized access in the source tree I'm looking at
(c5e4d5e99162ba8025d58a3af7ad103f155d2df7):

  static int send_forget_request(struct virtio_fs_vq *fsvq,
 struct virtio_fs_forget *forget,
 bool in_flight)
  {
  struct scatterlist sg;
  struct virtqueue *vq;
  int ret = 0;
  ^^^
  bool notify;
        struct virtio_fs_forget_req *req = &forget->req;
  
        spin_lock(&fsvq->lock);
  if (!fsvq->connected) {
  if (in_flight)
  dec_in_flight_req(fsvq);
  kfree(forget);
  goto out;
  ...
  out:
        spin_unlock(&fsvq->lock);
  return ret;
 ^^^
  }

What is the purpose of this patch? Is there a compiler warning (if so,
which compiler and version)? Do you have a static analysis tool that
reported this (if yes, then maybe it's broken)?

Stefan



Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc

2022-09-06 Thread Stefan Hajnoczi
Hi Bobby,
If you are attending Linux Foundation conferences in Dublin, Ireland
next week (Linux Plumbers Conference, Open Source Summit Europe, KVM
Forum, ContainerCon Europe, CloudOpen Europe, etc) then you could meet
Stefano Garzarella and others to discuss this patch series.

Using netdev and sk_buff is a big change to vsock. Discussing your
requirements and the future direction of vsock in person could help.

If you won't be in Dublin, don't worry. You can schedule a video call if
you feel it would be helpful to discuss these topics.

Stefan



Re: [PATCH 3/6] vsock: add netdev to vhost/virtio vsock

2022-09-06 Thread Michael S. Tsirkin
On Mon, Aug 15, 2022 at 10:56:06AM -0700, Bobby Eshleman wrote:
> In order to support usage of qdisc on vsock traffic, this commit
> introduces a struct net_device to vhost and virtio vsock.
> 
> Two new devices are created, vhost-vsock for vhost and virtio-vsock
> for virtio. The devices are attached to the respective transports.
> 
> To bypass the usage of the device, the user may "down" the associated
> network interface using common tools. For example, "ip link set dev
> virtio-vsock down" lets vsock bypass the net_device and qdisc entirely,
> simply using the FIFO logic of the prior implementation.
> 
> For both hosts and guests, there is one device for all G2H vsock sockets
> and one device for all H2G vsock sockets. This makes sense for guests
> because the driver only supports a single vsock channel (one pair of
> TX/RX virtqueues), so one device and qdisc fits. For hosts, this may not
> seem ideal for some workloads. However, it is possible to use a
> multi-queue qdisc, where a given queue is responsible for a range of
> sockets. This seems to be a better solution than having one device per
> socket, which may yield a very large number of devices and qdiscs, all
> of which are dynamically being created and destroyed. Because of this
> dynamism, it would also require a complex policy management daemon, as
> devices would constantly be spun up and down as sockets were created and
> destroyed. To avoid this, one device and qdisc also applies to all H2G
> sockets.
> 
> Signed-off-by: Bobby Eshleman 


I've been thinking about this generally. vsock currently
assumes reliability, but with qdisc can't we get
packet drops e.g. depending on the queueing?

What prevents user from configuring such a discipline?
One thing people like about vsock is that it's very hard
to break H2G communication even with misconfigured
networking.

-- 
MST



Re: [PATCH net] virtio-net: add cond_resched() to the command waiting loop

2022-09-06 Thread Paolo Abeni
On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin  wrote:
> > 
> > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > Adding cond_resched() to the command waiting loop for a better
> > > co-operation with the scheduler. This allows the CPU a chance to
> > > run other tasks (workqueues) instead of busy looping when preemption is
> > > not allowed.
> > > 
> > > What's more important. This is a must for some vDPA parent to work
> > > since control virtqueue is emulated via a workqueue for those parents.
> > > 
> > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > 
> > That's a weird commit to fix. So it fixes the simulator?
> 
> Yes, since the simulator is using a workqueue to handle the control virtqueue.

Uhmm... touching a driver for a simulator's sake looks a little weird. 

Additionally, if the bug is vdpasim, I think it's better to try to
solve it there, if possible.

Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
neither needs a process context, so perhaps you could rework it to run
the work_fn() directly from vdpasim_kick_vq(), at least for the control
virtqueue?

Thanks!

Paolo



Re: [PATCH] virtiofs: Drop unnecessary initialization in send_forget_request and virtio_fs_get_tree

2022-09-06 Thread kernel test robot
Hi Deming,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v6.0-rc4]
[also build test WARNING on linus/master next-20220901]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Deming-Wang/virtiofs-Drop-unnecessary-initialization-in-send_forget_request-and-virtio_fs_get_tree/20220906-135058
base:7e18e42e4b280c85b76967a9106a13ca61c16179
config: hexagon-randconfig-r035-20220906 (https://download.01.org/0day-ci/archive/20220906/202209061738.epufa2ef-...@intel.com/config)
compiler: clang version 16.0.0 (https://github.com/llvm/llvm-project c55b41d5199d2394dd6cdb8f52180d8b81d809d4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a61f879fdb56490afddb6ddea4a9d57226f339f3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Deming-Wang/virtiofs-Drop-unnecessary-initialization-in-send_forget_request-and-virtio_fs_get_tree/20220906-135058
        git checkout a61f879fdb56490afddb6ddea4a9d57226f339f3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash fs/fuse/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> fs/fuse/virtio_fs.c:422:2: warning: variable 'ret' is used uninitialized 
>> whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (!fsvq->connected) {
   ^
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
  ^~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))

^~
   fs/fuse/virtio_fs.c:465:9: note: uninitialized use occurs here
   return ret;
  ^~~
   fs/fuse/virtio_fs.c:422:2: note: remove the 'if' if its condition is always 
false
   if (!fsvq->connected) {
   ^~~
   include/linux/compiler.h:56:23: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
 ^
   fs/fuse/virtio_fs.c:417:9: note: initialize the variable 'ret' to silence 
this warning
   int ret;
  ^
   = 0
>> fs/fuse/virtio_fs.c:1433:2: warning: variable 'err' is used uninitialized 
>> whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (WARN_ON(virtqueue_size <= FUSE_HEADER_OVERHEAD))
   ^~~~
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
  ^~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : 
__trace_if_value(cond))

^~
   fs/fuse/virtio_fs.c:1481:9: note: uninitialized use occurs here
   return err;
  ^~~
   fs/fuse/virtio_fs.c:1433:2: note: remove the 'if' if its condition is always 
false
   if (WARN_ON(virtqueue_size <= FUSE_HEADER_OVERHEAD))
   ^~~~
   include/linux/compiler.h:56:23: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
 ^
   fs/fuse/virtio_fs.c:1420:9: note: initialize the variable 'err' to silence 
this warning
   int err;
  ^
   = 0
   2 warnings generated.


vim +422 fs/fuse/virtio_fs.c

a62a8ef9d97da2 Stefan Hajnoczi 2018-06-12  406  
58ada94f95f71d Vivek Goyal 2019-10-30  407  /*
58ada94f95f71d Vivek Goyal 2019-10-30  408   * Returns 1 if queue is full 
and sender should wait a bit before sending
58ada94f95f71d Vivek Goyal 2019-10-30  409   * next request, 0 otherwise.
58ada94f95f71d Vivek Goyal 2019-10-30  410   */
58ada94f95f71d Vivek Goyal 2019-10-30  411  static int 
send_forget_request(struct virti

Re: [PATCH] Documentation: add basic information on vDPA

2022-09-06 Thread Xuan Zhuo
On Wed, 17 Aug 2022 18:19:56 -0400, Stefan Hajnoczi  wrote:
> The vDPA driver framework is largely undocumented. Add a basic page that
> describes what vDPA is, where to get more information, and how to use
> the simulator for testing.
>
> In the future it would be nice to add an overview of the driver API as
> well as comprehensive doc comments.
>
> Cc: "Michael S. Tsirkin" 
> Cc: Jason Wang 
> Cc: Stefano Garzarella 
> Signed-off-by: Stefan Hajnoczi 

Reviewed-by: Xuan Zhuo 

> ---
>  Documentation/driver-api/index.rst |  1 +
>  Documentation/driver-api/vdpa.rst  | 40 ++
>  2 files changed, 41 insertions(+)
>  create mode 100644 Documentation/driver-api/vdpa.rst
>
> diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
> index d3a58f77328e..e307779568d4 100644
> --- a/Documentation/driver-api/index.rst
> +++ b/Documentation/driver-api/index.rst
> @@ -103,6 +103,7 @@ available subsections can be seen below.
> switchtec
> sync_file
> tty/index
> +   vdpa
> vfio-mediated-device
> vfio
> vfio-pci-device-specific-driver-acceptance
> diff --git a/Documentation/driver-api/vdpa.rst b/Documentation/driver-api/vdpa.rst
> new file mode 100644
> index ..75c666548e1d
> --- /dev/null
> +++ b/Documentation/driver-api/vdpa.rst
> @@ -0,0 +1,40 @@
> +====================================
> +vDPA - VIRTIO Data Path Acceleration
> +====================================
> +
> +The vDPA driver framework can be used to implement VIRTIO devices that are
> +backed by physical hardware or by software. While the device's data path
> +complies with the VIRTIO specification, the control path is driver-specific and
> +a netlink interface exists for instantiating devices.
> +
> +vDPA devices can be attached to the kernel's VIRTIO device drivers or exposed
> +to userspace emulators/virtualizers such as QEMU through vhost.
> +
> +The driver API is not documented beyond the doc comments in <linux/vdpa.h>. The
> +netlink API is not documented beyond the doc comments in <uapi/linux/vdpa.h>.
> +The existing vDPA drivers serve as reference code for those wishing to develop
> +their own drivers.
> +
> +See https://vdpa-dev.gitlab.io/ for more information about vDPA.
> +
> +Questions can be sent to the virtualization@lists.linux-foundation.org mailing
> +list.
> +
> +Device simulators
> +-----------------
> +
> +There are software vDPA device simulators for testing, prototyping, and
> +development purposes. The simulators do not require physical hardware.
> +
> +Available simulators include:
> +
> +- `vdpa_sim_net` implements a virtio-net loopback device.
> +- `vdpa_sim_blk` implements a memory-backed virtio-blk device.
> +
> +To use `vdpa_sim_blk` through vhost::
> +
> +  # modprobe vhost_vdpa
> +  # modprobe vdpa_sim_blk
> +  # vdpa dev add name vdpa-blk1 mgmtdev vdpasim_blk
> +  ...use /dev/vhost-dev-0...
> +  # vdpa dev del vdpa-blk1
> --
> 2.37.2
>


Re: [PATCH 0/6] virtio/vsock: introduce dgrams, sk_buff, and qdisc

2022-09-06 Thread Stefano Garzarella

On Thu, Aug 18, 2022 at 12:28:48PM +0800, Jason Wang wrote:


On 2022/8/17 14:54, Michael S. Tsirkin wrote:

On Mon, Aug 15, 2022 at 10:56:03AM -0700, Bobby Eshleman wrote:

Hey everybody,

This series introduces datagrams, packet scheduling, and sk_buff usage
to virtio vsock.

The usage of struct sk_buff benefits users by a) preparing vsock to use
other related systems that require sk_buff, such as sockmap and qdisc,
b) supporting basic congestion control via sock_alloc_send_skb, and c)
reducing copying when delivering packets to TAP.

The socket layer no longer forces errors to be -ENOMEM, as typically
userspace expects -EAGAIN when the sk_sndbuf threshold is reached and
messages are being sent with option MSG_DONTWAIT.

The datagram work is based off previous patches by Jiang Wang[1].

The introduction of datagrams creates a transport layer fairness issue
where datagrams may freely starve streams of queue access. This happens
because, unlike streams, datagrams lack the transactions necessary for
calculating credits and throttling.

Previous proposals introduce changes to the spec to add an additional
virtqueue pair for datagrams[1]. Although this solution works, using
Linux's qdisc for packet scheduling leverages already existing systems,
avoids the need to change the virtio specification, and gives additional
capabilities. The usage of SFQ or fq_codel, for example, may solve the
transport layer starvation problem. It is easy to imagine other use
cases as well. For example, services of varying importance may be
assigned different priorities, and qdisc will apply appropriate
priority-based scheduling. By default, the system default pfifo qdisc is
used. The qdisc may be bypassed and legacy queuing is resumed by simply
setting the virtio-vsock%d network device to state DOWN. This technique
still allows vsock to work with zero-configuration.

The basic question to answer then is this: with a net device qdisc
etc in the picture, how is this different from virtio net then?
Why do you still want to use vsock?



Or maybe it's time to revisit an old idea[1] to unify at least the
driver part (e.g. using the virtio-net driver for vsock, so we can have all
the features that vsock is lacking now)?


Sorry for coming late to the discussion!

This would be great, though last time I looked at it, I found
it quite complicated. The main problem is trying to avoid all the
net-specific stuff (MTU, ethernet header, HW offloading, etc.).


Maybe we could start thinking about this idea by adding a new transport 
to vsock (e.g. virtio-net-vsock) completely separate from what we have 
now.


Thanks,
Stefano


Re: [PATCH] Documentation: add basic information on vDPA

2022-09-06 Thread Stefano Garzarella

On Wed, Aug 17, 2022 at 06:19:56PM -0400, Stefan Hajnoczi wrote:

The vDPA driver framework is largely undocumented. Add a basic page that
describes what vDPA is, where to get more information, and how to use
the simulator for testing.

In the future it would be nice to add an overview of the driver API as
well as comprehensive doc comments.

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Stefano Garzarella 
Signed-off-by: Stefan Hajnoczi 
---
Documentation/driver-api/index.rst |  1 +
Documentation/driver-api/vdpa.rst  | 40 ++
2 files changed, 41 insertions(+)
create mode 100644 Documentation/driver-api/vdpa.rst


Thank you for this work!

Reviewed-by: Stefano Garzarella 
