Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-30 Thread Christoph Hellwig
On Thu, Apr 20, 2023 at 06:42:17PM +0200, Alexander Lobakin wrote:
> When there's no recycling of pages, then yes. And since recycling is
> done asynchronously, sometimes new allocations happen either way.
> Anyways, that was roughly a couple years ago right when you introduced
> dma_alloc_noncoherent(). Things might've been changed since then.
> I could try again while next is closed (i.e. starting this Sunday), the
> only thing I'd like to mention: Page Pool allocates pages via
> alloc_pages_bulk_array_node(). Bulking helps a lot (and PP uses bulks of
> 16 IIRC), explicit node setting helps when Rx queues are distributed
> between several nodes. We can then have one struct device for several nodes.
> As I can see, there's now no function to allocate in bulks and no
> explicit node setting option (e.g. mlx5 works around this using
> set_dev_node() + allocate + set_dev_node(orig_node)). Could such options
> be added in near future? That would help a lot switching to the
> functions intended for use when DMA mappings can stay for a long time.
> From what I see from the code, that shouldn't be a problem (except for
> non-direct DMA cases, where we'd need to introduce new callbacks or
> extend the existing ones).

So the node hint is something we can trivially pass through, and
something the mlx5 maintainers should have done from the beginning
instead of this nasty hack.  Patches gladly accepted.

An alloc_pages_bulk_array_node-like allocator also seems doable; we
just need to make sure it has a decent fallback as I don't think
we can wire it up to all the crazy legacy iommu drivers.
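
To illustrate the fallback point above, here is a minimal userspace sketch,
not kernel code. The hook type bulk_alloc_fn and the helpers alloc_one() and
alloc_bulk_with_fallback() are hypothetical names, and malloc() merely stands
in for a page allocation. The only idea shown is that a backend without a bulk
path can still satisfy the request one buffer at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical backend hook; NULL when a backend has no bulk path. */
typedef size_t (*bulk_alloc_fn)(int node, size_t want, void **out);

/* Hypothetical single-buffer allocator; the node hint is ignored here. */
static void *alloc_one(int node)
{
	(void)node;
	return malloc(SKETCH_PAGE_SIZE);
}

/*
 * Fill out[0..want) and return how many slots were filled.
 * The bulk hook is tried first (if any); the remainder is topped up
 * one buffer at a time.
 */
static size_t alloc_bulk_with_fallback(bulk_alloc_fn bulk, int node,
				       size_t want, void **out)
{
	size_t got = 0;

	assert(out != NULL);

	if (bulk)
		got = bulk(node, want, out);
	if (got > want)		/* guard against a misbehaving hook */
		got = want;

	while (got < want) {
		void *p = alloc_one(node);

		if (!p)
			break;	/* partial result; caller sees got < want */
		out[got++] = p;
	}
	return got;
}
```

A caller would pass NULL as the hook for backends that cannot do bulk
allocation and would treat a short return value as a partial allocation.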
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v7 01/11] virtio_ring: split: separate dma codes

2023-04-30 Thread Christoph Hellwig
> +static dma_addr_t vring_sg_address(struct scatterlist *sg)
> +{
> + if (sg->dma_address)
> + return sg->dma_address;

0 is a perfectly valid DMA address.  So I have no idea how this is
even supposed to work.
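
The objection can be reproduced in a small standalone sketch. The types and
field names below (sketch_dma_addr_t, struct sketch_sg, premapped) are
hypothetical stand-ins rather than the kernel's scatterlist API, and the
explicit flag is only one possible alternative, not something this thread
settled on.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sketch_dma_addr_t;	/* stand-in for dma_addr_t */

/* Hypothetical entry: mapping state is tracked explicitly. */
struct sketch_sg {
	sketch_dma_addr_t dma_address;
	bool premapped;
};

/* Pattern from the quoted hunk: address 0 is read as "not mapped". */
static bool looks_mapped_by_value(const struct sketch_sg *sg)
{
	return sg->dma_address != 0;
}

/* Alternative: consult a flag, so address 0 remains a usable address. */
static bool is_mapped_by_flag(const struct sketch_sg *sg)
{
	return sg->premapped;
}
```

For an entry that really is mapped at address 0, the value-based test reports
"not mapped" while the flag-based test reports the correct state.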


Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-30 Thread Christoph Hellwig
On Tue, Apr 25, 2023 at 04:12:05AM -0400, Michael S. Tsirkin wrote:
> In theory, absolutely. In practice modern virtio devices are ok,
> the reason we are stuck supporting old legacy ones is because legacy
> devices are needed to run old guests. And then people turn
> around and run a new guest on the same device,
> for example because they switch back and forth e.g.
> for data recovery? Or because whoever is selling the
> host wants to opt for maximum compatibility.
> 
> Teaching all of linux to sometimes use dma and sometimes not
> is a lot of work, and for limited benefit of these legacy systems.

It's not like virtio is the only case where blindly assuming
you can use DMA operations in a higher layer is the problem.


Re: [RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2

2023-04-30 Thread Alvaro Karsz
> > At the moment, if a network device uses vrings with less than
> > MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> >
> > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> > evaluate to false, leading to TX timeouts.
> >
> > This patch introduces a new variable, single_pkt_max_descs, that holds
> > the max number of descriptors we may need to handle a single packet.
> >
> > This patch also detects the small vring during probe, blocks some
> > features that can't be used with small vrings, and fails probe,
> > leading to a reset and features re-negotiation.
> >
> > Features that can't be used with small vrings:
> > GRO features (VIRTIO_NET_F_GUEST_*):
> > When we use small vrings, we may not have enough entries in the ring to
> > chain page size buffers and form a 64K buffer.
> > So we may need to allocate 64k of continuous memory, which may be too
> > much when the system is stressed.
> >
> > This patch also fixes the MTU size in small vring cases to be up to the
> > default one, 1500B.
> 
> and then it should clear VIRTIO_NET_F_MTU?
> 

Following [1], I was thinking of accepting the feature and letting the device
figure out that it can't transmit a big packet, since the RX buffers are not
big enough (without VIRTIO_NET_F_MRG_RXBUF).
But I think that we may need to block the MTU feature after all.
Quoting the spec:

A driver SHOULD negotiate VIRTIO_NET_F_MTU if the device offers it.
If the driver negotiates VIRTIO_NET_F_MTU, it MUST supply enough receive 
buffers to receive at least one receive packet of size mtu (plus low level 
ethernet header length) with gso_type NONE or ECN.

So, if VIRTIO_NET_F_MTU is negotiated, we MUST supply enough receive buffers.
I therefore think that blocking VIRTIO_NET_F_MTU is the way to go if the
MTU is larger than 1500.
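
As a rough sketch of the rule proposed above, assuming a hypothetical helper
should_block_mtu_feature() and a "small vring mode" flag, neither of which
exists in the posted patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEFAULT_MTU 1500u	/* default Ethernet MTU used in the thread */

/*
 * Hypothetical helper: in small vring mode the driver cannot promise
 * receive buffers for more than the default MTU, so the MTU feature
 * would only be blocked when the device reports something larger.
 */
static bool should_block_mtu_feature(bool small_vring_mode, uint16_t device_mtu)
{
	return small_vring_mode && device_mtu > SKETCH_DEFAULT_MTU;
}
```

With this rule, a device that reports an MTU of 1500 or less keeps
VIRTIO_NET_F_MTU even when its rings are small.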

[1] https://lore.kernel.org/lkml/20230417031052-mutt-send-email-...@kernel.org/

> > + /* How many ring descriptors we may need to transmit a single packet */
> > + u16 single_pkt_max_descs;
> > +
> > + /* Do we have virtqueues with small vrings? */
> > + bool svring;
> > +
> >   /* CPU hotplug instances for online & dead */
> >   struct hlist_node node;
> >   struct hlist_node node_dead;
> 
> worth checking that all these layout changes don't push useful things to
> a different cache line. can you add that analysis?
> 

Good point.
I think that we can just move these to the bottom of the struct.

> 
> I see confusion here wrt whether some rings are "small"? all of them?
> some rx rings? some tx rings? names should make it clear.

The small vring is a device attribute, not a vq attribute. It blocks features, 
which affects the entire device.
Maybe we can call it "small vring mode".

> also do we really need bool svring? can't we just check single_pkt_max_descs
> all the time?
> 

We can work without the bool; we could always check whether
single_pkt_max_descs != MAX_SKB_FRAGS + 2.
It doesn't really matter to me; I thought it might be more readable this way.
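
A minimal sketch of the bool-free variant, under stated assumptions: struct
sketch_virtnet_info and in_small_vring_mode() are hypothetical names, and
MAX_SKB_FRAGS is taken to be 17 here although the real value depends on the
kernel configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SKB_FRAGS 17	/* assumed value; configuration dependent */

/* Hypothetical cut-down stand-in for struct virtnet_info. */
struct sketch_virtnet_info {
	uint16_t single_pkt_max_descs;
};

/* Small vring mode is implied whenever the per-packet budget was reduced. */
static bool in_small_vring_mode(const struct sketch_virtnet_info *vi)
{
	return vi->single_pkt_max_descs != SKETCH_MAX_SKB_FRAGS + 2;
}
```

The trade-off is the one mentioned above: one field fewer in the struct,
at the cost of a slightly less obvious condition at each call site.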

> > +static bool virtnet_uses_svring(struct virtnet_info *vi)
> > +{
> > + u32 i;
> > +
> > + /* If a transmit/receive virtqueue is small,
> > +  * we cannot handle fragmented packets.
> > +  */
> > + for (i = 0; i < vi->max_queue_pairs; i++) {
> > + if (IS_SMALL_VRING(virtqueue_get_vring_size(vi->sq[i].vq)) ||
> > + IS_SMALL_VRING(virtqueue_get_vring_size(vi->rq[i].vq)))
> > + return true;
> > + }
> > +
> > + return false;
> > +}
> 
> I see even if only some rings are too small we force everything to use
> small ones. Wouldn't it be better to just disable small ones in this
> case? That would not need a reset.
> 

I'm not sure. It may complicate things.

What if all TX vqs are small?
What if all RX vqs are small?
What if we end up with an unbalanced number of TX vqs and RX vqs? Is this
allowed by the spec?
What if we end up disabling the RX default vq (receiveq1)?

I guess we could do it, after checking some conditions.
Maybe we can do it in a follow up patch?
Do you think it's important for it to be included since day 1?

I think that the question is: what's more important, to use all the vqs while 
blocking some features, or to use part of the vqs without blocking features?

> > +
> > +/* Function returns the number of features it blocked */
> 
> We don't need the # though. Make it bool?
> 

Sure.


Re: [RFC PATCH net 1/3] virtio: re-negotiate features if probe fails and features are blocked

2023-04-30 Thread Alvaro Karsz
> > +void virtio_block_feature(struct virtio_device *dev, unsigned int f)
> > +{
> > + BUG_ON(f >= 64);
> > + dev->blocked_features |= (1ULL << f);
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_block_feature);
> > +
> 
> Let's add documentation please. Also pls call it __virtio_block_feature
> since it has to be used in a special way - specifically only during
> probe.
> 

Ok.

> > + /* Store blocked features and attempt to negotiate features & probe.
> > +  * If the probe fails, we check if the driver has blocked any new features.
> > +  * If it has, we reset the device and try again with the new features.
> > +  */
> > + while (renegotiate) {
> > + blocked_features = dev->blocked_features;
> > + err = virtio_negotiate_features(dev);
> > + if (err)
> > + break;
> > +
> > + err = drv->probe(dev);
> 
> 
> there's no way for the driver to clear blocked features, but
> just in case, I'd add BUG_ON to check.
> 

Ok.

> >   * @features: the features supported by both driver and device.
> > + * @blocked_features: the features blocked by the driver that can't be negotiated.
> >   * @priv: private pointer for the driver's use.
> >   */
> >  struct virtio_device {
> > @@ -124,6 +125,7 @@ struct virtio_device {
> >   const struct vringh_config_ops *vringh_config;
> >   struct list_head vqs;
> >   u64 features;
> > + u64 blocked_features;
> 
> add comment here too, explain purpose and rules of use
> 

Ok.


Re: [RFC PATCH net 0/3] virtio-net: allow usage of small vrings

2023-04-30 Thread Alvaro Karsz


> > This patchset follows a discussion in the mailing list [1].
> >
> > This fixes only part of the bug, rings with less than 4 entries won't
> > work.
> 
> Why the difference?
> 

Because the RING_SIZE < 4 case requires many more adjustments.

* We may need to squeeze the virtio header into the headroom.
* We may need to squeeze the GSO header into the headroom, or block the 
features.
* At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we 
may need to reduce it to 1.
* We may need to change all the control commands, so that class, command and
command-specific data fit in a single segment.
* We may need to disable the control command and all the features depending on 
it.
* We may need to disable NAPI?

There may be more changes.

I was thinking that it may be easier to start with the easier case RING_SIZE >= 
4, make sure everything is working fine, then send a follow up patchset with 
the required adjustments for RING_SIZE < 4.


Re: [RFC PATCH net 0/3] virtio-net: allow usage of small vrings

2023-04-30 Thread Michael S. Tsirkin
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote:
> At the moment, if a virtio network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> 
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
> 
> This patchset attempts to fix this bug and to allow small rings down
> to 4 entries.
> The first patch introduces a new mechanism in virtio core: it allows
> features to be blocked at probe time.
> 
> If a virtio driver blocks features and fails probe, virtio core will
> reset the device, re-negotiate the features and probe again.
> 
> This is needed since some virtio net features are not supported with
> small rings.
> 
> This patchset follows a discussion in the mailing list [1].
> 
> This fixes only part of the bug, rings with less than 4 entries won't
> work.

Why the difference?

> My intention is to split the effort and fix the RING_SIZE < 4 case in a
> follow up patchset.
> 
> Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?

I'd keep current behaviour.

> I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
> and split VQs, with rings down to 4 entries, with and without
> VIRTIO_NET_F_MRG_RXBUF, with big MTUs.
> 
> I would appreciate more testing.
> Xuan: I wasn't able to test XDP with my setup, maybe you can help with
> that?
> 
> [1] 
> https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.ka...@solid-run.com/
> 
> Alvaro Karsz (3):
>   virtio: re-negotiate features if probe fails and features are blocked
>   virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
>   virtio-net: block ethtool from converting a ring to a small ring
> 
>  drivers/net/virtio_net.c | 161 +--
>  drivers/virtio/virtio.c  |  73 +-
>  include/linux/virtio.h   |   3 +
>  3 files changed, 212 insertions(+), 25 deletions(-)
> 
> -- 
> 2.34.1



Re: [RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2

2023-04-30 Thread Michael S. Tsirkin
On Sun, Apr 30, 2023 at 04:15:17PM +0300, Alvaro Karsz wrote:
> At the moment, if a network device uses vrings with less than
> MAX_SKB_FRAGS + 2 entries, the device won't be functional.
> 
> The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
> evaluate to false, leading to TX timeouts.
> 
> This patch introduces a new variable, single_pkt_max_descs, that holds
> the max number of descriptors we may need to handle a single packet.
> 
> This patch also detects the small vring during probe, blocks some
> features that can't be used with small vrings, and fails probe,
> leading to a reset and features re-negotiation.
> 
> Features that can't be used with small vrings:
> GRO features (VIRTIO_NET_F_GUEST_*):
> When we use small vrings, we may not have enough entries in the ring to
> chain page size buffers and form a 64K buffer.
> So we may need to allocate 64k of continuous memory, which may be too
> much when the system is stressed.
> 
> This patch also fixes the MTU size in small vring cases to be up to the
> default one, 1500B.

and then it should clear VIRTIO_NET_F_MTU?

> Signed-off-by: Alvaro Karsz 




> ---
>  drivers/net/virtio_net.c | 149 +--
>  1 file changed, 144 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8d8038538fc..b4441d63890 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -103,6 +103,8 @@ struct virtnet_rq_stats {
>  #define VIRTNET_SQ_STAT(m)   offsetof(struct virtnet_sq_stats, m)
>  #define VIRTNET_RQ_STAT(m)   offsetof(struct virtnet_rq_stats, m)
>  
> +#define IS_SMALL_VRING(size) ((size) < MAX_SKB_FRAGS + 2)
> +
>  static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
>   { "packets",VIRTNET_SQ_STAT(packets) },
>   { "bytes",  VIRTNET_SQ_STAT(bytes) },
> @@ -268,6 +270,12 @@ struct virtnet_info {
>   /* Does the affinity hint is set for virtqueues? */
>   bool affinity_hint_set;
>  
> + /* How many ring descriptors we may need to transmit a single packet */
> + u16 single_pkt_max_descs;
> +
> + /* Do we have virtqueues with small vrings? */
> + bool svring;
> +
>   /* CPU hotplug instances for online & dead */
>   struct hlist_node node;
>   struct hlist_node node_dead;

worth checking that all these layout changes don't push useful things to
a different cache line. can you add that analysis?

I see confusion here wrt whether some rings are "small"? all of them?
some rx rings? some tx rings? names should make it clear.
also do we really need bool svring? can't we just check single_pkt_max_descs
all the time?

> @@ -455,6 +463,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   unsigned int copy, hdr_len, hdr_padded_len;
>   struct page *page_to_free = NULL;
>   int tailroom, shinfo_size;
> + u16 max_frags = MAX_SKB_FRAGS;
>   char *p, *hdr_p, *buf;
>  
>   p = page_address(page) + offset;
> @@ -520,7 +529,10 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>* tries to receive more than is possible. This is usually
>* the case of a broken device.
>*/
> - if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) {
> + if (unlikely(vi->svring))
> + max_frags = 1;
> +
> + if (unlikely(len > max_frags * PAGE_SIZE)) {
>   net_dbg_ratelimited("%s: too much data\n", skb->dev->name);
>   dev_kfree_skb(skb);
>   return NULL;
> @@ -612,7 +624,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
>* Since most packets only take 1 or 2 ring slots, stopping the queue
>* early means 16 slots are typically wasted.
>*/
> - if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
> + if (sq->vq->num_free < vi->single_pkt_max_descs) {
>   netif_stop_subqueue(dev, qnum);
>   if (use_napi) {
>   if (unlikely(!virtqueue_enable_cb_delayed(sq->vq)))
> @@ -620,7 +632,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
>   } else if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>   /* More just got used, free them then recheck. */
>   free_old_xmit_skbs(sq, false);
> - if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
> + if (sq->vq->num_free >= vi->single_pkt_max_descs) {
>   netif_start_subqueue(dev, qnum);
>   virtqueue_disable_cb(sq->vq);
>   }
> @@ -1108,6 +1120,10 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>   return 0;
>  
>   if (*num_buf > 1) {
> + /* Small vring - can't be more than 1 buffer */
> + if (unlikely(vi->svring))
> + return -EINVAL;
> +
>   /* If we want to build multi-buffer xdp, 

Re: [RFC PATCH net 1/3] virtio: re-negotiate features if probe fails and features are blocked

2023-04-30 Thread Michael S. Tsirkin
On Sun, Apr 30, 2023 at 04:15:16PM +0300, Alvaro Karsz wrote:
> This patch exports a new virtio core function: virtio_block_feature.
> The function should be called during a virtio driver probe.
> 
> If a virtio driver blocks features during probe and fails probe, virtio
> core will reset the device, try to re-negotiate the new features and
> probe again.
> 
> Signed-off-by: Alvaro Karsz 
> ---
>  drivers/virtio/virtio.c | 73 ++---
>  include/linux/virtio.h  |  3 ++
>  2 files changed, 56 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 3893dc29eb2..eaad5b6a7a9 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -167,6 +167,13 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
>  }
>  EXPORT_SYMBOL_GPL(virtio_add_status);
>  
> +void virtio_block_feature(struct virtio_device *dev, unsigned int f)
> +{
> + BUG_ON(f >= 64);
> + dev->blocked_features |= (1ULL << f);
> +}
> +EXPORT_SYMBOL_GPL(virtio_block_feature);
> +

Let's add documentation please. Also pls call it __virtio_block_feature
since it has to be used in a special way - specifically only during
probe.

>  /* Do some validation, then set FEATURES_OK */
>  static int virtio_features_ok(struct virtio_device *dev)
>  {
> @@ -234,17 +241,13 @@ void virtio_reset_device(struct virtio_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(virtio_reset_device);
>  
> -static int virtio_dev_probe(struct device *_d)
> +static int virtio_negotiate_features(struct virtio_device *dev)
>  {
> - int err, i;
> - struct virtio_device *dev = dev_to_virtio(_d);
>   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
>   u64 device_features;
>   u64 driver_features;
>   u64 driver_features_legacy;
> -
> - /* We have a driver! */
> - virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> + int i, ret;
>  
>   /* Figure out what features the device supports. */
>   device_features = dev->config->get_features(dev);
> @@ -279,30 +282,61 @@ static int virtio_dev_probe(struct device *_d)
>   if (device_features & (1ULL << i))
>   __virtio_set_bit(dev, i);
>  
> - err = dev->config->finalize_features(dev);
> - if (err)
> - goto err;
> + /* Remove blocked features */
> + dev->features &= ~dev->blocked_features;
> +
> + ret = dev->config->finalize_features(dev);
> + if (ret)
> + goto exit;
>  
>   if (drv->validate) {
>   u64 features = dev->features;
>  
> - err = drv->validate(dev);
> - if (err)
> - goto err;
> + ret = drv->validate(dev);
> + if (ret)
> + goto exit;
>  
>   /* Did validation change any features? Then write them again. */
>   if (features != dev->features) {
> - err = dev->config->finalize_features(dev);
> - if (err)
> - goto err;
> + ret = dev->config->finalize_features(dev);
> + if (ret)
> + goto exit;
>   }
>   }
>  
> - err = virtio_features_ok(dev);
> - if (err)
> - goto err;
> + ret = virtio_features_ok(dev);
> +exit:
> + return ret;
> +}
> +
> +static int virtio_dev_probe(struct device *_d)
> +{
> + int err;
> + struct virtio_device *dev = dev_to_virtio(_d);
> + struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> + u64 blocked_features;
> + bool renegotiate = true;
> +
> + /* We have a driver! */
> + virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +
> + /* Store blocked features and attempt to negotiate features & probe.
> +  * If the probe fails, we check if the driver has blocked any new features.
> +  * If it has, we reset the device and try again with the new features.
> +  */
> + while (renegotiate) {
> + blocked_features = dev->blocked_features;
> + err = virtio_negotiate_features(dev);
> + if (err)
> + break;
> +
> + err = drv->probe(dev);


there's no way for the driver to clear blocked features, but
just in case, I'd add BUG_ON to check.
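
A userspace model of the retry loop with such a check added might look like
the sketch below. All names (struct sketch_dev, sketch_dev_probe() and so on)
are hypothetical, assert() stands in for BUG_ON(), and a counter stands in for
virtio_reset_device(); this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down device model. */
struct sketch_dev {
	uint64_t device_features;	/* offered by the device */
	uint64_t features;		/* result of the last negotiation */
	uint64_t blocked_features;	/* bits the driver refused so far */
	int resets;			/* counts simulated device resets */
};

typedef int (*sketch_probe_fn)(struct sketch_dev *dev);

static void sketch_block_feature(struct sketch_dev *dev, unsigned int f)
{
	assert(f < 64);
	dev->blocked_features |= (1ULL << f);
}

static int sketch_dev_probe(struct sketch_dev *dev, sketch_probe_fn probe)
{
	bool renegotiate = true;
	int err = 0;

	while (renegotiate) {
		uint64_t before = dev->blocked_features;

		/* Negotiate: everything offered minus what is blocked. */
		dev->features = dev->device_features & ~dev->blocked_features;
		err = probe(dev);

		/* Blocked bits may only be added, never cleared. */
		assert((before & ~dev->blocked_features) == 0);

		if (err && before != dev->blocked_features)
			dev->resets++;	/* reset, then negotiate again */
		else
			renegotiate = false;
	}
	return err;
}

/* Example driver: refuses feature bit 3 the first time it sees it. */
static int sketch_probe_rejecting_bit3(struct sketch_dev *dev)
{
	if (dev->features & (1ULL << 3)) {
		sketch_block_feature(dev, 3);
		return -1;
	}
	return 0;
}
```

Because every retry requires at least one newly blocked bit and bits are never
cleared, the loop in this model can run at most 65 times for a 64-bit mask.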

> + if (err && blocked_features != dev->blocked_features)
> + virtio_reset_device(dev);
> + else
> + renegotiate = false;
> + }
>  
> - err = drv->probe(dev);
>   if (err)
>   goto err;
>  
> @@ -319,7 +353,6 @@ static int virtio_dev_probe(struct device *_d)
>  err:
>   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>   return err;
> -
>  }
>  
>  static void virtio_dev_remove(struct device *_d)
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b93238db94e..2de9b2d3ca4 100644
> --- a/include/linux/virtio.h
> 

[RFC PATCH net 3/3] virtio-net: block ethtool from converting a ring to a small ring

2023-04-30 Thread Alvaro Karsz
Stop ethtool from resizing a TX/RX ring to a size smaller than
MAX_SKB_FRAGS + 2, if the ring was initialized with a bigger size.

We cannot convert a "normal" ring to a "small" ring at runtime.

Signed-off-by: Alvaro Karsz 
---
 drivers/net/virtio_net.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b4441d63890..b8238eaa1e4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2071,6 +2071,12 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
bool running = netif_running(vi->dev);
int err, qindex;
 
+   /* We cannot convert a ring to a small vring */
+   if (!vi->svring && IS_SMALL_VRING(ring_num)) {
+   netdev_err(vi->dev, "resize rx fail: size is too small..\n");
+   return -EINVAL;
+   }
+
qindex = rq - vi->rq;
 
if (running)
@@ -2097,6 +2103,12 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
 
qindex = sq - vi->sq;
 
+   /* We cannot convert a ring to a small vring */
+   if (!vi->svring && IS_SMALL_VRING(ring_num)) {
+   netdev_err(vi->dev, "resize tx fail: size is too small..\n");
+   return -EINVAL;
+   }
+
if (running)
virtnet_napi_tx_disable(&sq->napi);
 
-- 
2.34.1



[RFC PATCH net 2/3] virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2

2023-04-30 Thread Alvaro Karsz
At the moment, if a network device uses vrings with less than
MAX_SKB_FRAGS + 2 entries, the device won't be functional.

The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
evaluate to false, leading to TX timeouts.

This patch introduces a new variable, single_pkt_max_descs, that holds
the max number of descriptors we may need to handle a single packet.

This patch also detects the small vring during probe, blocks some
features that can't be used with small vrings, and fails probe,
leading to a reset and features re-negotiation.

Features that can't be used with small vrings:
GRO features (VIRTIO_NET_F_GUEST_*):
When we use small vrings, we may not have enough entries in the ring to
chain page size buffers and form a 64K buffer.
So we may need to allocate 64k of continuous memory, which may be too
much when the system is stressed.

This patch also fixes the MTU size in small vring cases to be up to the
default one, 1500B.

Signed-off-by: Alvaro Karsz 
---
 drivers/net/virtio_net.c | 149 +--
 1 file changed, 144 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8d8038538fc..b4441d63890 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -103,6 +103,8 @@ struct virtnet_rq_stats {
 #define VIRTNET_SQ_STAT(m) offsetof(struct virtnet_sq_stats, m)
 #define VIRTNET_RQ_STAT(m) offsetof(struct virtnet_rq_stats, m)
 
+#define IS_SMALL_VRING(size)   ((size) < MAX_SKB_FRAGS + 2)
+
 static const struct virtnet_stat_desc virtnet_sq_stats_desc[] = {
{ "packets",VIRTNET_SQ_STAT(packets) },
{ "bytes",  VIRTNET_SQ_STAT(bytes) },
@@ -268,6 +270,12 @@ struct virtnet_info {
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
+   /* How many ring descriptors we may need to transmit a single packet */
+   u16 single_pkt_max_descs;
+
+   /* Do we have virtqueues with small vrings? */
+   bool svring;
+
/* CPU hotplug instances for online & dead */
struct hlist_node node;
struct hlist_node node_dead;
@@ -455,6 +463,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
unsigned int copy, hdr_len, hdr_padded_len;
struct page *page_to_free = NULL;
int tailroom, shinfo_size;
+   u16 max_frags = MAX_SKB_FRAGS;
char *p, *hdr_p, *buf;
 
p = page_address(page) + offset;
@@ -520,7 +529,10 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 * tries to receive more than is possible. This is usually
 * the case of a broken device.
 */
-   if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) {
+   if (unlikely(vi->svring))
+   max_frags = 1;
+
+   if (unlikely(len > max_frags * PAGE_SIZE)) {
net_dbg_ratelimited("%s: too much data\n", skb->dev->name);
dev_kfree_skb(skb);
return NULL;
@@ -612,7 +624,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
 * Since most packets only take 1 or 2 ring slots, stopping the queue
 * early means 16 slots are typically wasted.
 */
-   if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
+   if (sq->vq->num_free < vi->single_pkt_max_descs) {
netif_stop_subqueue(dev, qnum);
if (use_napi) {
if (unlikely(!virtqueue_enable_cb_delayed(sq->vq)))
@@ -620,7 +632,7 @@ static void check_sq_full_and_disable(struct virtnet_info *vi,
} else if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
/* More just got used, free them then recheck. */
free_old_xmit_skbs(sq, false);
-   if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
+   if (sq->vq->num_free >= vi->single_pkt_max_descs) {
netif_start_subqueue(dev, qnum);
virtqueue_disable_cb(sq->vq);
}
@@ -1108,6 +1120,10 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
return 0;
 
if (*num_buf > 1) {
+   /* Small vring - can't be more than 1 buffer */
+   if (unlikely(vi->svring))
+   return -EINVAL;
+
/* If we want to build multi-buffer xdp, we need
 * to specify that the flags of xdp_buff have the
 * XDP_FLAGS_HAS_FRAG bit.
@@ -1828,7 +1844,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
free_old_xmit_skbs(sq, true);
} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
-   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+   if (sq->vq->num_free >= vi->single_pkt_max_descs)
netif_tx_wake_queue(txq);
 
__netif_tx_unlock(txq);
@@ -1919,7 +1935,7 

[RFC PATCH net 1/3] virtio: re-negotiate features if probe fails and features are blocked

2023-04-30 Thread Alvaro Karsz
This patch exports a new virtio core function: virtio_block_feature.
The function should be called during a virtio driver probe.

If a virtio driver blocks features during probe and fails probe, virtio
core will reset the device, try to re-negotiate the new features and
probe again.

Signed-off-by: Alvaro Karsz 
---
 drivers/virtio/virtio.c | 73 ++---
 include/linux/virtio.h  |  3 ++
 2 files changed, 56 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb2..eaad5b6a7a9 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -167,6 +167,13 @@ void virtio_add_status(struct virtio_device *dev, unsigned int status)
 }
 EXPORT_SYMBOL_GPL(virtio_add_status);
 
+void virtio_block_feature(struct virtio_device *dev, unsigned int f)
+{
+   BUG_ON(f >= 64);
+   dev->blocked_features |= (1ULL << f);
+}
+EXPORT_SYMBOL_GPL(virtio_block_feature);
+
 /* Do some validation, then set FEATURES_OK */
 static int virtio_features_ok(struct virtio_device *dev)
 {
@@ -234,17 +241,13 @@ void virtio_reset_device(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_reset_device);
 
-static int virtio_dev_probe(struct device *_d)
+static int virtio_negotiate_features(struct virtio_device *dev)
 {
-   int err, i;
-   struct virtio_device *dev = dev_to_virtio(_d);
struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
u64 device_features;
u64 driver_features;
u64 driver_features_legacy;
-
-   /* We have a driver! */
-   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+   int i, ret;
 
/* Figure out what features the device supports. */
device_features = dev->config->get_features(dev);
@@ -279,30 +282,61 @@ static int virtio_dev_probe(struct device *_d)
if (device_features & (1ULL << i))
__virtio_set_bit(dev, i);
 
-   err = dev->config->finalize_features(dev);
-   if (err)
-   goto err;
+   /* Remove blocked features */
+   dev->features &= ~dev->blocked_features;
+
+   ret = dev->config->finalize_features(dev);
+   if (ret)
+   goto exit;
 
if (drv->validate) {
u64 features = dev->features;
 
-   err = drv->validate(dev);
-   if (err)
-   goto err;
+   ret = drv->validate(dev);
+   if (ret)
+   goto exit;
 
/* Did validation change any features? Then write them again. */
if (features != dev->features) {
-   err = dev->config->finalize_features(dev);
-   if (err)
-   goto err;
+   ret = dev->config->finalize_features(dev);
+   if (ret)
+   goto exit;
}
}
 
-   err = virtio_features_ok(dev);
-   if (err)
-   goto err;
+   ret = virtio_features_ok(dev);
+exit:
+   return ret;
+}
+
+static int virtio_dev_probe(struct device *_d)
+{
+   int err;
+   struct virtio_device *dev = dev_to_virtio(_d);
+   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+   u64 blocked_features;
+   bool renegotiate = true;
+
+   /* We have a driver! */
+   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+   /* Store blocked features and attempt to negotiate features & probe.
+* If the probe fails, we check if the driver has blocked any new features.
+* If it has, we reset the device and try again with the new features.
+*/
+   while (renegotiate) {
+   blocked_features = dev->blocked_features;
+   err = virtio_negotiate_features(dev);
+   if (err)
+   break;
+
+   err = drv->probe(dev);
+   if (err && blocked_features != dev->blocked_features)
+   virtio_reset_device(dev);
+   else
+   renegotiate = false;
+   }
 
-   err = drv->probe(dev);
if (err)
goto err;
 
@@ -319,7 +353,6 @@ static int virtio_dev_probe(struct device *_d)
 err:
virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
return err;
-
 }
 
 static void virtio_dev_remove(struct device *_d)
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e..2de9b2d3ca4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,6 +109,7 @@ int virtqueue_resize(struct virtqueue *vq, u32 num,
  * @vringh_config: configuration ops for host vrings.
  * @vqs: the list of virtqueues for this device.
  * @features: the features supported by both driver and device.
+ * @blocked_features: the features blocked by the driver that can't be negotiated.
  * @priv: private pointer for the driver's use.
  */
 struct virtio_device {
@@ -124,6 

[RFC PATCH net 0/3] virtio-net: allow usage of small vrings

2023-04-30 Thread Alvaro Karsz
At the moment, if a virtio network device uses vrings with less than
MAX_SKB_FRAGS + 2 entries, the device won't be functional.

The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always
evaluate to false, leading to TX timeouts.
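
To make the failure concrete: num_free can never exceed the ring size, so a
ring with fewer than MAX_SKB_FRAGS + 2 entries can never satisfy the wake-up
condition, even when it is completely empty. A minimal sketch, assuming
MAX_SKB_FRAGS is 17 (the real value is configuration dependent) and using
hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SKB_FRAGS 17	/* assumed value; configuration dependent */

/* The condition the driver checks before restarting a stopped TX queue. */
static bool tx_queue_can_wake(uint16_t num_free)
{
	return num_free >= 2 + SKETCH_MAX_SKB_FRAGS;
}

/* Best case for a ring: every descriptor is free. */
static bool ring_can_ever_wake(uint16_t ring_size)
{
	return tx_queue_can_wake(ring_size);
}
```

Under that assumption, a ring of 18 entries or fewer stays stopped forever
once the queue has been stopped, which is what surfaces as a TX timeout.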

This patchset attempts to fix this bug and to allow small rings down
to 4 entries.

The first patch introduces a new mechanism in virtio core: it allows
features to be blocked at probe time.

If a virtio driver blocks features and fails probe, virtio core will
reset the device, re-negotiate the features and probe again.

This is needed since some virtio net features are not supported with
small rings.

This patchset follows a discussion in the mailing list [1].

This fixes only part of the bug, rings with less than 4 entries won't
work.
My intention is to split the effort and fix the RING_SIZE < 4 case in a
follow up patchset.

Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset?

I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed
and split VQs, with rings down to 4 entries, with and without
VIRTIO_NET_F_MRG_RXBUF, with big MTUs.

I would appreciate more testing.
Xuan: I wasn't able to test XDP with my setup, maybe you can help with
that?

[1] 
https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.ka...@solid-run.com/

Alvaro Karsz (3):
  virtio: re-negotiate features if probe fails and features are blocked
  virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2
  virtio-net: block ethtool from converting a ring to a small ring

 drivers/net/virtio_net.c | 161 +--
 drivers/virtio/virtio.c  |  73 +-
 include/linux/virtio.h   |   3 +
 3 files changed, 212 insertions(+), 25 deletions(-)

-- 
2.34.1
