Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members

2022-06-28 Thread Geert Uytterhoeven
Hi Gustavo,

Thanks for your patch!

On Mon, Jun 27, 2022 at 8:04 PM Gustavo A. R. Silva
 wrote:
> There is a regular need in the kernel to provide a way to declare
> having a dynamically sized set of trailing elements in a structure.
> Kernel code should always use “flexible array members”[1] for these
> cases. The older style of one-element or zero-length arrays should
> no longer be used[2].

These rules apply to the kernel, but uapi is not considered part of the
kernel, so different rules apply.  Uapi header files should work with
whatever compiler may be used for compiling userspace.
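
For illustration, the change under discussion is of this shape (generic
declarations, not taken from the patch):

    struct example_old {
            __u32 len;
            __u8  data[0];  /* zero-length array: a GNU extension */
    };

    struct example_new {
            __u32 len;
            __u8  data[];   /* flexible array member: standard C since C99,
                             * but still an extension in C++ and in pre-C99
                             * compilers, which is the uapi concern here */
    };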

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 2/3] vdpa_sim_blk: limit the number of request handled per batch

2022-06-28 Thread Stefano Garzarella
On Tue, Jun 28, 2022 at 6:01 AM Jason Wang  wrote:
>
> On Thu, Jun 23, 2022 at 4:58 PM Stefano Garzarella  
> wrote:
> >
> > On Thu, Jun 23, 2022 at 11:50:22AM +0800, Jason Wang wrote:
> > >On Wed, Jun 22, 2022 at 12:09 AM Stefano Garzarella  
> > >wrote:
> > >>
> > >> Limit the number of requests (4 per queue as for vdpa_sim_net) handled
> > >> in a batch to prevent the worker from using the CPU for too long.
> > >>
> > >> Suggested-by: Eugenio Pérez 
> > >> Signed-off-by: Stefano Garzarella 
> > >> ---
> > >>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 ++-
> > >>  1 file changed, 14 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c 
> > >> b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > >> index a83a5c76f620..ac86478845b6 100644
> > >> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > >> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> > >> @@ -197,6 +197,7 @@ static bool vdpasim_blk_handle_req(struct vdpasim 
> > >> *vdpasim,
> > >>  static void vdpasim_blk_work(struct work_struct *work)
> > >>  {
> > >> struct vdpasim *vdpasim = container_of(work, struct vdpasim, 
> > >> work);
> > >> +   bool reschedule = false;
> > >> int i;
> > >>
> > >> spin_lock(&vdpasim->lock);
> > >> @@ -206,11 +207,15 @@ static void vdpasim_blk_work(struct work_struct 
> > >> *work)
> > >>
> > >> for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> > >> struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
> > >> +   bool vq_work = true;
> > >> +   int reqs = 0;
> > >>
> > >> if (!vq->ready)
> > >> continue;
> > >>
> > >> -   while (vdpasim_blk_handle_req(vdpasim, vq)) {
> > >> +   while (vq_work) {
> > >> +   vq_work = vdpasim_blk_handle_req(vdpasim, vq);
> > >> +
> > >
> > >Is it better to check and exit the loop early here?
> >
> > Maybe, but I'm not sure.
> >
> > In vdpa_sim_net we call vringh_complete_iotlb() and send notification
> > also in the error path,
>
> Looks not?
>
> read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl,
>  sizeof(ctrl));
> if (read != sizeof(ctrl))
> break;
>
> We break the loop.

I was looking at vdpasim_net_work(), but I was confused since it
handles 2 queues.

I'll break the loop as it was before.
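
For reference, keeping the handler in the loop condition while capping the
batch would look roughly like this (editorial sketch; the limit of 4 per
queue follows the commit message):

    /* inside vdpasim_blk_work(), per ready virtqueue */
    int reqs = 0;

    while (vdpasim_blk_handle_req(vdpasim, vq)) {
            /* yield the CPU after a small batch and re-queue the work */
            if (++reqs > 4) {
                    reschedule = true;
                    break;
            }
    }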

Thanks,
Stefano


[PATCH] virtio-net: fix the race between refill work and close

2022-06-28 Thread Jason Wang
We try using cancel_delayed_work_sync() to prevent the work from
enabling NAPI. This is insufficient since we don't disable the source
of the refill work scheduling. This means a NAPI poll after
cancel_delayed_work_sync() can still schedule the refill work, which
can then re-enable NAPI and lead to a use-after-free [1].

Since the work can enable NAPI, we can't simply disable NAPI before
calling cancel_delayed_work_sync(). So fix this by introducing a
dedicated boolean to control whether or not the work can be
scheduled from NAPI.
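
Spelled out, the race is roughly the following interleaving (editorial
sketch of the description above, not part of the patch):

    /* CPU0: virtnet_close()             CPU1: NAPI poll
     *
     * cancel_delayed_work_sync(&vi->refill);
     *                                   rx ring runs low
     *                                   schedule_delayed_work(&vi->refill, 0);
     * disable NAPI, tear down queues
     *
     * ...later the queued refill_work() runs, touches freed state and
     * re-enables NAPI -> the use-after-free reported by KASAN in [1].
     */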

[1]
==
BUG: KASAN: use-after-free in refill_work+0x43/0xd4
Read of size 2 at addr 88810562c92e by task kworker/2:1/42

CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: events refill_work
Call Trace:
 
 dump_stack_lvl+0x34/0x44
 print_report.cold+0xbb/0x6ac
 ? _printk+0xad/0xde
 ? refill_work+0x43/0xd4
 kasan_report+0xa8/0x130
 ? refill_work+0x43/0xd4
 refill_work+0x43/0xd4
 process_one_work+0x43d/0x780
 worker_thread+0x2a0/0x6f0
 ? process_one_work+0x780/0x780
 kthread+0x167/0x1a0
 ? kthread_exit+0x50/0x50
 ret_from_fork+0x22/0x30
 
...

Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop")
Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index db05b5e930be..21bf1e5c81ef 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -251,6 +251,12 @@ struct virtnet_info {
/* Does the affinity hint is set for virtqueues? */
bool affinity_hint_set;
 
+   /* Is refill work enabled? */
+   bool refill_work_enabled;
+
+   /* The lock to synchronize the access to refill_work_enabled */
+   spinlock_t refill_lock;
+
/* CPU hotplug instances for online & dead */
struct hlist_node node;
struct hlist_node node_dead;
@@ -348,6 +354,20 @@ static struct page *get_a_page(struct receive_queue *rq, 
gfp_t gfp_mask)
return p;
 }
 
+static void enable_refill_work(struct virtnet_info *vi)
+{
+   spin_lock(&vi->refill_lock);
+   vi->refill_work_enabled = true;
+   spin_unlock(&vi->refill_lock);
+}
+
+static void disable_refill_work(struct virtnet_info *vi)
+{
+   spin_lock(&vi->refill_lock);
+   vi->refill_work_enabled = false;
+   spin_unlock(&vi->refill_lock);
+}
+
 static void virtqueue_napi_schedule(struct napi_struct *napi,
struct virtqueue *vq)
 {
@@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int 
budget,
}
 
if (rq->vq->num_free > min((unsigned int)budget, 
virtqueue_get_vring_size(rq->vq)) / 2) {
-   if (!try_fill_recv(vi, rq, GFP_ATOMIC))
-   schedule_delayed_work(&vi->refill, 0);
+   if (!try_fill_recv(vi, rq, GFP_ATOMIC)) {
+   spin_lock(&vi->refill_lock);
+   if (vi->refill_work_enabled)
+   schedule_delayed_work(&vi->refill, 0);
+   spin_unlock(&vi->refill_lock);
+   }
}
 
u64_stats_update_begin(&rq->stats.syncp);
@@ -1651,6 +1675,8 @@ static int virtnet_open(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int i, err;
 
+   enable_refill_work(vi);
+
for (i = 0; i < vi->max_queue_pairs; i++) {
if (i < vi->curr_queue_pairs)
/* Make sure we have some buffers: if oom use wq. */
@@ -2033,6 +2059,8 @@ static int virtnet_close(struct net_device *dev)
struct virtnet_info *vi = netdev_priv(dev);
int i;
 
+   /* Make sure NAPI doesn't schedule refill work */
+   disable_refill_work(vi);
/* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
 
@@ -2776,6 +2804,9 @@ static void virtnet_freeze_down(struct virtio_device 
*vdev)
netif_tx_lock_bh(vi->dev);
netif_device_detach(vi->dev);
netif_tx_unlock_bh(vi->dev);
+   /* Make sure NAPI doesn't schedule refill work */
+   disable_refill_work(vi);
+   /* Make sure refill_work doesn't re-enable napi! */
cancel_delayed_work_sync(&vi->refill);
 
if (netif_running(vi->dev)) {
@@ -2799,6 +2830,8 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 
virtio_device_ready(vdev);
 
+   enable_refill_work(vi);
+
if (netif_running(vi->dev)) {
for (i = 0; i < vi->curr_queue_pairs; i++)
if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
@@ -3548,6 +3581,7 @@ static int virtnet_probe(struct virtio_device *vdev)
vdev->priv = vi;
 
INIT_WORK(&vi->config_work, vi

Re: [PATCH -next] vdpa/mlx5: Use eth_zero_addr() to assign zero address

2022-06-28 Thread Michael S. Tsirkin
On Tue, Jun 28, 2022 at 09:44:18AM +, Xu Qiang wrote:
> Using eth_zero_addr() to assign zero address insetad of

typo

> memset().
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Qiang 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index e85c1d71f4ed..f738c78ef446 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1457,8 +1457,8 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct 
> mlx5_vdpa_net *ndev, u8 *mac,
>  
>   *ucast = rule;
>  
> - memset(dmac_c, 0, ETH_ALEN);
> - memset(dmac_v, 0, ETH_ALEN);
> + eth_zero_addr(dmac_c);
> + eth_zero_addr(dmac_v);
>   dmac_c[0] = 1;
>   dmac_v[0] = 1;
>   rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1);
> -- 
> 2.17.1



Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Robin Murphy

On 2022-05-27 00:50, Dmitry Osipenko wrote:

Hello,

This patchset introduces memory shrinker for the VirtIO-GPU DRM driver
and adds memory purging and eviction support to VirtIO-GPU driver.

The new dma-buf locking convention is introduced here as well.

During OOM, the shrinker will release BOs that are marked as "not needed"
by userspace using the new madvise IOCTL; it will also evict idle BOs to
swap. The userspace in this case is the Mesa VirGL driver: it marks the
cached BOs as "not needed", allowing the kernel driver to release the
memory of the cached shmem BOs in low-memory situations, preventing OOM kills.

The Panfrost driver is switched to use generic memory shrinker.


I think we still have some outstanding issues here - Alyssa reported 
some weirdness yesterday, so I just tried provoking a low-memory 
condition locally with this series applied and a few debug options 
enabled, and the results below were... interesting.


Thanks,
Robin.

->8-
[   68.295951] ==
[   68.295956] WARNING: possible circular locking dependency detected
[   68.295963] 5.19.0-rc3+ #400 Not tainted
[   68.295972] --
[   68.295977] cc1/295 is trying to acquire lock:
[   68.295986] 08d7f1a0 
(reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198

[   68.296036]
[   68.296036] but task is already holding lock:
[   68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470

[   68.296080]
[   68.296080] which lock already depends on the new lock.
[   68.296080]
[   68.296085]
[   68.296085] the existing dependency chain (in reverse order) is:
[   68.296090]
[   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
[   68.296111]fs_reclaim_acquire+0xb8/0x150
[   68.296130]dma_resv_lockdep+0x298/0x3fc
[   68.296148]do_one_initcall+0xe4/0x5f8
[   68.296163]kernel_init_freeable+0x414/0x49c
[   68.296180]kernel_init+0x2c/0x148
[   68.296195]ret_from_fork+0x10/0x20
[   68.296207]
[   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
[   68.296229]__lock_acquire+0x1724/0x2398
[   68.296246]lock_acquire+0x218/0x5b0
[   68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378
[   68.296277]ww_mutex_lock+0x7c/0x4d8
[   68.296291]drm_gem_shmem_free+0x7c/0x198
[   68.296304]panfrost_gem_free_object+0x118/0x138
[   68.296318]drm_gem_object_free+0x40/0x68
[   68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
[   68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
[   68.296368]do_shrink_slab+0x220/0x808
[   68.296381]shrink_slab+0x11c/0x408
[   68.296392]shrink_node+0x6ac/0xb90
[   68.296403]do_try_to_free_pages+0x1dc/0x8d0
[   68.296416]try_to_free_pages+0x1ec/0x5b0
[   68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470
[   68.296444]__alloc_pages+0x4e0/0x5b8
[   68.296455]__folio_alloc+0x24/0x60
[   68.296467]vma_alloc_folio+0xb8/0x2f8
[   68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68
[   68.296498]__handle_mm_fault+0x918/0x12a8
[   68.296513]handle_mm_fault+0x130/0x300
[   68.296527]do_page_fault+0x1d0/0x568
[   68.296539]do_translation_fault+0xa0/0xb8
[   68.296551]do_mem_abort+0x68/0xf8
[   68.296562]el0_da+0x74/0x100
[   68.296572]el0t_64_sync_handler+0x68/0xc0
[   68.296585]el0t_64_sync+0x18c/0x190
[   68.296596]
[   68.296596] other info that might help us debug this:
[   68.296596]
[   68.296601]  Possible unsafe locking scenario:
[   68.296601]
[   68.296604]CPU0CPU1
[   68.296608]
[   68.296612]   lock(fs_reclaim);
[   68.296622] 
lock(reservation_ww_class_mutex);

[   68.296633]lock(fs_reclaim);
[   68.296644]   lock(reservation_ww_class_mutex);
[   68.296654]
[   68.296654]  *** DEADLOCK ***
[   68.296654]
[   68.296658] 3 locks held by cc1/295:
[   68.29]  #0: 0616e898 (&mm->mmap_lock){}-{3:3}, at: 
do_page_fault+0x144/0x568
[   68.296702]  #1: 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: 
__alloc_pages_slowpath.constprop.0+0x4d8/0x1470
[   68.296740]  #2: 8c1215b0 (shrinker_rwsem){}-{3:3}, at: 
shrink_slab+0xc0/0x408

[   68.296774]
[   68.296774] stack backtrace:
[   68.296780] CPU: 2 PID: 295 Comm: cc1 Not tainted 5.19.0-rc3+ #400
[   68.296794] Hardware name: ARM LTD ARM Juno Development Platform/ARM 
Juno Development Platform, BIOS EDK II Sep  3 2019

[   68.296803] Call trace:
[   68.296808]  dump_backtrace+0x1e4/0x1f0
[   68.296821]  show_stack+0x20/0x70
[   68.296832]  dump_stack_lvl+0x8c/0xb8
[   68.296849]  dump_stack+0x1c/0x38
[   68.296864]  print_circular_bug.isra.0+0x284/0x378
[   68.296881]  check_noncircular+0x1d8/0x1

Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members

2022-06-28 Thread Jason Gunthorpe
On Tue, Jun 28, 2022 at 04:21:29AM +0200, Gustavo A. R. Silva wrote:

> > > Though maybe we could just switch off 
> > > -Wgnu-variable-sized-type-not-at-end  during configuration ?

> We need to think in a different strategy.

I think we will need to switch off the warning in userspace - this is
doable for rdma-core.

On the other hand, if the goal is to enable the array size check
compiler warning I would suggest focusing only on those structs that
actually hit that warning in the kernel. IIRC infiniband doesn't
trigger it because it just pointer casts the flex array to some other
struct.

It isn't actually an array; it is a placeholder for a trailing
structure, so it is never indexed.

This is also why we hit the warning: the convenient way for
userspace to compose the message is to squash the header and trailer
structs together into a super struct on the stack, then invoke the
ioctl.
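
For readers outside rdma-core, that composition pattern looks roughly like
the following (names invented for illustration), and it is exactly what
trips -Wgnu-variable-sized-type-not-at-end once the placeholder becomes a
flexible array:

    struct cmd_hdr {
            __u32 command;
            __u32 in_words;
            __aligned_u64 driver_data[];    /* placeholder for the trailer */
    };

    struct cmd_trailer {
            __u32 comp_mask;
            __u32 reserved;
    };

    /* The full message is composed on the stack, then the ioctl is issued. */
    struct {
            struct cmd_hdr hdr;             /* flex array not at the end */
            struct cmd_trailer trailer;
    } msg = { .hdr = { .command = 1 } };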

Jason 


Re: [PATCH v6 1/4] vdpa: Add suspend operation

2022-06-28 Thread Stefano Garzarella

On Thu, Jun 23, 2022 at 06:07:35PM +0200, Eugenio Pérez wrote:

This operation is optional: if it's not implemented, the backend feature
bit will not be exposed.

Signed-off-by: Eugenio Pérez 
---
include/linux/vdpa.h | 4 
1 file changed, 4 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 7b4a13d3bd91..d282f464d2f1 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -218,6 +218,9 @@ struct vdpa_map_file {
 * @reset:  Reset device
 *  @vdev: vdpa device
 *  Returns integer: success (0) or error (< 0)
+ * @suspend:   Suspend or resume the device (optional)

   ^
IIUC we removed the resume operation (that should be done with reset),
so should we update this documentation?

Thanks,
Stefano


+ * @vdev: vdpa device
+ * Returns integer: success (0) or error (< 0)
 * @get_config_size:Get the size of the configuration space includes
 *  fields that are conditional on feature bits.
 *  @vdev: vdpa device
@@ -319,6 +322,7 @@ struct vdpa_config_ops {
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
int (*reset)(struct vdpa_device *vdev);
+   int (*suspend)(struct vdpa_device *vdev);
size_t (*get_config_size)(struct vdpa_device *vdev);
void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
   void *buf, unsigned int len);
--
2.31.1





Re: [PATCH v6 2/4] vhost-vdpa: introduce SUSPEND backend feature bit

2022-06-28 Thread Stefano Garzarella

On Thu, Jun 23, 2022 at 06:07:36PM +0200, Eugenio Pérez wrote:

Userland knows if it can suspend the device or not by checking this feature
bit.

It's only offered if the vdpa driver backend implements the suspend()
operation callback; offering it, or letting userland ack it, when the
backend does not provide that callback is an error.


Should we document in the previous patch that the callback must be
implemented only if the driver/device supports it?


The rest LGTM, although I have a doubt whether it is better to move this
patch after patch 3, or to merge it with patch 3, for bisectability: we
enable the feature here, but if userspace calls ioctl() with
VHOST_VDPA_SUSPEND we reply that it is not supported.
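
For context, the userspace side of the negotiation added here would look
roughly like this (editorial sketch; vdpa_fd is an open /dev/vhost-vdpa-N
descriptor, error handling omitted):

    __u64 features;

    if (!ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) &&
        (features & (1ULL << VHOST_BACKEND_F_SUSPEND))) {
            /* Ack the offered bits; setting VHOST_BACKEND_F_SUSPEND when the
             * parent driver has no suspend() callback returns -EOPNOTSUPP. */
            ioctl(vdpa_fd, VHOST_SET_BACKEND_FEATURES, &features);
    }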


Thanks,
Stefano



Signed-off-by: Eugenio Pérez 
---
drivers/vhost/vdpa.c | 16 +++-
include/uapi/linux/vhost_types.h |  2 ++
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 23dcbfdfa13b..3d636e192061 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -347,6 +347,14 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,
return 0;
}

+static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   return ops->suspend;
+}
+
static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
{
struct vdpa_device *vdpa = v->vdpa;
@@ -577,7 +585,11 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if (cmd == VHOST_SET_BACKEND_FEATURES) {
if (copy_from_user(&features, featurep, sizeof(features)))
return -EFAULT;
-   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
+   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
+BIT_ULL(VHOST_BACKEND_F_SUSPEND)))
+   return -EOPNOTSUPP;
+   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
+!vhost_vdpa_can_suspend(v))
return -EOPNOTSUPP;
vhost_set_backend_features(&v->vdev, features);
return 0;
@@ -628,6 +640,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
break;
case VHOST_GET_BACKEND_FEATURES:
features = VHOST_VDPA_BACKEND_FEATURES;
+   if (vhost_vdpa_can_suspend(v))
+   features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
if (copy_to_user(featurep, &features, sizeof(features)))
r = -EFAULT;
break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 634cee485abb..1bdd6e363f4c 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -161,5 +161,7 @@ struct vhost_vdpa_iova_range {
 * message
 */
#define VHOST_BACKEND_F_IOTLB_ASID  0x3
+/* Device can be suspended */
+#define VHOST_BACKEND_F_SUSPEND  0x4

#endif
--
2.31.1





Re: [PATCH v6 3/4] vhost-vdpa: uAPI to suspend the device

2022-06-28 Thread Stefano Garzarella
On Thu, Jun 23, 2022 at 06:07:37PM +0200, Eugenio Pérez wrote:
>The ioctl adds support for suspending the device from userspace.
>
>This is a must before getting virtqueue indexes (base) for live migration,
>since the device could modify them after userland gets them. There are
>individual ways to perform that action for some devices
>(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
>way to perform it for any vhost device (and, in particular, vhost-vdpa).
>
>After a successful return of the ioctl call the device must not process
>more virtqueue descriptors. The device can answer to read or writes of
>config fields as if it were not suspended. In particular, writing to
>"queue_enable" with a value of 1 will not make the device start
>processing buffers of the virtqueue.
>
>Signed-off-by: Eugenio Pérez 
>---
> drivers/vhost/vdpa.c   | 19 +++
> include/uapi/linux/vhost.h | 14 ++
> 2 files changed, 33 insertions(+)
>
>diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>index 3d636e192061..7fa671ac4bdf 100644
>--- a/drivers/vhost/vdpa.c
>+++ b/drivers/vhost/vdpa.c
>@@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa 
>*v, u32 __user *argp)
>   return 0;
> }
>
>+/* After a successful return of ioctl the device must not process more
>+ * virtqueue descriptors. The device can answer to read or writes of config
>+ * fields as if it were not suspended. In particular, writing to 
>"queue_enable"
>+ * with a value of 1 will not make the device start processing buffers.
>+ */
>+static long vhost_vdpa_suspend(struct vhost_vdpa *v)
>+{
>+  struct vdpa_device *vdpa = v->vdpa;
>+  const struct vdpa_config_ops *ops = vdpa->config;
>+
>+  if (!ops->suspend)
>+  return -EOPNOTSUPP;
>+
>+  return ops->suspend(vdpa);
>+}
>+
> static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  void __user *argp)
> {
>@@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>   case VHOST_VDPA_GET_VQS_COUNT:
>   r = vhost_vdpa_get_vqs_count(v, argp);
>   break;
>+  case VHOST_VDPA_SUSPEND:
>+  r = vhost_vdpa_suspend(v);
>+  break;
>   default:
>   r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>   if (r == -ENOIOCTLCMD)
>diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
>index cab645d4a645..6d9f45163155 100644
>--- a/include/uapi/linux/vhost.h
>+++ b/include/uapi/linux/vhost.h
>@@ -171,4 +171,18 @@
> #define VHOST_VDPA_SET_GROUP_ASID _IOW(VHOST_VIRTIO, 0x7C, \
>struct vhost_vring_state)
>
>+/* Suspend or resume a device so it does not process virtqueue requests 
>anymore
>+ *
>+ * After the return of ioctl with suspend != 0, the device must finish any
>+ * pending operations like in flight requests. It must also preserve all the
>+ * necessary state (the virtqueue vring base plus the possible device specific
>+ * states) that is required for restoring in the future. The device must not
>+ * change its configuration after that point.
>+ *
>+ * After the return of ioctl with suspend == 0, the device can continue
>+ * processing buffers as long as typical conditions are met (vq is enabled,
>+ * DRIVER_OK status bit is enabled, etc).
>+ */
>+#define VHOST_VDPA_SUSPEND _IOW(VHOST_VIRTIO, 0x7D, int)
 ^
IIUC we are not using the argument anymore, so this should be changed to
_IO(VHOST_VIRTIO, 0x7D).

And we should update the documentation a bit.
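
With no argument, the definition and its intended use would look roughly
like this (editorial sketch; vdpa_fd is an open vhost-vdpa descriptor):

    #define VHOST_VDPA_SUSPEND      _IO(VHOST_VIRTIO, 0x7D)

    /* Quiesce the device, then it is safe to read the vring bases
     * for live migration. */
    ioctl(vdpa_fd, VHOST_VDPA_SUSPEND);
    ioctl(vdpa_fd, VHOST_GET_VRING_BASE, &state);  /* struct vhost_vring_state */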

Thanks,
Stefano



Re: [PATCH v6] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation

2022-06-28 Thread Waiman Long

On 6/28/22 08:54, Guo Hui wrote:

The instructions assigned to the vcpu_is_preempted function parameter
in the X86 architecture physical machine are redundant instructions,
causing the multi-core performance of Unixbench to drop by about 4% to 5%.
The C function is as follows:
static bool vcpu_is_preempted(long vcpu);

The parameter 'vcpu' in the function osq_lock
that calls the function vcpu_is_preempted is assigned as follows:

The C code is in the function node_cpu:
cpu = node->cpu - 1;

The instructions corresponding to the C code are:
mov 0x14(%rax),%edi
sub $0x1,%edi

The above instructions are unnecessary
in the X86 Native operating environment,
causing high cache-misses and degrading performance.

This patch uses a static_key so that these instructions are not executed
in the native (bare-metal) runtime environment.
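
For readers unfamiliar with the mechanism, a minimal sketch of the
static_key pattern being described (the key and helper names here are
illustrative, not the actual patch):

    /* Enabled only when running as a paravirtualized guest. */
    DEFINE_STATIC_KEY_FALSE(preempted_key);

    static inline int node_cpu(struct optimistic_spin_node *node)
    {
            int cpu = 0;

            /* On bare metal the branch is patched out, so node->cpu
             * is never loaded and the subtraction never executes. */
            if (static_branch_unlikely(&preempted_key))
                    cpu = node->cpu - 1;

            return cpu;
    }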

The patch effect is as follows on two machines,
running Unixbench with all cores (full-core score):

1. Machine configuration:
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
CPU core: 40
Memory: 256G
OS Kernel: 5.19-rc3

Before using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0  948326591.2  81261.9
Double-Precision Whetstone   55.0 211986.3  38543.0
Execl Throughput 43.0  43453.2  10105.4
File Copy 1024 bufsize 2000 maxblocks  3960.0 438936.2   1108.4
File Copy 256 bufsize 500 maxblocks1655.0 118197.4714.2
File Copy 4096 bufsize 8000 maxblocks  5800.01534674.7   2646.0
Pipe Throughput   12440.0   46482107.6  37365.0
Pipe-based Context Switching   4000.01915094.2   4787.7
Process Creation126.0  85442.2   6781.1
Shell Scripts (1 concurrent) 42.4  69400.7  16368.1
Shell Scripts (8 concurrent)  6.0   8877.2  14795.3
System Call Overhead  15000.04714906.1   3143.3

System Benchmarks Index Score7923.3

After using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0  947032915.5  81151.1
Double-Precision Whetstone   55.0 211971.2  38540.2
Execl Throughput 43.0  45054.8  10477.9
File Copy 1024 bufsize 2000 maxblocks  3960.0 515024.9   1300.6
File Copy 256 bufsize 500 maxblocks1655.0 146354.6884.3
File Copy 4096 bufsize 8000 maxblocks  5800.01679995.9   2896.5
Pipe Throughput   12440.0   46466394.2  37352.4
Pipe-based Context Switching   4000.01898221.4   4745.6
Process Creation126.0  85653.1   6797.9
Shell Scripts (1 concurrent) 42.4  69437.3  16376.7
Shell Scripts (8 concurrent)  6.0   8898.9  14831.4
System Call Overhead  15000.04658746.7   3105.8

System Benchmarks Index Score8248.8

2. Machine configuration:
Hygon C86 7185 32-core Processor
CPU core: 128
Memory: 256G
OS Kernel: 5.19-rc3

Before using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0 2256644068.3 193371.4
Double-Precision Whetstone   55.0 438969.9  79812.7
Execl Throughput 43.0  10108.6   2350.8
File Copy 1024 bufsize 2000 maxblocks  3960.0 275892.8696.7
File Copy 256 bufsize 500 maxblocks1655.0  72082.7435.5
File Copy 4096 bufsize 8000 maxblocks  5800.0 925043.4   1594.9
Pipe Throughput   12440.0  118905512.5  95583.2
Pipe-based Context Switching   4000.07820945.7  19552.4
Process Creation126.0  31233.3   2478.8
Shell Scripts (1 concurrent) 42.4  49042.8  11566.7
Shell Scripts (8 concurrent)  6.0   6656.0  11093.3
System Call Overhead  15000.06816047.5   4544.0

System Benchmarks Index Score7756.6

After using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0 2252272929.4 192996.8
Double-Precision Whetstone   55.0 451847.2  82154.0
Execl Throughput 43.0  10595.1   2464.0
File Copy 1024 bufsize 2000 maxblocks  3960.0 

Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

2022-06-28 Thread Rob Clark
On Tue, Jun 28, 2022 at 5:51 AM Dmitry Osipenko
 wrote:
>
> On 6/28/22 15:31, Robin Murphy wrote:
> > ->8-
> > [   68.295951] ==
> > [   68.295956] WARNING: possible circular locking dependency detected
> > [   68.295963] 5.19.0-rc3+ #400 Not tainted
> > [   68.295972] --
> > [   68.295977] cc1/295 is trying to acquire lock:
> > [   68.295986] 08d7f1a0
> > (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198
> > [   68.296036]
> > [   68.296036] but task is already holding lock:
> > [   68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at:
> > __alloc_pages_slowpath.constprop.0+0x4d8/0x1470
> > [   68.296080]
> > [   68.296080] which lock already depends on the new lock.
> > [   68.296080]
> > [   68.296085]
> > [   68.296085] the existing dependency chain (in reverse order) is:
> > [   68.296090]
> > [   68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [   68.296111]fs_reclaim_acquire+0xb8/0x150
> > [   68.296130]dma_resv_lockdep+0x298/0x3fc
> > [   68.296148]do_one_initcall+0xe4/0x5f8
> > [   68.296163]kernel_init_freeable+0x414/0x49c
> > [   68.296180]kernel_init+0x2c/0x148
> > [   68.296195]ret_from_fork+0x10/0x20
> > [   68.296207]
> > [   68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
> > [   68.296229]__lock_acquire+0x1724/0x2398
> > [   68.296246]lock_acquire+0x218/0x5b0
> > [   68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378
> > [   68.296277]ww_mutex_lock+0x7c/0x4d8
> > [   68.296291]drm_gem_shmem_free+0x7c/0x198
> > [   68.296304]panfrost_gem_free_object+0x118/0x138
> > [   68.296318]drm_gem_object_free+0x40/0x68
> > [   68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8
> > [   68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170
> > [   68.296368]do_shrink_slab+0x220/0x808
> > [   68.296381]shrink_slab+0x11c/0x408
> > [   68.296392]shrink_node+0x6ac/0xb90
> > [   68.296403]do_try_to_free_pages+0x1dc/0x8d0
> > [   68.296416]try_to_free_pages+0x1ec/0x5b0
> > [   68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470
> > [   68.296444]__alloc_pages+0x4e0/0x5b8
> > [   68.296455]__folio_alloc+0x24/0x60
> > [   68.296467]vma_alloc_folio+0xb8/0x2f8
> > [   68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68
> > [   68.296498]__handle_mm_fault+0x918/0x12a8
> > [   68.296513]handle_mm_fault+0x130/0x300
> > [   68.296527]do_page_fault+0x1d0/0x568
> > [   68.296539]do_translation_fault+0xa0/0xb8
> > [   68.296551]do_mem_abort+0x68/0xf8
> > [   68.296562]el0_da+0x74/0x100
> > [   68.296572]el0t_64_sync_handler+0x68/0xc0
> > [   68.296585]el0t_64_sync+0x18c/0x190
> > [   68.296596]
> > [   68.296596] other info that might help us debug this:
> > [   68.296596]
> > [   68.296601]  Possible unsafe locking scenario:
> > [   68.296601]
> > [   68.296604]CPU0CPU1
> > [   68.296608]
> > [   68.296612]   lock(fs_reclaim);
> > [   68.296622] lock(reservation_ww_class_mutex);
> > [   68.296633]lock(fs_reclaim);
> > [   68.296644]   lock(reservation_ww_class_mutex);
> > [   68.296654]
> > [   68.296654]  *** DEADLOCK ***
>
> This splat could be ignored for now. I'm aware of it, although I
> haven't looked closely at how to fix it since it's a kind of lockdep
> misreporting.

The lockdep splat could be fixed with something similar to what I've
done in msm, i.e. basically just not acquiring the lock in the finalizer:

https://patchwork.freedesktop.org/patch/489364/

There is one gotcha to watch for, as danvet pointed out
(scan_objects() could still see the obj in the LRU before the
finalizer removes it), but if scan_objects() does the
kref_get_unless_zero() trick, it is safe.
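
A rough sketch of that scan-side pattern (hypothetical object and list
names; LRU locking omitted, the real change is in the msm patch above):

    static unsigned long my_shrinker_scan(struct shrinker *shrinker,
                                          struct shrink_control *sc)
    {
            struct my_shmem_obj *obj;   /* wraps a struct drm_gem_object "base" */
            unsigned long freed = 0;

            list_for_each_entry(obj, &my_shrinker_lru, lru_node) {
                    /* The finalizer may already be tearing this object down;
                     * only touch it if we can still take a reference. */
                    if (!kref_get_unless_zero(&obj->base.refcount))
                            continue;

                    if (dma_resv_trylock(obj->base.resv)) {
                            freed += my_purge_pages(obj);   /* hypothetical */
                            dma_resv_unlock(obj->base.resv);
                    }

                    drm_gem_object_put(&obj->base);
            }

            return freed;
    }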

BR,
-R


Re: [PATCH v7] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation

2022-06-28 Thread Waiman Long

On 6/28/22 12:12, Guo Hui wrote:

The instructions that compute the vcpu_is_preempted function parameter
are redundant on X86 physical (bare-metal) machines,
causing the multi-core performance of Unixbench to drop by about 4% to 5%.
The C function is as follows:
static bool vcpu_is_preempted(long vcpu);

The parameter 'vcpu' in the function osq_lock
that calls the function vcpu_is_preempted is assigned as follows:

The C code is in the function node_cpu:
cpu = node->cpu - 1;

The instructions corresponding to the C code are:
mov 0x14(%rax),%edi
sub $0x1,%edi

The above instructions are unnecessary
in the X86 Native operating environment,
causing high cache-misses and degrading performance.

This patch uses a static_key so that these instructions are not executed
in the native (bare-metal) runtime environment.

The patch effect is as follows on two machines,
running Unixbench with all cores (full-core score):

1. Machine configuration:
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
CPU core: 40
Memory: 256G
OS Kernel: 5.19-rc3

Before using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0  948326591.2  81261.9
Double-Precision Whetstone   55.0 211986.3  38543.0
Execl Throughput 43.0  43453.2  10105.4
File Copy 1024 bufsize 2000 maxblocks  3960.0 438936.2   1108.4
File Copy 256 bufsize 500 maxblocks1655.0 118197.4714.2
File Copy 4096 bufsize 8000 maxblocks  5800.01534674.7   2646.0
Pipe Throughput   12440.0   46482107.6  37365.0
Pipe-based Context Switching   4000.01915094.2   4787.7
Process Creation126.0  85442.2   6781.1
Shell Scripts (1 concurrent) 42.4  69400.7  16368.1
Shell Scripts (8 concurrent)  6.0   8877.2  14795.3
System Call Overhead  15000.04714906.1   3143.3

System Benchmarks Index Score7923.3

After using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0  947032915.5  81151.1
Double-Precision Whetstone   55.0 211971.2  38540.2
Execl Throughput 43.0  45054.8  10477.9
File Copy 1024 bufsize 2000 maxblocks  3960.0 515024.9   1300.6
File Copy 256 bufsize 500 maxblocks1655.0 146354.6884.3
File Copy 4096 bufsize 8000 maxblocks  5800.01679995.9   2896.5
Pipe Throughput   12440.0   46466394.2  37352.4
Pipe-based Context Switching   4000.01898221.4   4745.6
Process Creation126.0  85653.1   6797.9
Shell Scripts (1 concurrent) 42.4  69437.3  16376.7
Shell Scripts (8 concurrent)  6.0   8898.9  14831.4
System Call Overhead  15000.04658746.7   3105.8

System Benchmarks Index Score8248.8

2. Machine configuration:
Hygon C86 7185 32-core Processor
CPU core: 128
Memory: 256G
OS Kernel: 5.19-rc3

Before using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0 2256644068.3 193371.4
Double-Precision Whetstone   55.0 438969.9  79812.7
Execl Throughput 43.0  10108.6   2350.8
File Copy 1024 bufsize 2000 maxblocks  3960.0 275892.8696.7
File Copy 256 bufsize 500 maxblocks1655.0  72082.7435.5
File Copy 4096 bufsize 8000 maxblocks  5800.0 925043.4   1594.9
Pipe Throughput   12440.0  118905512.5  95583.2
Pipe-based Context Switching   4000.07820945.7  19552.4
Process Creation126.0  31233.3   2478.8
Shell Scripts (1 concurrent) 42.4  49042.8  11566.7
Shell Scripts (8 concurrent)  6.0   6656.0  11093.3
System Call Overhead  15000.06816047.5   4544.0

System Benchmarks Index Score7756.6

After using the patch:
System Benchmarks Index Values   BASELINE   RESULTINDEX
Dhrystone 2 using register variables 116700.0 2252272929.4 192996.8
Double-Precision Whetstone   55.0 451847.2  82154.0
Execl Throughput 43.0  10595.1   2464.0
File Copy 1024 bufsize 2000 maxblocks  3960.0 

Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members

2022-06-28 Thread Kees Cook
On Mon, Jun 27, 2022 at 09:40:52PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 27, 2022 at 08:27:37PM +0200, Daniel Borkmann wrote:
> > [...]
> > Fyi, this breaks BPF CI:
> > 
> > https://github.com/kernel-patches/bpf/runs/7078719372?check_suite_focus=true
> > 
> >   [...]
> >   progs/map_ptr_kern.c:314:26: error: field 'trie_key' with variable sized 
> > type 'struct bpf_lpm_trie_key' not at the end of a struct or class is a GNU 
> > extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> >   struct bpf_lpm_trie_key trie_key;
> >   ^

The issue here seems to be a collision between "unknown array size"
and known sizes:

struct bpf_lpm_trie_key {
__u32   prefixlen;  /* up to 32 for AF_INET, 128 for AF_INET6 */
__u8data[0];/* Arbitrary size */
};

struct lpm_key {
struct bpf_lpm_trie_key trie_key;
__u32 data;
};

This is treating trie_key as a header, which it's not: it's a complete
structure. :)

Perhaps:

struct lpm_key {
__u32 prefixlen;
__u32 data;
};

I don't see anything else trying to include bpf_lpm_trie_key.

> 
> This will break the rdma-core userspace as well, with a similar
> error:
> 
> /usr/bin/clang-13 -DVERBS_DEBUG -Dibverbs_EXPORTS -Iinclude 
> -I/usr/include/libnl3 -I/usr/include/drm -g -O2 -fdebug-prefix-map=/__w/1/s=. 
> -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
> -D_FORTIFY_SOURCE=2 -Wmissing-prototypes -Wmissing-declarations 
> -Wwrite-strings -Wformat=2 -Wcast-function-type -Wformat-nonliteral 
> -Wdate-time -Wnested-externs -Wshadow -Wstrict-prototypes 
> -Wold-style-definition -Werror -Wredundant-decls -g -fPIC   -std=gnu11 -MD 
> -MT libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o -MF 
> libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o.d -o 
> libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o   -c ../libibverbs/cmd_flow.c
> In file included from ../libibverbs/cmd_flow.c:33:
> In file included from include/infiniband/cmd_write.h:36:
> In file included from include/infiniband/cmd_ioctl.h:41:
> In file included from include/infiniband/verbs.h:48:
> In file included from include/infiniband/verbs_api.h:66:
> In file included from include/infiniband/ib_user_ioctl_verbs.h:38:
> include/rdma/ib_user_verbs.h:436:34: error: field 'base' with variable sized 
> type 'struct ib_uverbs_create_cq_resp' not at the end of a struct or class is 
> a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> struct ib_uverbs_create_cq_resp base;
> ^
> include/rdma/ib_user_verbs.h:644:34: error: field 'base' with variable sized 
> type 'struct ib_uverbs_create_qp_resp' not at the end of a struct or class is 
> a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
> struct ib_uverbs_create_qp_resp base;

This looks very similar, a struct of unknown size is being treated as a
header struct:

struct ib_uverbs_create_cq_resp {
__u32 cq_handle;
__u32 cqe;
__aligned_u64 driver_data[0];
};

struct ib_uverbs_ex_create_cq_resp {
struct ib_uverbs_create_cq_resp base;
__u32 comp_mask;
__u32 response_length;
};

And it only gets used here:

DECLARE_UVERBS_WRITE(IB_USER_VERBS_CMD_CREATE_CQ,
 ib_uverbs_create_cq,
 UAPI_DEF_WRITE_UDATA_IO(
 struct ib_uverbs_create_cq,
 struct ib_uverbs_create_cq_resp),
 ^^^
 UAPI_DEF_METHOD_NEEDS_FN(create_cq)),

which must also be assuming it's a header. So probably better to just
drop the driver_data field? I don't see anything using it (that I can
find) besides as a sanity-check that the field exists and is at the end
of the struct.

-- 
Kees Cook


Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members

2022-06-28 Thread Kees Cook
On Tue, Jun 28, 2022 at 09:27:21AM +0200, Geert Uytterhoeven wrote:
> Hi Gustavo,
> 
> Thanks for your patch!
> 
> On Mon, Jun 27, 2022 at 8:04 PM Gustavo A. R. Silva
>  wrote:
> > There is a regular need in the kernel to provide a way to declare
> > having a dynamically sized set of trailing elements in a structure.
> > Kernel code should always use “flexible array members”[1] for these
> > cases. The older style of one-element or zero-length arrays should
> > no longer be used[2].
> 
> These rules apply to the kernel, but uapi is not considered part of the
> kernel, so different rules apply.  Uapi header files should work with
> whatever compiler that can be used for compiling userspace.

Right, userspace isn't bound by these rules, but the kernel ends up
consuming these structures, so we need to fix them. The [0] -> []
changes (when the arrays are not erroneously being used within other
structures) are valid for all compilers. Flexible arrays are C99; it's
been 23 years. :)

But, yes, where we DO break stuff we need to workaround it, etc.

-- 
Kees Cook

Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members

2022-06-28 Thread Jason Gunthorpe
On Tue, Jun 28, 2022 at 10:54:58AM -0700, Kees Cook wrote:

 
> which must also be assuming it's a header. So probably better to just
> drop the driver_data field? I don't see anything using it (that I can
> find) besides as a sanity-check that the field exists and is at the end
> of the struct.

The field is guaranteeing alignment of the following structure. IIRC
there are a few cases where we don't have a u64 already to force this.
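
For illustration, the zero-length __aligned_u64 member is what bumps the
struct's alignment and size, so anything appended after the base response
in the reply buffer stays u64-aligned (made-up field names, not the real
uapi header):

    struct resp_base {
            __u32 handle;
            __u32 count;
            __u32 flags;
            __aligned_u64 driver_data[0];   /* no storage, but raises the
                                             * struct's alignment to 8 and pads
                                             * sizeof() from 12 up to 16 */
    };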

Jason


Re: [PATCH v2 -next] vdpa/mlx5: Use eth_zero_addr() to assign zero address

2022-06-28 Thread Michael S. Tsirkin
On Tue, Jun 28, 2022 at 12:34:57PM +, Xu Qiang wrote:
> Using eth_zero_addr() to assign zero address instead of memset().
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Xu Qiang 

Acked-by: Michael S. Tsirkin 

> ---
> v2:
> - fix typo in commit log
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index e85c1d71f4ed..f738c78ef446 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -1457,8 +1457,8 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct 
> mlx5_vdpa_net *ndev, u8 *mac,
>  
>   *ucast = rule;
>  
> - memset(dmac_c, 0, ETH_ALEN);
> - memset(dmac_v, 0, ETH_ALEN);
> + eth_zero_addr(dmac_c);
> + eth_zero_addr(dmac_v);
>   dmac_c[0] = 1;
>   dmac_v[0] = 1;
>   rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1);
> -- 
> 2.17.1



Re: [PATCH v3 0/3] virtio: support requiring restricted access per device

2022-06-28 Thread Stefano Stabellini
On Wed, 22 Jun 2022, Juergen Gross wrote:
> Instead of an all or nothing approach add support for requiring
> restricted memory access per device.
> 
> Changes in V3:
> - new patches 1 + 2
> - basically complete rework of patch 3
> 
> Juergen Gross (3):
>   virtio: replace restricted mem access flag with callback
>   kernel: remove platform_has() infrastructure
>   xen: don't require virtio with grants for non-PV guests


On the whole series:

Reviewed-by: Stefano Stabellini 


>  MAINTAINERS|  8 
>  arch/arm/xen/enlighten.c   |  4 +++-
>  arch/s390/mm/init.c|  4 ++--
>  arch/x86/mm/mem_encrypt_amd.c  |  4 ++--
>  arch/x86/xen/enlighten_hvm.c   |  4 +++-
>  arch/x86/xen/enlighten_pv.c|  5 -
>  drivers/virtio/Kconfig |  4 
>  drivers/virtio/Makefile|  1 +
>  drivers/virtio/virtio.c|  4 ++--
>  drivers/virtio/virtio_anchor.c | 18 +
>  drivers/xen/Kconfig|  9 +
>  drivers/xen/grant-dma-ops.c| 10 ++
>  include/asm-generic/Kbuild |  1 -
>  include/asm-generic/platform-feature.h |  8 
>  include/linux/platform-feature.h   | 19 --
>  include/linux/virtio_anchor.h  | 19 ++
>  include/xen/xen-ops.h  |  6 ++
>  include/xen/xen.h  |  8 
>  kernel/Makefile|  2 +-
>  kernel/platform-feature.c  | 27 --
>  20 files changed, 84 insertions(+), 81 deletions(-)
>  create mode 100644 drivers/virtio/virtio_anchor.c
>  delete mode 100644 include/asm-generic/platform-feature.h
>  delete mode 100644 include/linux/platform-feature.h
>  create mode 100644 include/linux/virtio_anchor.h
>  delete mode 100644 kernel/platform-feature.c
> 
> -- 
> 2.35.3
> 
> 


Re: [PATCH V3] virtio: disable notification hardening by default

2022-06-28 Thread Jason Wang
On Tue, Jun 28, 2022 at 2:17 PM Jason Wang  wrote:
>
> On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jun 28, 2022 at 11:49:12AM +0800, Jason Wang wrote:
> > > > Heh. Yea sure. But things work fine for people. What is the chance
> > > > your review found and fixed all driver bugs?
> > >
> > > I don't/can't audit all bugs but the race between open/close against
> > > ready/reset. It looks to me a good chance to fix them all but if you
> > > think differently, let me know
> > >
> > > > After two attempts
> > > > I don't feel like hoping audit will fix all bugs.
> > >
> > > I've started the auditing and have 15+ patches in the queue. (only
> > > covers bluetooth, console, pmem, virtio-net and caif). Spotting the
> > > issue is not hard but the testing, It would take at least the time of
> > > one release to finalize I guess.
> >
> > Absolutely. So I am looking for a way to implement hardening that does
> > not break existing drivers.
>
> I totally agree with you on seeking a way that doesn't bother the drivers.
> Just wonder if this is possible.
>
> >
> >
> > > >
> > > >
> > > > > >
> > > > > > The reason config was kind of easy is that config interrupt is 
> > > > > > rarely
> > > > > > vital for device function so arbitrarily deferring that does not 
> > > > > > lead to
> > > > > > deadlocks - what you are trying to do with VQ interrupts is
> > > > > > fundamentally different. Things are especially bad if we just drop
> > > > > > an interrupt but deferring can lead to problems too.
> > > > >
> > > > > I'm not sure I see the difference, disable_irq() stuffs also delay the
> > > > > interrupt processing until enable_irq().
> > > >
> > > >
> > > > Absolutely. I am not at all sure disable_irq fixes all problems.
> > > >
> > > > > >
> > > > > > Consider as an example
> > > > > > virtio-net: fix race between ndo_open() and 
> > > > > > virtio_device_ready()
> > > > > > if you just defer vq interrupts you get deadlocks.
> > > > > >
> > > > > >
> > > > >
> > > > > I don't see a deadlock here, maybe you can show more detail on this?
> > > >
> > > > What I mean is this: if we revert the above commit, things still
> > > > work (out of spec, but still). If we revert and defer interrupts until
> > > > device ready then ndo_open that triggers before device ready deadlocks.
> > >
> > > Ok, I guess you meant on a hypervisor that is strictly written with spec.
> >
> > I mean on hypervisor that starts processing queues after getting a kick
> > even without DRIVER_OK.
>
> Oh right.
>
> >
> > > >
> > > >
> > > > > >
> > > > > > So, thinking about all this, how about a simple per vq flag meaning
> > > > > > "this vq was kicked since reset"?
> > > > >
> > > > > And ignore the notification if vq is not kicked? It sounds like the
> > > > > callback needs to be synchronized with the kick.
> > > >
> > > > Note we only need to synchronize it when it changes, which is
> > > > only during initialization and reset.
> > >
> > > Yes.
> > >
> > > >
> > > >
> > > > > >
> > > > > > If driver does not kick then it's not ready to get callbacks, right?
> > > > > >
> > > > > > Sounds quite clean, but we need to think through memory ordering
> > > > > > concerns - I guess it's only when we change the value so
> > > > > > if (!vq->kicked) {
> > > > > > vq->kicked = true;
> > > > > > mb();
> > > > > > }
> > > > > >
> > > > > > will do the trick, right?
> > > > >
> > > > > There's no much difference with the existing approach:
> > > > >
> > > > > 1) your proposal implicitly makes callbacks ready in virtqueue_kick()
> > > > > 2) my proposal explicitly makes callbacks ready via 
> > > > > virtio_device_ready()
> > > > >
> > > > > Both require careful auditing of all the existing drivers to make sure
> > > > > no kick before DRIVER_OK.
> > > >
> > > > Jason, kick before DRIVER_OK is out of spec, sure. But it is unrelated
> > > > to hardening
> > >
> > > Yes but with your proposal, it seems to couple kick with DRIVER_OK 
> > > somehow.
> >
> > I don't see how - my proposal ignores DRIVER_OK issues.
>
> Yes, what I meant is, in your proposal, the first kick after reset is a
> hint that the driver is OK (but actually it might not be).
>
> >
> > > > and in absence of config interrupts is generally easily
> > > > fixed just by sticking virtio_device_ready early in initialization.
> > >
> > > So if the kick is done before the subsystem registration, there's
> > > still a window in the middle (assuming we stick virtio_device_ready()
> > > early):
> > >
> > > virtio_device_ready()
> > > virtqueue_kick()
> > > /* the window */
> > > subsystem_registration()
> >
> > Absolutely, however, I do not think we really have many such drivers
> > since this has been known as a wrong thing to do since the beginning.
> > Want to try to find any?
>
> Yes, let me try and update.

This is basically the devices that have an RX queue, so I've found the
following drivers:

scmi, mac80211_hwsim, vsock, bt, ball

Re: [PATCH v6 1/4] vdpa: Add suspend operation

2022-06-28 Thread Jason Wang
On Fri, Jun 24, 2022 at 12:07 AM Eugenio Pérez  wrote:
>
> This operation is optional: if it's not implemented, the backend feature
> bit will not be exposed.

A question, do we allow suspending a device without DRIVER_OK?

Thanks

>
> Signed-off-by: Eugenio Pérez 
> ---
>  include/linux/vdpa.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 7b4a13d3bd91..d282f464d2f1 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -218,6 +218,9 @@ struct vdpa_map_file {
>   * @reset: Reset device
>   * @vdev: vdpa device
>   * Returns integer: success (0) or error (< 0)
> + * @suspend:   Suspend or resume the device (optional)
> + * @vdev: vdpa device
> + * Returns integer: success (0) or error (< 0)
>   * @get_config_size:   Get the size of the configuration space 
> includes
>   * fields that are conditional on feature bits.
>   * @vdev: vdpa device
> @@ -319,6 +322,7 @@ struct vdpa_config_ops {
> u8 (*get_status)(struct vdpa_device *vdev);
> void (*set_status)(struct vdpa_device *vdev, u8 status);
> int (*reset)(struct vdpa_device *vdev);
> +   int (*suspend)(struct vdpa_device *vdev);
> size_t (*get_config_size)(struct vdpa_device *vdev);
> void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>void *buf, unsigned int len);
> --
> 2.31.1
>


Re: [PATCH v6 2/4] vhost-vdpa: introduce SUSPEND backend feature bit

2022-06-28 Thread Jason Wang
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez  wrote:
>
> Userland knows if it can suspend the device or not by checking this feature
> bit.
>
> It's only offered if the vdpa driver backend implements the suspend()
> operation callback; offering it, or letting userland ack it, when the
> backend does not provide that callback is an error.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  drivers/vhost/vdpa.c | 16 +++-
>  include/uapi/linux/vhost_types.h |  2 ++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 23dcbfdfa13b..3d636e192061 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -347,6 +347,14 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v,
> return 0;
>  }
>
> +static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v)
> +{
> +   struct vdpa_device *vdpa = v->vdpa;
> +   const struct vdpa_config_ops *ops = vdpa->config;
> +
> +   return ops->suspend;
> +}
> +
>  static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user 
> *featurep)
>  {
> struct vdpa_device *vdpa = v->vdpa;
> @@ -577,7 +585,11 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> if (cmd == VHOST_SET_BACKEND_FEATURES) {
> if (copy_from_user(&features, featurep, sizeof(features)))
> return -EFAULT;
> -   if (features & ~VHOST_VDPA_BACKEND_FEATURES)
> +   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> +BIT_ULL(VHOST_BACKEND_F_SUSPEND)))
> +   return -EOPNOTSUPP;
> +   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> +!vhost_vdpa_can_suspend(v))

Do we need to advertise this to the management?

Thanks

> return -EOPNOTSUPP;
> vhost_set_backend_features(&v->vdev, features);
> return 0;
> @@ -628,6 +640,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> break;
> case VHOST_GET_BACKEND_FEATURES:
> features = VHOST_VDPA_BACKEND_FEATURES;
> +   if (vhost_vdpa_can_suspend(v))
> +   features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
> if (copy_to_user(featurep, &features, sizeof(features)))
> r = -EFAULT;
> break;
> diff --git a/include/uapi/linux/vhost_types.h 
> b/include/uapi/linux/vhost_types.h
> index 634cee485abb..1bdd6e363f4c 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -161,5 +161,7 @@ struct vhost_vdpa_iova_range {
>   * message
>   */
>  #define VHOST_BACKEND_F_IOTLB_ASID  0x3
> +/* Device can be suspended */
> +#define VHOST_BACKEND_F_SUSPEND  0x4
>
>  #endif
> --
> 2.31.1
>


Re: [PATCH v6 3/4] vhost-vdpa: uAPI to suspend the device

2022-06-28 Thread Jason Wang
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez  wrote:
>
> The ioctl adds support for suspending the device from userspace.
>
> This is a must before getting virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
>
> After a successful return of the ioctl call the device must not process
> more virtqueue descriptors. The device can answer to read or writes of
> config fields as if it were not suspended. In particular, writing to
> "queue_enable" with a value of 1 will not make the device start
> processing buffers of the virtqueue.
>
> Signed-off-by: Eugenio Pérez 
> ---
>  drivers/vhost/vdpa.c   | 19 +++
>  include/uapi/linux/vhost.h | 14 ++
>  2 files changed, 33 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 3d636e192061..7fa671ac4bdf 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa 
> *v, u32 __user *argp)
> return 0;
>  }
>
> +/* After a successful return of ioctl the device must not process more
> + * virtqueue descriptors. The device can answer to read or writes of config
> + * fields as if it were not suspended. In particular, writing to 
> "queue_enable"
> + * with a value of 1 will not make the device start processing buffers.
> + */
> +static long vhost_vdpa_suspend(struct vhost_vdpa *v)
> +{
> +   struct vdpa_device *vdpa = v->vdpa;
> +   const struct vdpa_config_ops *ops = vdpa->config;
> +
> +   if (!ops->suspend)
> +   return -EOPNOTSUPP;
> +
> +   return ops->suspend(vdpa);
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>void __user *argp)
>  {
> @@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> case VHOST_VDPA_GET_VQS_COUNT:
> r = vhost_vdpa_get_vqs_count(v, argp);
> break;
> +   case VHOST_VDPA_SUSPEND:
> +   r = vhost_vdpa_suspend(v);
> +   break;
> default:
> r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> if (r == -ENOIOCTLCMD)
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index cab645d4a645..6d9f45163155 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -171,4 +171,18 @@
>  #define VHOST_VDPA_SET_GROUP_ASID  _IOW(VHOST_VIRTIO, 0x7C, \
>  struct vhost_vring_state)
>
> +/* Suspend or resume a device so it does not process virtqueue requests 
> anymore
> + *
> + * After the return of ioctl with suspend != 0, the device must finish any
> + * pending operations like in flight requests.

I'm not sure we should mandate the flush here. This probably blocks us
from adding inflight descriptor reporting in the future.

Thanks

It must also preserve all the
> + * necessary state (the virtqueue vring base plus the possible device 
> specific
> + * states) that is required for restoring in the future. The device must not
> + * change its configuration after that point.
> + *
> + * After the return of ioctl with suspend == 0, the device can continue
> + * processing buffers as long as typical conditions are met (vq is enabled,
> + * DRIVER_OK status bit is enabled, etc).
> + */
> +#define VHOST_VDPA_SUSPEND _IOW(VHOST_VIRTIO, 0x7D, int)
> +
>  #endif
> --
> 2.31.1
>


Re: [PATCH v6 4/4] vdpa_sim: Implement suspend vdpa op

2022-06-28 Thread Jason Wang
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez  wrote:
>
> Implement suspend operation for vdpa_sim devices, so vhost-vdpa will
> offer that backend feature and userspace can effectively suspend the
> device.
>
> This is a must before get virtqueue indexes (base) for live migration,
> since the device could modify them after userland gets them. There are
> individual ways to perform that action for some devices
> (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> way to perform it for any vhost device (and, in particular, vhost-vdpa).
>
> Reviewed-by: Stefano Garzarella 
> Signed-off-by: Eugenio Pérez 
> ---
>  drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +
>  drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
>  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
>  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
>  4 files changed, 28 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 0f2865899647..213883487f9b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -107,6 +107,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
> for (i = 0; i < vdpasim->dev_attr.nas; i++)
> vhost_iotlb_reset(&vdpasim->iommu[i]);
>
> +   vdpasim->running = true;
> spin_unlock(&vdpasim->iommu_lock);
>
> vdpasim->features = 0;
> @@ -505,6 +506,24 @@ static int vdpasim_reset(struct vdpa_device *vdpa)
> return 0;
>  }
>
> +static int vdpasim_suspend(struct vdpa_device *vdpa)
> +{
> +   struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +   int i;
> +
> +   spin_lock(&vdpasim->lock);
> +   vdpasim->running = false;
> +   if (vdpasim->running) {
> +   /* Check for missed buffers */
> +   for (i = 0; i < vdpasim->dev_attr.nvqs; ++i)
> +   vdpasim_kick_vq(vdpa, i);

This seems only valid if we allow resuming?

Thanks

> +
> +   }
> +   spin_unlock(&vdpasim->lock);
> +
> +   return 0;
> +}
> +
>  static size_t vdpasim_get_config_size(struct vdpa_device *vdpa)
>  {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> @@ -694,6 +713,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
> .get_status = vdpasim_get_status,
> .set_status = vdpasim_set_status,
> .reset  = vdpasim_reset,
> +   .suspend= vdpasim_suspend,
> .get_config_size= vdpasim_get_config_size,
> .get_config = vdpasim_get_config,
> .set_config = vdpasim_set_config,
> @@ -726,6 +746,7 @@ static const struct vdpa_config_ops 
> vdpasim_batch_config_ops = {
> .get_status = vdpasim_get_status,
> .set_status = vdpasim_set_status,
> .reset  = vdpasim_reset,
> +   .suspend= vdpasim_suspend,
> .get_config_size= vdpasim_get_config_size,
> .get_config = vdpasim_get_config,
> .set_config = vdpasim_set_config,
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h 
> b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> index 622782e92239..061986f30911 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
> @@ -66,6 +66,7 @@ struct vdpasim {
> u32 generation;
> u64 features;
> u32 groups;
> +   bool running;
> /* spinlock to synchronize iommu table */
> spinlock_t iommu_lock;
>  };
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> index 42d401d43911..bcdb1982c378 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
> @@ -204,6 +204,9 @@ static void vdpasim_blk_work(struct work_struct *work)
> if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> goto out;
>
> +   if (!vdpasim->running)
> +   goto out;
> +
> for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) {
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[i];
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c 
> b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> index 5125976a4df8..886449e88502 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
> @@ -154,6 +154,9 @@ static void vdpasim_net_work(struct work_struct *work)
>
> spin_lock(&vdpasim->lock);
>
> +   if (!vdpasim->running)
> +   goto out;
> +
> if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> goto out;
>
> --
> 2.31.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V3] virtio: disable notification hardening by default

2022-06-28 Thread Michael S. Tsirkin
On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang wrote:
> On Tue, Jun 28, 2022 at 2:17 PM Jason Wang  wrote:
> >
> > On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jun 28, 2022 at 11:49:12AM +0800, Jason Wang wrote:
> > > > > Heh. Yea sure. But things work fine for people. What is the chance
> > > > > your review found and fixed all driver bugs?
> > > >
> > > > I don't/can't audit all bugs but the race between open/close against
> > > > ready/reset. It looks to me a good chance to fix them all but if you
> > > > think differently, let me know
> > > >
> > > > > After two attempts
> > > > > I don't feel like hoping audit will fix all bugs.
> > > >
> > > > I've started the auditing and have 15+ patches in the queue. (only
> > > > covers bluetooth, console, pmem, virtio-net and caif). Spotting the
> > > > issue is not hard, but the testing would take at least the time of
> > > > one release to finalize, I guess.
> > >
> > > Absolutely. So I am looking for a way to implement hardening that does
> > > not break existing drivers.
> >
> > I totally agree with you to seek a way without bothering the drivers.
> > Just wonder if this is possible.
> >
> > >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > The reason config was kind of easy is that config interrupt is 
> > > > > > > rarely
> > > > > > > vital for device function so arbitrarily deferring that does not 
> > > > > > > lead to
> > > > > > > deadlocks - what you are trying to do with VQ interrupts is
> > > > > > > fundamentally different. Things are especially bad if we just drop
> > > > > > > an interrupt but deferring can lead to problems too.
> > > > > >
> > > > > > I'm not sure I see the difference, disable_irq() stuffs also delay 
> > > > > > the
> > > > > > interrupt processing until enable_irq().
> > > > >
> > > > >
> > > > > Absolutely. I am not at all sure disable_irq fixes all problems.
> > > > >
> > > > > > >
> > > > > > > Consider as an example
> > > > > > > virtio-net: fix race between ndo_open() and 
> > > > > > > virtio_device_ready()
> > > > > > > if you just defer vq interrupts you get deadlocks.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > I don't see a deadlock here, maybe you can show more detail on this?
> > > > >
> > > > > What I mean is this: if we revert the above commit, things still
> > > > > work (out of spec, but still). If we revert and defer interrupts until
> > > > > device ready then ndo_open that triggers before device ready 
> > > > > deadlocks.
> > > >
> > > > Ok, I guess you meant on a hypervisor that is strictly written with 
> > > > spec.
> > >
> > > I mean on hypervisor that starts processing queues after getting a kick
> > > even without DRIVER_OK.
> >
> > Oh right.
> >
> > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > So, thinking about all this, how about a simple per vq flag 
> > > > > > > meaning
> > > > > > > "this vq was kicked since reset"?
> > > > > >
> > > > > > And ignore the notification if vq is not kicked? It sounds like the
> > > > > > callback needs to be synchronized with the kick.
> > > > >
> > > > > Note we only need to synchronize it when it changes, which is
> > > > > only during initialization and reset.
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > If driver does not kick then it's not ready to get callbacks, 
> > > > > > > right?
> > > > > > >
> > > > > > > Sounds quite clean, but we need to think through memory ordering
> > > > > > > concerns - I guess it's only when we change the value so
> > > > > > > if (!vq->kicked) {
> > > > > > > vq->kicked = true;
> > > > > > > mb();
> > > > > > > }
> > > > > > >
> > > > > > > will do the trick, right?
> > > > > >
> > > > > > There's no much difference with the existing approach:
> > > > > >
> > > > > > 1) your proposal implicitly makes callbacks ready in 
> > > > > > virtqueue_kick()
> > > > > > 2) my proposal explicitly makes callbacks ready via 
> > > > > > virtio_device_ready()
> > > > > >
> > > > > > Both require careful auditing of all the existing drivers to make 
> > > > > > sure
> > > > > > no kick before DRIVER_OK.
> > > > >
> > > > > Jason, kick before DRIVER_OK is out of spec, sure. But it is unrelated
> > > > > to hardening
> > > >
> > > > Yes but with your proposal, it seems to couple kick with DRIVER_OK 
> > > > somehow.
> > >
> > > I don't see how - my proposal ignores DRIVER_OK issues.
> >
> > Yes, what I meant is, in your proposal, the first kick after reset is a
> > hint that the driver is ok (but actually it might not be).
> >
> > >
> > > > > and in absence of config interrupts is generally easily
> > > > > fixed just by sticking virtio_device_ready early in initialization.
> > > >
> > > > So if the kick is done before the subsystem registration, there's
> > > > still a window in the middle (assuming we stick virtio_device_ready()
> > > > early):
> > > >
> > > > virtio_device_ready()
> > > > virtqueue_ki

[PATCH v11 01/40] virtio: add helper virtqueue_get_vring_max_size()

2022-06-28 Thread Xuan Zhuo
Record the maximum queue num supported by the device.

virtio-net can then display the maximum ring size supported by the hardware
in ethtool -g eth0.

When a subsequent patch implements vring reset, this can be used to check
whether the requested ring size is valid.
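For illustration, a minimal sketch (not part of this patch) of how a driver
could report both the active and the maximum ring size once this helper is
available; the function name and the printout are made up:

static void demo_report_ring_sizes(struct virtqueue *vq)
{
        unsigned int cur = virtqueue_get_vring_size(vq);     /* active ring size */
        unsigned int max = virtqueue_get_vring_max_size(vq); /* device limit, i.e. vq->num_max */

        pr_info("%s: ring size %u (max %u)\n", vq->name, cur, max);
}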

Signed-off-by: Xuan Zhuo 
---
 arch/um/drivers/virtio_uml.c |  1 +
 drivers/platform/mellanox/mlxbf-tmfifo.c |  2 ++
 drivers/remoteproc/remoteproc_virtio.c   |  2 ++
 drivers/s390/virtio/virtio_ccw.c |  3 +++
 drivers/virtio/virtio_mmio.c |  2 ++
 drivers/virtio/virtio_pci_legacy.c   |  2 ++
 drivers/virtio/virtio_pci_modern.c   |  2 ++
 drivers/virtio/virtio_ring.c | 14 ++
 drivers/virtio/virtio_vdpa.c |  2 ++
 include/linux/virtio.h   |  2 ++
 10 files changed, 32 insertions(+)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index 82ff3785bf69..e719af8bdf56 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -958,6 +958,7 @@ static struct virtqueue *vu_setup_vq(struct virtio_device 
*vdev,
goto error_create;
}
vq->priv = info;
+   vq->num_max = num;
num = virtqueue_get_vring_size(vq);
 
if (vu_dev->protocol_features &
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c 
b/drivers/platform/mellanox/mlxbf-tmfifo.c
index 38800e86ed8a..1ae3c56b66b0 100644
--- a/drivers/platform/mellanox/mlxbf-tmfifo.c
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -959,6 +959,8 @@ static int mlxbf_tmfifo_virtio_find_vqs(struct 
virtio_device *vdev,
goto error;
}
 
+   vq->num_max = vring->num;
+
vqs[i] = vq;
vring->vq = vq;
vq->priv = vring;
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index d43d74733f0a..0f7706e23eb9 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -125,6 +125,8 @@ static struct virtqueue *rp_find_vq(struct virtio_device 
*vdev,
return ERR_PTR(-ENOMEM);
}
 
+   vq->num_max = num;
+
rvring->vq = vq;
vq->priv = rvring;
 
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 161d3b141f0d..6b86d0280d6b 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -530,6 +530,9 @@ static struct virtqueue *virtio_ccw_setup_vq(struct 
virtio_device *vdev,
err = -ENOMEM;
goto out_err;
}
+
+   vq->num_max = info->num;
+
/* it may have been reduced */
info->num = virtqueue_get_vring_size(vq);
 
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 083ff1eb743d..a20d5a6b5819 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -403,6 +403,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
*vdev, unsigned int in
goto error_new_virtqueue;
}
 
+   vq->num_max = num;
+
/* Activate the queue */
writel(virtqueue_get_vring_size(vq), vm_dev->base + 
VIRTIO_MMIO_QUEUE_NUM);
if (vm_dev->version == 1) {
diff --git a/drivers/virtio/virtio_pci_legacy.c 
b/drivers/virtio/virtio_pci_legacy.c
index a5e5721145c7..2257f1b3d8ae 100644
--- a/drivers/virtio/virtio_pci_legacy.c
+++ b/drivers/virtio/virtio_pci_legacy.c
@@ -135,6 +135,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
if (!vq)
return ERR_PTR(-ENOMEM);
 
+   vq->num_max = num;
+
q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT;
if (q_pfn >> 32) {
dev_err(&vp_dev->pci_dev->dev,
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 623906b4996c..e7e0b8c850f6 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -218,6 +218,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device 
*vp_dev,
if (!vq)
return ERR_PTR(-ENOMEM);
 
+   vq->num_max = num;
+
/* activate the queue */
vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq));
vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq),
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a5ec724c01d8..4cac600856ad 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2385,6 +2385,20 @@ void vring_transport_features(struct virtio_device *vdev)
 }
 EXPORT_SYMBOL_GPL(vring_transport_features);
 
+/**
+ * virtqueue_get_vring_max_size - return the max size of the virtqueue's vring
+ * @_vq: the struct virtqueue containing the vring of interest.
+ *
+ * Returns the max size of the vring.
+ *
+ * Unlike other operations, this need not be serialized.
+ */
+unsigned int virtqueue_get_vring_ma

[PATCH v11 00/40] virtio pci support VIRTIO_F_RING_RESET

2022-06-28 Thread Xuan Zhuo
The virtio spec already supports the virtio queue reset function. This patch set
is to add this function to the kernel. The relevant virtio spec information is
here:

https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139

Also, regarding MMIO support for queue reset, I plan to add it after this
patch set is merged.

This patch set implements the refactoring of vring. Finally, the
virtqueue_resize() interface is provided based on the reset function of the
transport layer.

Test environment:
Host: 4.19.91
Qemu: QEMU emulator version 6.2.50 (with vq reset support)
Test Cmd:  ethtool -G eth1 rx $1 tx $2; ethtool -g eth1

The default is split mode; modify QEMU virtio-net to add the PACKED feature
to test packed mode.

Qemu code:

https://github.com/fengidri/qemu/compare/89f3bfa3265554d1d591ee4d7f1197b6e3397e84...master

In order to simplify the review of this patch set, the function of reusing
the old buffers after resize will be introduced in subsequent patch sets.

Please review. Thanks.

v11:
  1. struct virtio_pci_common_cfg to virtio_pci_modern.h
  2. conflict resolution

v10:
  1. on top of the harden vring IRQ
  2. factor out split and packed from struct vring_virtqueue
  3. some suggestions from @Jason Wang

v9:
  1. Provide a virtqueue_resize() interface directly
  2. A patch set including vring resize, virtio pci reset, virtio-net resize
  3. No more separate structs

v8:
  1. Provide a virtqueue_reset() interface directly
  2. Split the two patch sets, this is the first part
  3. Add independent allocation helper for allocating state, extra

v7:
  1. fix #6 subject typo
  2. fix #6 ring_size_in_bytes is uninitialized
  3. check by: make W=12

v6:
  1. virtio_pci: use synchronize_irq(irq) to sync the irq callbacks
  2. Introduce virtqueue_reset_vring() to implement the reset of vring during
 the reset process. May use the old vring if the num of the vq does not change.
  3. find_vqs() support sizes to special the max size of each vq

v5:
  1. add virtio-net support set_ringparam

v4:
  1. just the code of virtio, without virtio-net
  2. Performing reset on a queue is divided into these steps:
1. reset_vq: reset one vq
2. recycle the buffer from vq by virtqueue_detach_unused_buf()
3. release the ring of the vq by vring_release_virtqueue()
4. enable_reset_vq: re-enable the reset queue
  3. Simplify the parameters of enable_reset_vq()
  4. add container structures for virtio_pci_common_cfg

v3:
  1. keep vq, irq unreleased

Xuan Zhuo (40):
  virtio: add helper virtqueue_get_vring_max_size()
  virtio: struct virtio_config_ops add callbacks for queue_reset
  virtio_ring: update the document of the virtqueue_detach_unused_buf
for queue reset
  virtio_ring: extract the logic of freeing vring
  virtio_ring: split vring_virtqueue
  virtio_ring: introduce virtqueue_init()
  virtio_ring: split: introduce vring_free_split()
  virtio_ring: split: extract the logic of alloc queue
  virtio_ring: split: extract the logic of alloc state and extra
  virtio_ring: split: extract the logic of attach vring
  virtio_ring: split: extract the logic of vring init
  virtio_ring: split: introduce virtqueue_reinit_split()
  virtio_ring: split: reserve vring_align, may_reduce_num
  virtio_ring: split: introduce virtqueue_resize_split()
  virtio_ring: packed: introduce vring_free_packed
  virtio_ring: packed: extract the logic of alloc queue
  virtio_ring: packed: extract the logic of alloc state and extra
  virtio_ring: packed: extract the logic of attach vring
  virtio_ring: packed: extract the logic of vring init
  virtio_ring: packed: introduce virtqueue_reinit_packed()
  virtio_ring: packed: introduce virtqueue_resize_packed()
  virtio_ring: introduce virtqueue_resize()
  virtio_pci: move struct virtio_pci_common_cfg to virtio_pci_modern.h
  virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
  virtio: allow to unbreak/break virtqueue individually
  virtio: queue_reset: add VIRTIO_F_RING_RESET
  virtio_pci: struct virtio_pci_common_cfg add queue_reset
  virtio_pci: introduce helper to get/set queue reset
  virtio_pci: extract the logic of active vq for modern pci
  virtio_pci: support VIRTIO_F_RING_RESET
  virtio: find_vqs() add arg sizes
  virtio_pci: support the arg sizes of find_vqs()
  virtio_mmio: support the arg sizes of find_vqs()
  virtio: add helper virtio_find_vqs_ctx_size()
  virtio_net: set the default max ring size by find_vqs()
  virtio_net: get ringparam by virtqueue_get_vring_max_size()
  virtio_net: split free_unused_bufs()
  virtio_net: support rx queue resize
  virtio_net: support tx queue resize
  virtio_net: support set_ringparam

 arch/um/drivers/virtio_uml.c |   3 +-
 drivers/net/virtio_net.c | 209 +-
 drivers/platform/mellanox/mlxbf-tmfifo.c |   3 +
 drivers/remoteproc/remoteproc_virtio.c   |   3 +
 drivers/s390/virtio/virtio_ccw.c |   4 +
 drivers/virtio/virtio_mmio

[PATCH v11 03/40] virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset

2022-06-28 Thread Xuan Zhuo
Update the documentation of virtqueue_detach_unused_buf to note that it can
also be called on queue reset.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4cac600856ad..4ed51bc05a51 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2130,8 +2130,8 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
  * @_vq: the struct virtqueue we're talking about.
  *
  * Returns NULL or the "data" token handed to virtqueue_add_*().
- * This is not valid on an active queue; it is useful only for device
- * shutdown.
+ * This is not valid on an active queue; it is useful for device
+ * shutdown or the reset queue.
  */
 void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 02/40] virtio: struct virtio_config_ops add callbacks for queue_reset

2022-06-28 Thread Xuan Zhuo
reset can be divided into the following four steps (example):
 1. transport: notify the device to reset the queue
 2. vring: recycle the buffer submitted
 3. vring: reset/resize the vring (may re-alloc)
 4. transport: mmap vring to device, and enable the queue

In order to support queue reset, add two callbacks (reset_vq,
enable_reset_vq) to struct virtio_config_ops to implement steps 1 and 4.
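For illustration, a rough sketch (not part of this patch) of how the four
steps above could be driven once the vring helpers from later patches land;
demo_recycle() and the resize step are placeholders:

static int demo_reset_and_reenable(struct virtio_device *vdev,
                                   struct virtqueue *vq)
{
        void *buf;
        int err;

        /* 1. transport: ask the device to reset this queue */
        err = vdev->config->reset_vq(vq);
        if (err)
                return err;

        /* 2. vring: recycle the buffers that were already submitted */
        while ((buf = virtqueue_detach_unused_buf(vq)))
                demo_recycle(buf);      /* driver-specific, hypothetical */

        /* 3. vring: reset/resize the vring (helpers added later in the series) */

        /* 4. transport: re-map the vring and enable the queue again */
        return vdev->config->enable_reset_vq(vq);
}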

Signed-off-by: Xuan Zhuo 
---
 include/linux/virtio_config.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index b47c2e7ed0ee..ded51b0d4823 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -78,6 +78,16 @@ struct virtio_shm_region {
  * @set_vq_affinity: set the affinity for a virtqueue (optional).
  * @get_vq_affinity: get the affinity for a virtqueue (optional).
  * @get_shm_region: get a shared memory region based on the index.
+ * @reset_vq: reset a queue individually (optional).
+ * vq: the virtqueue
+ * Returns 0 on success or error status
+ * reset_vq will guarantee that the callbacks are disabled and 
synchronized.
+ * Except for the callback, the caller should guarantee that the vring is
+ * not accessed by any functions of virtqueue.
+ * @enable_reset_vq: enable a reset queue
+ * vq: the virtqueue
+ * Returns 0 on success or error status
+ * If reset_vq is set, then enable_reset_vq must also be set.
  */
 typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
@@ -104,6 +114,8 @@ struct virtio_config_ops {
int index);
bool (*get_shm_region)(struct virtio_device *vdev,
   struct virtio_shm_region *region, u8 id);
+   int (*reset_vq)(struct virtqueue *vq);
+   int (*enable_reset_vq)(struct virtqueue *vq);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 04/40] virtio_ring: extract the logic of freeing vring

2022-06-28 Thread Xuan Zhuo
Introduce vring_free() to free the vring of vq.

Subsequent patches will use vring_free() alone.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4ed51bc05a51..bb4e8ae09c9b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2316,14 +2316,10 @@ struct virtqueue *vring_new_virtqueue(unsigned int 
index,
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 
-void vring_del_virtqueue(struct virtqueue *_vq)
+static void vring_free(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
-   spin_lock(&vq->vq.vdev->vqs_list_lock);
-   list_del(&_vq->list);
-   spin_unlock(&vq->vq.vdev->vqs_list_lock);
-
if (vq->we_own_ring) {
if (vq->packed_ring) {
vring_free_queue(vq->vq.vdev,
@@ -2354,6 +2350,18 @@ void vring_del_virtqueue(struct virtqueue *_vq)
kfree(vq->split.desc_state);
kfree(vq->split.desc_extra);
}
+}
+
+void vring_del_virtqueue(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   spin_lock(&vq->vq.vdev->vqs_list_lock);
+   list_del(&_vq->list);
+   spin_unlock(&vq->vq.vdev->vqs_list_lock);
+
+   vring_free(_vq);
+
kfree(vq);
 }
 EXPORT_SYMBOL_GPL(vring_del_virtqueue);
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 05/40] virtio_ring: split vring_virtqueue

2022-06-28 Thread Xuan Zhuo
Separate the two inline structures (split and packed) from the structure
vring_virtqueue.

In this way, we can use these two structures later to pass parameters
and retain temporary variables.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 116 ++-
 1 file changed, 60 insertions(+), 56 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bb4e8ae09c9b..2806e033a651 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -85,6 +85,64 @@ struct vring_desc_extra {
u16 next;   /* The next desc state in a list. */
 };
 
+struct vring_virtqueue_split {
+   /* Actual memory layout for this queue. */
+   struct vring vring;
+
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /*
+* Last written value to avail->idx in
+* guest byte order.
+*/
+   u16 avail_idx_shadow;
+
+   /* Per-descriptor state. */
+   struct vring_desc_state_split *desc_state;
+   struct vring_desc_extra *desc_extra;
+
+   /* DMA address and size information */
+   dma_addr_t queue_dma_addr;
+   size_t queue_size_in_bytes;
+};
+
+struct vring_virtqueue_packed {
+   /* Actual memory layout for this queue. */
+   struct {
+   unsigned int num;
+   struct vring_packed_desc *desc;
+   struct vring_packed_desc_event *driver;
+   struct vring_packed_desc_event *device;
+   } vring;
+
+   /* Driver ring wrap counter. */
+   bool avail_wrap_counter;
+
+   /* Avail used flags. */
+   u16 avail_used_flags;
+
+   /* Index of the next avail descriptor. */
+   u16 next_avail_idx;
+
+   /*
+* Last written value to driver->flags in
+* guest byte order.
+*/
+   u16 event_flags_shadow;
+
+   /* Per-descriptor state. */
+   struct vring_desc_state_packed *desc_state;
+   struct vring_desc_extra *desc_extra;
+
+   /* DMA address and size information */
+   dma_addr_t ring_dma_addr;
+   dma_addr_t driver_event_dma_addr;
+   dma_addr_t device_event_dma_addr;
+   size_t ring_size_in_bytes;
+   size_t event_size_in_bytes;
+};
+
 struct vring_virtqueue {
struct virtqueue vq;
 
@@ -124,64 +182,10 @@ struct vring_virtqueue {
 
union {
/* Available for split ring */
-   struct {
-   /* Actual memory layout for this queue. */
-   struct vring vring;
-
-   /* Last written value to avail->flags */
-   u16 avail_flags_shadow;
-
-   /*
-* Last written value to avail->idx in
-* guest byte order.
-*/
-   u16 avail_idx_shadow;
-
-   /* Per-descriptor state. */
-   struct vring_desc_state_split *desc_state;
-   struct vring_desc_extra *desc_extra;
-
-   /* DMA address and size information */
-   dma_addr_t queue_dma_addr;
-   size_t queue_size_in_bytes;
-   } split;
+   struct vring_virtqueue_split split;
 
/* Available for packed ring */
-   struct {
-   /* Actual memory layout for this queue. */
-   struct {
-   unsigned int num;
-   struct vring_packed_desc *desc;
-   struct vring_packed_desc_event *driver;
-   struct vring_packed_desc_event *device;
-   } vring;
-
-   /* Driver ring wrap counter. */
-   bool avail_wrap_counter;
-
-   /* Avail used flags. */
-   u16 avail_used_flags;
-
-   /* Index of the next avail descriptor. */
-   u16 next_avail_idx;
-
-   /*
-* Last written value to driver->flags in
-* guest byte order.
-*/
-   u16 event_flags_shadow;
-
-   /* Per-descriptor state. */
-   struct vring_desc_state_packed *desc_state;
-   struct vring_desc_extra *desc_extra;
-
-   /* DMA address and size information */
-   dma_addr_t ring_dma_addr;
-   dma_addr_t driver_event_dma_addr;
-   dma_addr_t device_event_dma_addr;
-   size_t ring_size_in_bytes;
-   size_t event_size_in_bytes;
-   } packed;
+   struct vring_virtqueue_packed packed;
};
 
/* How to notify other side. FIXME: c

[PATCH v11 07/40] virtio_ring: split: introduce vring_free_split()

2022-06-28 Thread Xuan Zhuo
Free the structure struct vring_virtqueue_split.

Subsequent patches require it.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 986dbd9294d6..49d61e412dc6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -939,6 +939,16 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
return NULL;
 }
 
+static void vring_free_split(struct vring_virtqueue_split *vring,
+struct virtio_device *vdev)
+{
+   vring_free_queue(vdev, vring->queue_size_in_bytes, vring->vring.desc,
+vring->queue_dma_addr);
+
+   kfree(vring->desc_state);
+   kfree(vring->desc_extra);
+}
+
 static struct virtqueue *vring_create_virtqueue_split(
unsigned int index,
unsigned int num,
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 08/40] virtio_ring: split: extract the logic of alloc queue

2022-06-28 Thread Xuan Zhuo
Separate the split ring's logic for creating the vring queue.

This is required for the subsequent virtqueue vring reset.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 68 ++--
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 49d61e412dc6..a9ceb9c16c54 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -949,28 +949,19 @@ static void vring_free_split(struct vring_virtqueue_split 
*vring,
kfree(vring->desc_extra);
 }
 
-static struct virtqueue *vring_create_virtqueue_split(
-   unsigned int index,
-   unsigned int num,
-   unsigned int vring_align,
-   struct virtio_device *vdev,
-   bool weak_barriers,
-   bool may_reduce_num,
-   bool context,
-   bool (*notify)(struct virtqueue *),
-   void (*callback)(struct virtqueue *),
-   const char *name)
+static int vring_alloc_queue_split(struct vring_virtqueue_split *vring,
+  struct virtio_device *vdev,
+  u32 num,
+  unsigned int vring_align,
+  bool may_reduce_num)
 {
-   struct virtqueue *vq;
void *queue = NULL;
dma_addr_t dma_addr;
-   size_t queue_size_in_bytes;
-   struct vring vring;
 
/* We assume num is a power of 2. */
if (num & (num - 1)) {
dev_warn(&vdev->dev, "Bad virtqueue length %u\n", num);
-   return NULL;
+   return -EINVAL;
}
 
/* TODO: allocate each queue chunk individually */
@@ -981,11 +972,11 @@ static struct virtqueue *vring_create_virtqueue_split(
if (queue)
break;
if (!may_reduce_num)
-   return NULL;
+   return -ENOMEM;
}
 
if (!num)
-   return NULL;
+   return -ENOMEM;
 
if (!queue) {
/* Try to get a single page. You are my only hope! */
@@ -993,21 +984,46 @@ static struct virtqueue *vring_create_virtqueue_split(
  &dma_addr, GFP_KERNEL|__GFP_ZERO);
}
if (!queue)
-   return NULL;
+   return -ENOMEM;
+
+   vring_init(&vring->vring, num, queue, vring_align);
 
-   queue_size_in_bytes = vring_size(num, vring_align);
-   vring_init(&vring, num, queue, vring_align);
+   vring->queue_dma_addr = dma_addr;
+   vring->queue_size_in_bytes = vring_size(num, vring_align);
+
+   return 0;
+}
+
+static struct virtqueue *vring_create_virtqueue_split(
+   unsigned int index,
+   unsigned int num,
+   unsigned int vring_align,
+   struct virtio_device *vdev,
+   bool weak_barriers,
+   bool may_reduce_num,
+   bool context,
+   bool (*notify)(struct virtqueue *),
+   void (*callback)(struct virtqueue *),
+   const char *name)
+{
+   struct vring_virtqueue_split vring = {};
+   struct virtqueue *vq;
+   int err;
+
+   err = vring_alloc_queue_split(&vring, vdev, num, vring_align,
+ may_reduce_num);
+   if (err)
+   return NULL;
 
-   vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-  notify, callback, name);
+   vq = __vring_new_virtqueue(index, vring.vring, vdev, weak_barriers,
+  context, notify, callback, name);
if (!vq) {
-   vring_free_queue(vdev, queue_size_in_bytes, queue,
-dma_addr);
+   vring_free_split(&vring, vdev);
return NULL;
}
 
-   to_vvq(vq)->split.queue_dma_addr = dma_addr;
-   to_vvq(vq)->split.queue_size_in_bytes = queue_size_in_bytes;
+   to_vvq(vq)->split.queue_dma_addr = vring.queue_dma_addr;
+   to_vvq(vq)->split.queue_size_in_bytes = vring.queue_size_in_bytes;
to_vvq(vq)->we_own_ring = true;
 
return vq;
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 06/40] virtio_ring: introduce virtqueue_init()

2022-06-28 Thread Xuan Zhuo
Separate the logic of virtqueue initialization. This logic is independent
of the ring layout.

It can be called on its own when implementing resize/reset later.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 61 ++--
 1 file changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2806e033a651..986dbd9294d6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -368,6 +368,34 @@ static int vring_mapping_error(const struct 
vring_virtqueue *vq,
return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
+static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
+{
+   struct virtio_device *vdev;
+
+   vdev = vq->vq.vdev;
+
+   vq->vq.num_free = num;
+   if (vq->packed_ring)
+   vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
+   else
+   vq->last_used_idx = 0;
+   vq->event_triggered = false;
+   vq->num_added = 0;
+   vq->use_dma_api = vring_use_dma_api(vdev);
+#ifdef DEBUG
+   vq->in_use = false;
+   vq->last_add_time_valid = false;
+#endif
+
+   vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+
+   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
+   vq->weak_barriers = false;
+
+   /* Put everything in free lists. */
+   vq->free_head = 0;
+}
+
 
 /*
  * Split ring specific functions - *_split().
@@ -1706,7 +1734,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
-   vq->vq.num_free = num;
vq->vq.index = index;
vq->we_own_ring = true;
vq->notify = notify;
@@ -1716,22 +1743,10 @@ static struct virtqueue *vring_create_virtqueue_packed(
 #else
vq->broken = false;
 #endif
-   vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR);
-   vq->event_triggered = false;
-   vq->num_added = 0;
vq->packed_ring = true;
-   vq->use_dma_api = vring_use_dma_api(vdev);
-#ifdef DEBUG
-   vq->in_use = false;
-   vq->last_add_time_valid = false;
-#endif
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
-   vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
-
-   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
-   vq->weak_barriers = false;
 
vq->packed.ring_dma_addr = ring_dma_addr;
vq->packed.driver_event_dma_addr = driver_event_dma_addr;
@@ -1759,8 +1774,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
memset(vq->packed.desc_state, 0,
num * sizeof(struct vring_desc_state_packed));
 
-   /* Put everything in free lists. */
-   vq->free_head = 0;
+   virtqueue_init(vq, num);
 
vq->packed.desc_extra = vring_alloc_desc_extra(num);
if (!vq->packed.desc_extra)
@@ -2205,7 +2219,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->vq.callback = callback;
vq->vq.vdev = vdev;
vq->vq.name = name;
-   vq->vq.num_free = vring.num;
vq->vq.index = index;
vq->we_own_ring = false;
vq->notify = notify;
@@ -2215,21 +2228,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
 #else
vq->broken = false;
 #endif
-   vq->last_used_idx = 0;
-   vq->event_triggered = false;
-   vq->num_added = 0;
-   vq->use_dma_api = vring_use_dma_api(vdev);
-#ifdef DEBUG
-   vq->in_use = false;
-   vq->last_add_time_valid = false;
-#endif
 
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
-   vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
-
-   if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
-   vq->weak_barriers = false;
 
vq->split.queue_dma_addr = 0;
vq->split.queue_size_in_bytes = 0;
@@ -2255,11 +2256,11 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
if (!vq->split.desc_extra)
goto err_extra;
 
-   /* Put everything in free lists. */
-   vq->free_head = 0;
memset(vq->split.desc_state, 0, vring.num *
sizeof(struct vring_desc_state_split));
 
+   virtqueue_init(vq, vq->split.vring.num);
+
spin_lock(&vdev->vqs_list_lock);
list_add_tail(&vq->vq.list, &vdev->vqs);
spin_unlock(&vdev->vqs_list_lock);
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 10/40] virtio_ring: split: extract the logic of attach vring

2022-06-28 Thread Xuan Zhuo
Separate the logic of attaching the vring; subsequent patches will call it
separately.

Since the "struct vring_virtqueue_split split" is created on the stack and
initialized to 0, assigning queue_dma_addr/queue_size_in_bytes from
split->queue_dma_addr/split->queue_size_in_bytes keeps the same behavior
as the original code.

On the other hand, subsequent patches can use the "struct
vring_virtqueue_split split" obtained by vring_alloc_queue_split() to
directly complete the attach operation.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cedd340d6db7..9025bd373d3b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -940,6 +940,18 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
return NULL;
 }
 
+static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
+struct vring_virtqueue_split *vring)
+{
+   vq->split.queue_dma_addr = vring->queue_dma_addr;
+   vq->split.queue_size_in_bytes = vring->queue_size_in_bytes;
+
+   vq->split.vring = vring->vring;
+
+   vq->split.desc_state = vring->desc_state;
+   vq->split.desc_extra = vring->desc_extra;
+}
+
 static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring)
 {
struct vring_desc_state_split *state;
@@ -2287,10 +2299,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
 
-   vq->split.queue_dma_addr = 0;
-   vq->split.queue_size_in_bytes = 0;
-
-   vq->split.vring = _vring;
vq->split.avail_flags_shadow = 0;
vq->split.avail_idx_shadow = 0;
 
@@ -2310,10 +2318,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
return NULL;
}
 
-   vq->split.desc_state = vring.desc_state;
-   vq->split.desc_extra = vring.desc_extra;
-
virtqueue_init(vq, vring.vring.num);
+   virtqueue_vring_attach_split(vq, &vring);
 
spin_lock(&vdev->vqs_list_lock);
list_add_tail(&vq->vq.list, &vdev->vqs);
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 09/40] virtio_ring: split: extract the logic of alloc state and extra

2022-06-28 Thread Xuan Zhuo
Separate the logic of creating desc_state and desc_extra; subsequent
patches will call it independently.

Since only the structure vring is passed into __vring_new_virtqueue(), the
new function vring_alloc_state_extra_split() takes a vring_virtqueue_split
as its parameter, which also makes it more convenient to pass on to
subsequent functions.

So a new vring_virtqueue_split variable is added in
__vring_new_virtqueue().

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 58 +---
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index a9ceb9c16c54..cedd340d6db7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -204,6 +204,7 @@ struct vring_virtqueue {
 #endif
 };
 
+static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num);
 
 /*
  * Helpers.
@@ -939,6 +940,32 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
return NULL;
 }
 
+static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring)
+{
+   struct vring_desc_state_split *state;
+   struct vring_desc_extra *extra;
+   u32 num = vring->vring.num;
+
+   state = kmalloc_array(num, sizeof(struct vring_desc_state_split), 
GFP_KERNEL);
+   if (!state)
+   goto err_state;
+
+   extra = vring_alloc_desc_extra(num);
+   if (!extra)
+   goto err_extra;
+
+   memset(state, 0, num * sizeof(struct vring_desc_state_split));
+
+   vring->desc_state = state;
+   vring->desc_extra = extra;
+   return 0;
+
+err_extra:
+   kfree(state);
+err_state:
+   return -ENOMEM;
+}
+
 static void vring_free_split(struct vring_virtqueue_split *vring,
 struct virtio_device *vdev)
 {
@@ -2224,7 +2251,7 @@ EXPORT_SYMBOL_GPL(vring_interrupt);
 
 /* Only available for split ring */
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-   struct vring vring,
+   struct vring _vring,
struct virtio_device *vdev,
bool weak_barriers,
bool context,
@@ -2232,7 +2259,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
void (*callback)(struct virtqueue *),
const char *name)
 {
+   struct vring_virtqueue_split vring = {};
struct vring_virtqueue *vq;
+   int err;
 
if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
return NULL;
@@ -2261,7 +2290,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->split.queue_dma_addr = 0;
vq->split.queue_size_in_bytes = 0;
 
-   vq->split.vring = vring;
+   vq->split.vring = _vring;
vq->split.avail_flags_shadow = 0;
vq->split.avail_idx_shadow = 0;
 
@@ -2273,30 +2302,23 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->split.avail_flags_shadow);
}
 
-   vq->split.desc_state = kmalloc_array(vring.num,
-   sizeof(struct vring_desc_state_split), GFP_KERNEL);
-   if (!vq->split.desc_state)
-   goto err_state;
+   vring.vring = _vring;
 
-   vq->split.desc_extra = vring_alloc_desc_extra(vring.num);
-   if (!vq->split.desc_extra)
-   goto err_extra;
+   err = vring_alloc_state_extra_split(&vring);
+   if (err) {
+   kfree(vq);
+   return NULL;
+   }
 
-   memset(vq->split.desc_state, 0, vring.num *
-   sizeof(struct vring_desc_state_split));
+   vq->split.desc_state = vring.desc_state;
+   vq->split.desc_extra = vring.desc_extra;
 
-   virtqueue_init(vq, vq->split.vring.num);
+   virtqueue_init(vq, vring.vring.num);
 
spin_lock(&vdev->vqs_list_lock);
list_add_tail(&vq->vq.list, &vdev->vqs);
spin_unlock(&vdev->vqs_list_lock);
return &vq->vq;
-
-err_extra:
-   kfree(vq->split.desc_state);
-err_state:
-   kfree(vq);
-   return NULL;
 }
 EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
 
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 11/40] virtio_ring: split: extract the logic of vring init

2022-06-28 Thread Xuan Zhuo
Separate the logic of initializing the vring; subsequent patches will
call it separately.

This function completes the variable initialization of the split vring.
Together with the attach logic, it constitutes the initialization of the
vring.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 30 +++---
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9025bd373d3b..35540daaa1e7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -940,6 +940,24 @@ static void *virtqueue_detach_unused_buf_split(struct 
virtqueue *_vq)
return NULL;
 }
 
+static void virtqueue_vring_init_split(struct vring_virtqueue *vq)
+{
+   struct virtio_device *vdev;
+
+   vdev = vq->vq.vdev;
+
+   vq->split.avail_flags_shadow = 0;
+   vq->split.avail_idx_shadow = 0;
+
+   /* No callback?  Tell other side not to bother us. */
+   if (!vq->vq.callback) {
+   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+   if (!vq->event)
+   vq->split.vring.avail->flags = cpu_to_virtio16(vdev,
+   vq->split.avail_flags_shadow);
+   }
+}
+
 static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
 struct vring_virtqueue_split *vring)
 {
@@ -2299,17 +2317,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
 
-   vq->split.avail_flags_shadow = 0;
-   vq->split.avail_idx_shadow = 0;
-
-   /* No callback?  Tell other side not to bother us. */
-   if (!callback) {
-   vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-   if (!vq->event)
-   vq->split.vring.avail->flags = cpu_to_virtio16(vdev,
-   vq->split.avail_flags_shadow);
-   }
-
vring.vring = _vring;
 
err = vring_alloc_state_extra_split(&vring);
@@ -2320,6 +2327,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int 
index,
 
virtqueue_init(vq, vring.vring.num);
virtqueue_vring_attach_split(vq, &vring);
+   virtqueue_vring_init_split(vq);
 
spin_lock(&vdev->vqs_list_lock);
list_add_tail(&vq->vq.list, &vdev->vqs);
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 12/40] virtio_ring: split: introduce virtqueue_reinit_split()

2022-06-28 Thread Xuan Zhuo
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.

Subsequent patches will call this function after resetting the vq to
reinitialize it.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 35540daaa1e7..4c8972da5423 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -958,6 +958,25 @@ static void virtqueue_vring_init_split(struct 
vring_virtqueue *vq)
}
 }
 
+static void virtqueue_reinit_split(struct vring_virtqueue *vq)
+{
+   int size, i;
+
+   memset(vq->split.vring.desc, 0, vq->split.queue_size_in_bytes);
+
+   size = sizeof(struct vring_desc_state_split) * vq->split.vring.num;
+   memset(vq->split.desc_state, 0, size);
+
+   size = sizeof(struct vring_desc_extra) * vq->split.vring.num;
+   memset(vq->split.desc_extra, 0, size);
+
+   for (i = 0; i < vq->split.vring.num - 1; i++)
+   vq->split.desc_extra[i].next = i + 1;
+
+   virtqueue_init(vq, vq->split.vring.num);
+   virtqueue_vring_init_split(vq);
+}
+
 static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
 struct vring_virtqueue_split *vring)
 {
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 13/40] virtio_ring: split: reserve vring_align, may_reduce_num

2022-06-28 Thread Xuan Zhuo
In vring_create_virtqueue_split(), save vring_align and may_reduce_num in
the structure vring_virtqueue_split. They are used to create a new vring
when implementing resize.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4c8972da5423..9c83c5e6d5a9 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -105,6 +105,13 @@ struct vring_virtqueue_split {
/* DMA address and size information */
dma_addr_t queue_dma_addr;
size_t queue_size_in_bytes;
+
+   /*
+* The parameters for creating vrings are reserved for creating new
+* vring.
+*/
+   u32 vring_align;
+   bool may_reduce_num;
 };
 
 struct vring_virtqueue_packed {
@@ -1098,6 +1105,8 @@ static struct virtqueue *vring_create_virtqueue_split(
return NULL;
}
 
+   to_vvq(vq)->split.vring_align = vring_align;
+   to_vvq(vq)->split.may_reduce_num = may_reduce_num;
to_vvq(vq)->split.queue_dma_addr = vring.queue_dma_addr;
to_vvq(vq)->split.queue_size_in_bytes = vring.queue_size_in_bytes;
to_vvq(vq)->we_own_ring = true;
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 15/40] virtio_ring: packed: introduce vring_free_packed

2022-06-28 Thread Xuan Zhuo
Free the structure struct vring_virtqueue_packed.

Subsequent patches require it.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1aaa1e5f9991..4f497b6f2d04 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1830,6 +1830,27 @@ static struct vring_desc_extra 
*vring_alloc_desc_extra(unsigned int num)
return desc_extra;
 }
 
+static void vring_free_packed(struct vring_virtqueue_packed *vring,
+ struct virtio_device *vdev)
+{
+   if (vring->vring.desc)
+   vring_free_queue(vdev, vring->ring_size_in_bytes,
+vring->vring.desc, vring->ring_dma_addr);
+
+   if (vring->vring.driver)
+   vring_free_queue(vdev, vring->event_size_in_bytes,
+vring->vring.driver,
+vring->driver_event_dma_addr);
+
+   if (vring->vring.device)
+   vring_free_queue(vdev, vring->event_size_in_bytes,
+vring->vring.device,
+vring->device_event_dma_addr);
+
+   kfree(vring->desc_state);
+   kfree(vring->desc_extra);
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
unsigned int index,
unsigned int num,
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 17/40] virtio_ring: packed: extract the logic of alloc state and extra

2022-06-28 Thread Xuan Zhuo
Separate the logic for alloc desc_state and desc_extra, which will
be called separately by subsequent patches.

Use struct vring_packed to pass desc_state, desc_extra.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 48 +---
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 891257d9cdf8..0c4109eb6c6c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1902,6 +1902,33 @@ static int vring_alloc_queue_packed(struct 
vring_virtqueue_packed *vring,
return -ENOMEM;
 }
 
+static int vring_alloc_state_extra_packed(struct vring_virtqueue_packed *vring)
+{
+   struct vring_desc_state_packed *state;
+   struct vring_desc_extra *extra;
+   u32 num = vring->vring.num;
+
+   state = kmalloc_array(num, sizeof(struct vring_desc_state_packed), 
GFP_KERNEL);
+   if (!state)
+   goto err_desc_state;
+
+   memset(state, 0, num * sizeof(struct vring_desc_state_packed));
+
+   extra = vring_alloc_desc_extra(num);
+   if (!extra)
+   goto err_desc_extra;
+
+   vring->desc_state = state;
+   vring->desc_extra = extra;
+
+   return 0;
+
+err_desc_extra:
+   kfree(state);
+err_desc_state:
+   return -ENOMEM;
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
unsigned int index,
unsigned int num,
@@ -1916,6 +1943,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 {
struct vring_virtqueue_packed vring = {};
struct vring_virtqueue *vq;
+   int err;
 
if (vring_alloc_queue_packed(&vring, vdev, num))
goto err_ring;
@@ -1955,21 +1983,15 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->packed.event_flags_shadow = 0;
vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;
 
-   vq->packed.desc_state = kmalloc_array(num,
-   sizeof(struct vring_desc_state_packed),
-   GFP_KERNEL);
-   if (!vq->packed.desc_state)
-   goto err_desc_state;
+   err = vring_alloc_state_extra_packed(&vring);
+   if (err)
+   goto err_state_extra;
 
-   memset(vq->packed.desc_state, 0,
-   num * sizeof(struct vring_desc_state_packed));
+   vq->packed.desc_state = vring.desc_state;
+   vq->packed.desc_extra = vring.desc_extra;
 
virtqueue_init(vq, num);
 
-   vq->packed.desc_extra = vring_alloc_desc_extra(num);
-   if (!vq->packed.desc_extra)
-   goto err_desc_extra;
-
/* No callback?  Tell other side not to bother us. */
if (!callback) {
vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
@@ -1982,9 +2004,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
spin_unlock(&vdev->vqs_list_lock);
return &vq->vq;
 
-err_desc_extra:
-   kfree(vq->packed.desc_state);
-err_desc_state:
+err_state_extra:
kfree(vq);
 err_vq:
vring_free_packed(&vring, vdev);
-- 
2.31.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v11 16/40] virtio_ring: packed: extract the logic of alloc queue

2022-06-28 Thread Xuan Zhuo
Separate the packed ring's logic for creating the vring queue.

For the convenience of passing parameters, add a structure
vring_packed.

This is required for the subsequent virtqueue vring reset.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 80 +++-
 1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4f497b6f2d04..891257d9cdf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1851,19 +1851,10 @@ static void vring_free_packed(struct 
vring_virtqueue_packed *vring,
kfree(vring->desc_extra);
 }
 
-static struct virtqueue *vring_create_virtqueue_packed(
-   unsigned int index,
-   unsigned int num,
-   unsigned int vring_align,
-   struct virtio_device *vdev,
-   bool weak_barriers,
-   bool may_reduce_num,
-   bool context,
-   bool (*notify)(struct virtqueue *),
-   void (*callback)(struct virtqueue *),
-   const char *name)
+static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring,
+   struct virtio_device *vdev,
+   u32 num)
 {
-   struct vring_virtqueue *vq;
struct vring_packed_desc *ring;
struct vring_packed_desc_event *driver, *device;
dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr;
@@ -1875,7 +1866,11 @@ static struct virtqueue *vring_create_virtqueue_packed(
 &ring_dma_addr,
 GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
if (!ring)
-   goto err_ring;
+   goto err;
+
+   vring->vring.desc = ring;
+   vring->ring_dma_addr  = ring_dma_addr;
+   vring->ring_size_in_bytes = ring_size_in_bytes;
 
event_size_in_bytes = sizeof(struct vring_packed_desc_event);
 
@@ -1883,13 +1878,47 @@ static struct virtqueue *vring_create_virtqueue_packed(
   &driver_event_dma_addr,
   GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
if (!driver)
-   goto err_driver;
+   goto err;
+
+   vring->vring.driver  = driver;
+   vring->event_size_in_bytes   = event_size_in_bytes;
+   vring->driver_event_dma_addr = driver_event_dma_addr;
 
device = vring_alloc_queue(vdev, event_size_in_bytes,
   &device_event_dma_addr,
   GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
if (!device)
-   goto err_device;
+   goto err;
+
+   vring->vring.device  = device;
+   vring->device_event_dma_addr = device_event_dma_addr;
+
+   vring->vring.num= num;
+
+   return 0;
+
+err:
+   vring_free_packed(vring, vdev);
+   return -ENOMEM;
+}
+
+static struct virtqueue *vring_create_virtqueue_packed(
+   unsigned int index,
+   unsigned int num,
+   unsigned int vring_align,
+   struct virtio_device *vdev,
+   bool weak_barriers,
+   bool may_reduce_num,
+   bool context,
+   bool (*notify)(struct virtqueue *),
+   void (*callback)(struct virtqueue *),
+   const char *name)
+{
+   struct vring_virtqueue_packed vring = {};
+   struct vring_virtqueue *vq;
+
+   if (vring_alloc_queue_packed(&vring, vdev, num))
+   goto err_ring;
 
vq = kmalloc(sizeof(*vq), GFP_KERNEL);
if (!vq)
@@ -1912,17 +1941,14 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
 
-   vq->packed.ring_dma_addr = ring_dma_addr;
-   vq->packed.driver_event_dma_addr = driver_event_dma_addr;
-   vq->packed.device_event_dma_addr = device_event_dma_addr;
+   vq->packed.ring_dma_addr = vring.ring_dma_addr;
+   vq->packed.driver_event_dma_addr = vring.driver_event_dma_addr;
+   vq->packed.device_event_dma_addr = vring.device_event_dma_addr;
 
-   vq->packed.ring_size_in_bytes = ring_size_in_bytes;
-   vq->packed.event_size_in_bytes = event_size_in_bytes;
+   vq->packed.ring_size_in_bytes = vring.ring_size_in_bytes;
+   vq->packed.event_size_in_bytes = vring.event_size_in_bytes;
 
-   vq->packed.vring.num = num;
-   vq->packed.vring.desc = ring;
-   vq->packed.vring.driver = driver;
-   vq->packed.vring.device = device;
+   vq->packed.vring = vring.vring;
 
vq->packed.next_avail_idx = 0;
vq->packed.avail_wrap_counter = 1;
@@ -1961,11 +1987,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 err_desc_state:
kfree(vq);
 err_vq:
-   vring_free_queue(vdev, event_size_in_bytes, device, 
device_event_dma_addr);
-err_device:
-   vring_free_queue(vdev, event_size_in_bytes, driver, 
driver_event_dma_addr);
-err_driver:
-   vring_free_queue(vde

[PATCH v11 14/40] virtio_ring: split: introduce virtqueue_resize_split()

2022-06-28 Thread Xuan Zhuo
The split virtio ring now supports resize.

The old vring is released only after the new vring has been successfully
allocated based on the new num. If an error is returned, the virtqueue
still points to the old vring.

In the error case, the virtqueue is re-initialized
(virtqueue_reinit_split()) to ensure that the vring can still be used.

In addition, vring_align and may_reduce_num are needed to reallocate the
vring, so they are retained when creating the vq.
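For illustration, a minimal sketch (not part of this patch) of the error
contract above from a caller's point of view; the public virtqueue_resize()
wrapper that dispatches here only arrives later in the series, so
demo_resize() is a made-up name:

static int demo_resize(struct virtqueue *vq, u32 new_num)
{
        int err = virtqueue_resize_split(vq, new_num);

        if (err) {
                /*
                 * Allocating the new vring failed: the old vring was never
                 * released and virtqueue_reinit_split() has restored it, so
                 * the queue keeps working at its previous size.
                 */
                return err;
        }

        /* Success: vq now uses a freshly allocated vring of size new_num. */
        return 0;
}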

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9c83c5e6d5a9..1aaa1e5f9991 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -212,6 +212,7 @@ struct vring_virtqueue {
 };
 
 static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num);
+static void vring_free(struct virtqueue *_vq);
 
 /*
  * Helpers.
@@ -1114,6 +1115,37 @@ static struct virtqueue *vring_create_virtqueue_split(
return vq;
 }
 
+static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_virtqueue_split vring = {};
+   struct virtio_device *vdev = _vq->vdev;
+   int err;
+
+   err = vring_alloc_queue_split(&vring, vdev, num, vq->split.vring_align,
+ vq->split.may_reduce_num);
+   if (err)
+   goto err;
+
+   err = vring_alloc_state_extra_split(&vring);
+   if (err) {
+   vring_free_split(&vring, vdev);
+   goto err;
+   }
+
+   vring_free(&vq->vq);
+
+   virtqueue_init(vq, vring.vring.num);
+   virtqueue_vring_attach_split(vq, &vring);
+   virtqueue_vring_init_split(vq);
+
+   return 0;
+
+err:
+   virtqueue_reinit_split(vq);
+   return -ENOMEM;
+}
+
 
 /*
  * Packed ring specific functions - *_packed().
-- 
2.31.0



[PATCH v11 18/40] virtio_ring: packed: extract the logic of attach vring

2022-06-28 Thread Xuan Zhuo
Separate out the logic of attaching the vring; the subsequent patch will
call it separately.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0c4109eb6c6c..91ac99f99bff 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1929,6 +1929,22 @@ static int vring_alloc_state_extra_packed(struct 
vring_virtqueue_packed *vring)
return -ENOMEM;
 }
 
+static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
+ struct vring_virtqueue_packed *vring)
+{
+   vq->packed.ring_dma_addr = vring->ring_dma_addr;
+   vq->packed.driver_event_dma_addr = vring->driver_event_dma_addr;
+   vq->packed.device_event_dma_addr = vring->device_event_dma_addr;
+
+   vq->packed.ring_size_in_bytes = vring->ring_size_in_bytes;
+   vq->packed.event_size_in_bytes = vring->event_size_in_bytes;
+
+   vq->packed.vring = vring->vring;
+
+   vq->packed.desc_state = vring->desc_state;
+   vq->packed.desc_extra = vring->desc_extra;
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
unsigned int index,
unsigned int num,
@@ -1969,15 +1985,6 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
 
-   vq->packed.ring_dma_addr = vring.ring_dma_addr;
-   vq->packed.driver_event_dma_addr = vring.driver_event_dma_addr;
-   vq->packed.device_event_dma_addr = vring.device_event_dma_addr;
-
-   vq->packed.ring_size_in_bytes = vring.ring_size_in_bytes;
-   vq->packed.event_size_in_bytes = vring.event_size_in_bytes;
-
-   vq->packed.vring = vring.vring;
-
vq->packed.next_avail_idx = 0;
vq->packed.avail_wrap_counter = 1;
vq->packed.event_flags_shadow = 0;
@@ -1987,10 +1994,8 @@ static struct virtqueue *vring_create_virtqueue_packed(
if (err)
goto err_state_extra;
 
-   vq->packed.desc_state = vring.desc_state;
-   vq->packed.desc_extra = vring.desc_extra;
-
virtqueue_init(vq, num);
+   virtqueue_vring_attach_packed(vq, &vring);
 
/* No callback?  Tell other side not to bother us. */
if (!callback) {
-- 
2.31.0



[PATCH v11 19/40] virtio_ring: packed: extract the logic of vring init

2022-06-28 Thread Xuan Zhuo
Separate the logic of initializing vring, and subsequent patches will
call it separately.

This function completes the variable initialization of the packed vring.
Together with the attach logic, it constitutes the initialization of the
vring.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 28 
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 91ac99f99bff..2f58266539eb 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1945,6 +1945,21 @@ static void virtqueue_vring_attach_packed(struct 
vring_virtqueue *vq,
vq->packed.desc_extra = vring->desc_extra;
 }
 
+static void virtqueue_vring_init_packed(struct vring_virtqueue *vq)
+{
+   vq->packed.next_avail_idx = 0;
+   vq->packed.avail_wrap_counter = 1;
+   vq->packed.event_flags_shadow = 0;
+   vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;
+
+   /* No callback?  Tell other side not to bother us. */
+   if (!vq->vq.callback) {
+   vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
+   vq->packed.vring.driver->flags =
+   cpu_to_le16(vq->packed.event_flags_shadow);
+   }
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
unsigned int index,
unsigned int num,
@@ -1985,24 +2000,13 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
!context;
 
-   vq->packed.next_avail_idx = 0;
-   vq->packed.avail_wrap_counter = 1;
-   vq->packed.event_flags_shadow = 0;
-   vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL;
-
err = vring_alloc_state_extra_packed(&vring);
if (err)
goto err_state_extra;
 
virtqueue_init(vq, num);
virtqueue_vring_attach_packed(vq, &vring);
-
-   /* No callback?  Tell other side not to bother us. */
-   if (!callback) {
-   vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE;
-   vq->packed.vring.driver->flags =
-   cpu_to_le16(vq->packed.event_flags_shadow);
-   }
+   virtqueue_vring_init_packed(vq);
 
spin_lock(&vdev->vqs_list_lock);
list_add_tail(&vq->vq.list, &vdev->vqs);
-- 
2.31.0



[PATCH v11 20/40] virtio_ring: packed: introduce virtqueue_reinit_packed()

2022-06-28 Thread Xuan Zhuo
Introduce a function to initialize a vq without allocating a new ring,
desc_state or desc_extra.

Subsequent patches will call this function after resetting a vq, to
reinitialize it.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2f58266539eb..650f701a5480 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1960,6 +1960,27 @@ static void virtqueue_vring_init_packed(struct 
vring_virtqueue *vq)
}
 }
 
+static void virtqueue_reinit_packed(struct vring_virtqueue *vq)
+{
+   int size, i;
+
+   memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes);
+   memset(vq->packed.vring.driver, 0, vq->packed.event_size_in_bytes);
+   memset(vq->packed.vring.desc, 0, vq->packed.ring_size_in_bytes);
+
+   size = sizeof(struct vring_desc_state_packed) * vq->packed.vring.num;
+   memset(vq->packed.desc_state, 0, size);
+
+   size = sizeof(struct vring_desc_extra) * vq->packed.vring.num;
+   memset(vq->packed.desc_extra, 0, size);
+
+   for (i = 0; i < vq->packed.vring.num - 1; i++)
+   vq->packed.desc_extra[i].next = i + 1;
+
+   virtqueue_init(vq, vq->packed.vring.num);
+   virtqueue_vring_init_packed(vq);
+}
+
 static struct virtqueue *vring_create_virtqueue_packed(
unsigned int index,
unsigned int num,
-- 
2.31.0



[PATCH v11 21/40] virtio_ring: packed: introduce virtqueue_resize_packed()

2022-06-28 Thread Xuan Zhuo
The virtio packed ring now supports resize.

The old vring is released only after the new vring has been successfully
allocated based on the new num. If an error is returned, the vring still
points to the old vring.

In the case of an error, re-initialize the virtqueue (via
virtqueue_reinit_packed()) to ensure that the vring can still be used.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 650f701a5480..4860787286db 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2042,6 +2042,35 @@ static struct virtqueue *vring_create_virtqueue_packed(
return NULL;
 }
 
+static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
+{
+   struct vring_virtqueue_packed vring = {};
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = _vq->vdev;
+   int err;
+
+   if (vring_alloc_queue_packed(&vring, vdev, num))
+   goto err_ring;
+
+   err = vring_alloc_state_extra_packed(&vring);
+   if (err)
+   goto err_state_extra;
+
+   vring_free(&vq->vq);
+
+   virtqueue_init(vq, vring.vring.num);
+   virtqueue_vring_attach_packed(vq, &vring);
+   virtqueue_vring_init_packed(vq);
+
+   return 0;
+
+err_state_extra:
+   vring_free_packed(&vring, vdev);
+err_ring:
+   virtqueue_reinit_packed(vq);
+   return -ENOMEM;
+}
+
 
 /*
  * Generic functions and exported symbols.
-- 
2.31.0



[PATCH v11 23/40] virtio_pci: move struct virtio_pci_common_cfg to virtio_pci_modern.h

2022-06-28 Thread Xuan Zhuo
In order to make it possible to extend virtio_pci_common_cfg in the
future, move it from uapi to virtio_pci_modern.h. This way the kernel can
freely extend the structure.

Other projects that used virtio_pci_common_cfg from uapi need to maintain
a separate copy of the structure or use the offset macros defined in uapi
(see the sketch below).

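As an illustration only (hypothetical userspace code, not part of the patch),
a project that previously embedded the uapi struct could read a field through
the offset macros that stay in include/uapi/linux/virtio_pci.h; MMIO accessors
and barriers are simplified here:

        #include <endian.h>
        #include <stdint.h>
        #include <linux/virtio_pci.h>   /* VIRTIO_PCI_COMMON_Q_* offsets */

        /* read the (little-endian) size of queue 'index' from a mapped common cfg */
        static uint16_t common_cfg_queue_size(volatile uint8_t *cfg, uint16_t index)
        {
                *(volatile uint16_t *)(cfg + VIRTIO_PCI_COMMON_Q_SELECT) = htole16(index);
                return le16toh(*(volatile uint16_t *)(cfg + VIRTIO_PCI_COMMON_Q_SIZE));
        }
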
Signed-off-by: Xuan Zhuo 
---
 include/linux/virtio_pci_modern.h | 26 ++
 include/uapi/linux/virtio_pci.h   | 26 --
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/include/linux/virtio_pci_modern.h 
b/include/linux/virtio_pci_modern.h
index eb2bd9b4077d..c4f7ffbacb4e 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -5,6 +5,32 @@
 #include 
 #include 
 
+/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
+struct virtio_pci_common_cfg {
+   /* About the whole device. */
+   __le32 device_feature_select;   /* read-write */
+   __le32 device_feature;  /* read-only */
+   __le32 guest_feature_select;/* read-write */
+   __le32 guest_feature;   /* read-write */
+   __le16 msix_config; /* read-write */
+   __le16 num_queues;  /* read-only */
+   __u8 device_status; /* read-write */
+   __u8 config_generation; /* read-only */
+
+   /* About a specific virtqueue. */
+   __le16 queue_select;/* read-write */
+   __le16 queue_size;  /* read-write, power of 2. */
+   __le16 queue_msix_vector;   /* read-write */
+   __le16 queue_enable;/* read-write */
+   __le16 queue_notify_off;/* read-only */
+   __le32 queue_desc_lo;   /* read-write */
+   __le32 queue_desc_hi;   /* read-write */
+   __le32 queue_avail_lo;  /* read-write */
+   __le32 queue_avail_hi;  /* read-write */
+   __le32 queue_used_lo;   /* read-write */
+   __le32 queue_used_hi;   /* read-write */
+};
+
 struct virtio_pci_modern_device {
struct pci_dev *pci_dev;
 
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 3a86f36d7e3d..247ec42af2c8 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -140,32 +140,6 @@ struct virtio_pci_notify_cap {
__le32 notify_off_multiplier;   /* Multiplier for queue_notify_off. */
 };
 
-/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
-struct virtio_pci_common_cfg {
-   /* About the whole device. */
-   __le32 device_feature_select;   /* read-write */
-   __le32 device_feature;  /* read-only */
-   __le32 guest_feature_select;/* read-write */
-   __le32 guest_feature;   /* read-write */
-   __le16 msix_config; /* read-write */
-   __le16 num_queues;  /* read-only */
-   __u8 device_status; /* read-write */
-   __u8 config_generation; /* read-only */
-
-   /* About a specific virtqueue. */
-   __le16 queue_select;/* read-write */
-   __le16 queue_size;  /* read-write, power of 2. */
-   __le16 queue_msix_vector;   /* read-write */
-   __le16 queue_enable;/* read-write */
-   __le16 queue_notify_off;/* read-only */
-   __le32 queue_desc_lo;   /* read-write */
-   __le32 queue_desc_hi;   /* read-write */
-   __le32 queue_avail_lo;  /* read-write */
-   __le32 queue_avail_hi;  /* read-write */
-   __le32 queue_used_lo;   /* read-write */
-   __le32 queue_used_hi;   /* read-write */
-};
-
 /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
 struct virtio_pci_cfg_cap {
struct virtio_pci_cap cap;
-- 
2.31.0



[PATCH v11 24/40] virtio_pci: struct virtio_pci_common_cfg add queue_notify_data

2022-06-28 Thread Xuan Zhuo
Add queue_notify_data to struct virtio_pci_common_cfg; it comes from
https://github.com/oasis-tcs/virtio-spec/issues/89

Since queue_reset will be added after queue_notify_data, this patch is
submitted first.

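A hedged illustration (not part of the patch): since the struct is now
kernel-internal while the spec offsets stay in uapi, the placement of the new
field could be cross-checked at build time, for example:

        #include <linux/build_bug.h>
        #include <linux/stddef.h>
        #include <linux/virtio_pci.h>           /* VIRTIO_PCI_COMMON_Q_NDATA (56) */
        #include <linux/virtio_pci_modern.h>    /* struct virtio_pci_common_cfg */

        /* queue_notify_data must sit at the spec-defined offset */
        static_assert(offsetof(struct virtio_pci_common_cfg, queue_notify_data) ==
                      VIRTIO_PCI_COMMON_Q_NDATA);
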
Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 include/linux/virtio_pci_modern.h | 2 ++
 include/uapi/linux/virtio_pci.h   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/include/linux/virtio_pci_modern.h 
b/include/linux/virtio_pci_modern.h
index c4f7ffbacb4e..9f31dde46f57 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -29,6 +29,8 @@ struct virtio_pci_common_cfg {
__le32 queue_avail_hi;  /* read-write */
__le32 queue_used_lo;   /* read-write */
__le32 queue_used_hi;   /* read-write */
+   __le16 queue_notify_data;   /* read-write */
+   __le16 padding;
 };
 
 struct virtio_pci_modern_device {
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 247ec42af2c8..748b3eb62d2f 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -176,6 +176,7 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_PCI_COMMON_Q_AVAILHI44
 #define VIRTIO_PCI_COMMON_Q_USEDLO 48
 #define VIRTIO_PCI_COMMON_Q_USEDHI 52
+#define VIRTIO_PCI_COMMON_Q_NDATA  56
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
-- 
2.31.0



[PATCH v11 22/40] virtio_ring: introduce virtqueue_resize()

2022-06-28 Thread Xuan Zhuo
Introduce virtqueue_resize() to implement resizing of the vring. Based on
this, a driver can dynamically adjust the size of the vring, for example
via ethtool -G.

virtqueue_resize() implements resize on top of the vq reset function. If
allocating a new vring fails, it gives up the resize and keeps the
original vring.

During this process, if re-enabling the reset vq fails, the vq can no
longer be used, although the probability of this is low.

The recycle parameter is a callback used to recycle the buffers that are
no longer used (see the sketch below).

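A hedged usage sketch (hypothetical driver code, not from the patch;
virtio-net does the equivalent later in this series with its own free
helpers):

        #include <linux/slab.h>
        #include <linux/virtio.h>

        /* give back a driver buffer that was still queued when the vq was reset */
        static void my_recycle(struct virtqueue *vq, void *buf)
        {
                kfree(buf);
        }

        /* caller must serialize against other operations on this vq */
        static int my_resize(struct virtqueue *vq, u32 new_num)
        {
                int err;

                err = virtqueue_resize(vq, new_num, my_recycle);
                if (err == -ENOMEM)
                        dev_warn(&vq->vdev->dev,
                                 "resize failed, keeping the old ring\n");
                return err;
        }
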
Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 72 
 include/linux/virtio.h   |  3 ++
 2 files changed, 75 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4860787286db..5ec43607cc15 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2542,6 +2542,78 @@ struct virtqueue *vring_create_virtqueue(
 }
 EXPORT_SYMBOL_GPL(vring_create_virtqueue);
 
+/**
+ * virtqueue_resize - resize the vring of vq
+ * @_vq: the struct virtqueue we're talking about.
+ * @num: new ring num
+ * @recycle: callback for recycle the useless buffer
+ *
+ * When it is really necessary to create a new vring, it will set the current vq
+ * into the reset state. Then call the passed callback to recycle the buffer
+ * that is no longer used. Only after the new vring is successfully created, the
+ * old vring will be released.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -ENOMEM: Failed to allocate a new ring, fall back to the original ring size.
+ *  vq can still work normally
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -E2BIG/-EINVAL: num error
+ * -EPERM: Operation not permitted
+ *
+ */
+int virtqueue_resize(struct virtqueue *_vq, u32 num,
+void (*recycle)(struct virtqueue *vq, void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+   bool packed;
+   void *buf;
+   int err;
+
+   if (!vq->we_own_ring)
+   return -EPERM;
+
+   if (num > vq->vq.num_max)
+   return -E2BIG;
+
+   if (!num)
+   return -EINVAL;
+
+   packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED) ? true : false;
+
+   if ((packed ? vq->packed.vring.num : vq->split.vring.num) == num)
+   return 0;
+
+   if (!vdev->config->reset_vq)
+   return -ENOENT;
+
+   if (!vdev->config->enable_reset_vq)
+   return -ENOENT;
+
+   err = vdev->config->reset_vq(_vq);
+   if (err)
+   return err;
+
+   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+   recycle(_vq, buf);
+
+   if (packed)
+   err = virtqueue_resize_packed(_vq, num);
+   else
+   err = virtqueue_resize_split(_vq, num);
+
+   if (vdev->config->enable_reset_vq(_vq))
+   return -EBUSY;
+
+   return err;
+}
+EXPORT_SYMBOL_GPL(virtqueue_resize);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index a82620032e43..1272566adec6 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -91,6 +91,9 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq);
 dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq);
 dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq);
 
+int virtqueue_resize(struct virtqueue *vq, u32 num,
+void (*recycle)(struct virtqueue *vq, void *buf));
+
 /**
  * virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
2.31.0



[PATCH v11 25/40] virtio: allow to unbreak/break virtqueue individually

2022-06-28 Thread Xuan Zhuo
This patch introduces __virtqueue_break()/__virtqueue_unbreak(), which
allow breaking/unbreaking an individual virtqueue.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 24 
 include/linux/virtio.h   |  3 +++
 2 files changed, 27 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5ec43607cc15..7b02be7fce67 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2744,6 +2744,30 @@ unsigned int virtqueue_get_vring_size(struct virtqueue 
*_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
+/*
+ * This function should only be called by the core, not directly by the driver.
+ */
+void __virtqueue_break(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+   WRITE_ONCE(vq->broken, true);
+}
+EXPORT_SYMBOL_GPL(__virtqueue_break);
+
+/*
+ * This function should only be called by the core, not directly by the driver.
+ */
+void __virtqueue_unbreak(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   /* Pairs with READ_ONCE() in virtqueue_is_broken(). */
+   WRITE_ONCE(vq->broken, false);
+}
+EXPORT_SYMBOL_GPL(__virtqueue_unbreak);
+
 bool virtqueue_is_broken(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1272566adec6..dc474a0d48d1 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -138,6 +138,9 @@ bool is_virtio_device(struct device *dev);
 void virtio_break_device(struct virtio_device *dev);
 void __virtio_unbreak_device(struct virtio_device *dev);
 
+void __virtqueue_break(struct virtqueue *_vq);
+void __virtqueue_unbreak(struct virtqueue *_vq);
+
 void virtio_config_changed(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
 int virtio_device_freeze(struct virtio_device *dev);
-- 
2.31.0



[PATCH v11 26/40] virtio: queue_reset: add VIRTIO_F_RING_RESET

2022-06-28 Thread Xuan Zhuo
Add VIRTIO_F_RING_RESET, which comes from:

https://github.com/oasis-tcs/virtio-spec/issues/124
https://github.com/oasis-tcs/virtio-spec/issues/139

This feature indicates that the driver can reset a queue individually.

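A minimal hedged example (not from the patch) of how a driver would gate its
per-queue reset/resize path on the new bit:

        #include <linux/virtio.h>
        #include <linux/virtio_config.h>

        /* true when the negotiated features allow resetting individual queues */
        static bool my_can_reset_queues(struct virtio_device *vdev)
        {
                return virtio_has_feature(vdev, VIRTIO_F_RING_RESET);
        }
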
Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 include/uapi/linux/virtio_config.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h 
b/include/uapi/linux/virtio_config.h
index f0fb0ae021c0..3c05162bc988 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -52,7 +52,7 @@
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START   28
-#define VIRTIO_TRANSPORT_F_END 38
+#define VIRTIO_TRANSPORT_F_END 41
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -98,4 +98,9 @@
  * Does the device support Single Root I/O Virtualization?
  */
 #define VIRTIO_F_SR_IOV37
+
+/*
+ * This feature indicates that the driver can reset a queue individually.
+ */
+#define VIRTIO_F_RING_RESET40
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
-- 
2.31.0



[PATCH v11 28/40] virtio_pci: introduce helper to get/set queue reset

2022-06-28 Thread Xuan Zhuo
Introduce new helpers to implement queue reset and get queue reset
status.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_pci_modern_dev.c | 35 ++
 include/linux/virtio_pci_modern.h  |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c 
b/drivers/virtio/virtio_pci_modern_dev.c
index fa2a9445bb18..07415654247c 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -3,6 +3,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * vp_modern_map_capability - map a part of virtio pci capability
@@ -474,6 +475,40 @@ void vp_modern_set_status(struct virtio_pci_modern_device 
*mdev,
 }
 EXPORT_SYMBOL_GPL(vp_modern_set_status);
 
+/*
+ * vp_modern_get_queue_reset - get the queue reset status
+ * @mdev: the modern virtio-pci device
+ * @index: queue index
+ */
+int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
+{
+   struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+
+   vp_iowrite16(index, &cfg->queue_select);
+   return vp_ioread16(&cfg->queue_reset);
+}
+EXPORT_SYMBOL_GPL(vp_modern_get_queue_reset);
+
+/*
+ * vp_modern_set_queue_reset - reset the queue
+ * @mdev: the modern virtio-pci device
+ * @index: queue index
+ */
+void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index)
+{
+   struct virtio_pci_common_cfg __iomem *cfg = mdev->common;
+
+   vp_iowrite16(index, &cfg->queue_select);
+   vp_iowrite16(1, &cfg->queue_reset);
+
+   while (vp_ioread16(&cfg->queue_reset))
+   msleep(1);
+
+   while (vp_ioread16(&cfg->queue_enable))
+   msleep(1);
+}
+EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset);
+
 /*
  * vp_modern_queue_vector - set the MSIX vector for a specific virtqueue
  * @mdev: the modern virtio-pci device
diff --git a/include/linux/virtio_pci_modern.h 
b/include/linux/virtio_pci_modern.h
index beebc7a4a31d..ded01157f864 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -134,4 +134,6 @@ void __iomem * vp_modern_map_vq_notify(struct 
virtio_pci_modern_device *mdev,
   u16 index, resource_size_t *pa);
 int vp_modern_probe(struct virtio_pci_modern_device *mdev);
 void vp_modern_remove(struct virtio_pci_modern_device *mdev);
+int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
+void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
 #endif
-- 
2.31.0



[PATCH v11 27/40] virtio_pci: struct virtio_pci_common_cfg add queue_reset

2022-06-28 Thread Xuan Zhuo
Add queue_reset in virtio_pci_common_cfg.

 https://github.com/oasis-tcs/virtio-spec/issues/124
 https://github.com/oasis-tcs/virtio-spec/issues/139

Signed-off-by: Xuan Zhuo 
---
 include/linux/virtio_pci_modern.h | 2 +-
 include/uapi/linux/virtio_pci.h   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/virtio_pci_modern.h 
b/include/linux/virtio_pci_modern.h
index 9f31dde46f57..beebc7a4a31d 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -30,7 +30,7 @@ struct virtio_pci_common_cfg {
__le32 queue_used_lo;   /* read-write */
__le32 queue_used_hi;   /* read-write */
__le16 queue_notify_data;   /* read-write */
-   __le16 padding;
+   __le16 queue_reset; /* read-write */
 };
 
 struct virtio_pci_modern_device {
diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h
index 748b3eb62d2f..4f0a8d86cb11 100644
--- a/include/uapi/linux/virtio_pci.h
+++ b/include/uapi/linux/virtio_pci.h
@@ -177,6 +177,7 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_PCI_COMMON_Q_USEDLO 48
 #define VIRTIO_PCI_COMMON_Q_USEDHI 52
 #define VIRTIO_PCI_COMMON_Q_NDATA  56
+#define VIRTIO_PCI_COMMON_Q_RESET  58
 
 #endif /* VIRTIO_PCI_NO_MODERN */
 
-- 
2.31.0



[PATCH v11 29/40] virtio_pci: extract the logic of active vq for modern pci

2022-06-28 Thread Xuan Zhuo
Introduce vp_active_vq() to configure the vring to the backend after the
vq has attached the vring, and to configure the vq vector if necessary.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_pci_modern.c | 46 ++
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index e7e0b8c850f6..9041d9a41b7d 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -176,6 +176,29 @@ static void vp_reset(struct virtio_device *vdev)
vp_synchronize_vectors(vdev);
 }
 
+static int vp_active_vq(struct virtqueue *vq, u16 msix_vec)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+   struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+   unsigned long index;
+
+   index = vq->index;
+
+   /* activate the queue */
+   vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq));
+   vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq),
+   virtqueue_get_avail_addr(vq),
+   virtqueue_get_used_addr(vq));
+
+   if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
+   msix_vec = vp_modern_queue_vector(mdev, index, msix_vec);
+   if (msix_vec == VIRTIO_MSI_NO_VECTOR)
+   return -EBUSY;
+   }
+
+   return 0;
+}
+
 static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector)
 {
return vp_modern_config_vector(&vp_dev->mdev, vector);
@@ -220,32 +243,19 @@ static struct virtqueue *setup_vq(struct 
virtio_pci_device *vp_dev,
 
vq->num_max = num;
 
-   /* activate the queue */
-   vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq));
-   vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq),
-   virtqueue_get_avail_addr(vq),
-   virtqueue_get_used_addr(vq));
+   err = vp_active_vq(vq, msix_vec);
+   if (err)
+   goto err;
 
vq->priv = (void __force *)vp_modern_map_vq_notify(mdev, index, NULL);
if (!vq->priv) {
err = -ENOMEM;
-   goto err_map_notify;
-   }
-
-   if (msix_vec != VIRTIO_MSI_NO_VECTOR) {
-   msix_vec = vp_modern_queue_vector(mdev, index, msix_vec);
-   if (msix_vec == VIRTIO_MSI_NO_VECTOR) {
-   err = -EBUSY;
-   goto err_assign_vector;
-   }
+   goto err;
}
 
return vq;
 
-err_assign_vector:
-   if (!mdev->notify_base)
-   pci_iounmap(mdev->pci_dev, (void __iomem __force *)vq->priv);
-err_map_notify:
+err:
vring_del_virtqueue(vq);
return ERR_PTR(err);
 }
-- 
2.31.0



[PATCH v11 30/40] virtio_pci: support VIRTIO_F_RING_RESET

2022-06-28 Thread Xuan Zhuo
This patch implements virtio pci support for QUEUE RESET.

Performing reset on a queue is divided into these steps:

 1. notify the device to reset the queue
 2. recycle the buffer submitted
 3. reset the vring (may re-alloc)
 4. mmap vring to device, and enable the queue

This patch implements virtio_reset_vq(), virtio_enable_resetq() in the
pci scenario.

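Condensed, the four steps map onto the helpers added in this patch and the
virtqueue_resize() flow from patch 22 roughly as follows (locking and error
handling omitted; a sketch, not the literal code):

        vdev->config->reset_vq(vq);             /* 1: vp_modern_reset_vq() -> vp_modern_set_queue_reset() */
        while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
                recycle(vq, buf);               /* 2: hand still-queued buffers back to the driver */
        virtqueue_resize_packed(vq, num);       /* 3: reset/re-allocate the vring (or the split variant) */
        vdev->config->enable_reset_vq(vq);      /* 4: vp_modern_enable_reset_vq() -> vp_active_vq(), re-enable */
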
Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_pci_common.c | 12 +++-
 drivers/virtio/virtio_pci_modern.c | 96 ++
 drivers/virtio/virtio_ring.c   |  2 +
 include/linux/virtio.h |  1 +
 4 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index ca51fcc9daab..ad258a9d3b9f 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -214,9 +214,15 @@ static void vp_del_vq(struct virtqueue *vq)
struct virtio_pci_vq_info *info = vp_dev->vqs[vq->index];
unsigned long flags;
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_del(&info->node);
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
+   /*
+* If re-enabling the vq after a reset failed, info->node was not
+* re-added to the queue list, so don't remove it here. This
+* prevents unexpected irqs.
+*/
+   if (!vq->reset) {
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_del(&info->node);
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+   }
 
vp_dev->del_vq(info);
kfree(info);
diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 9041d9a41b7d..754e5e10386b 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -34,6 +34,9 @@ static void vp_transport_features(struct virtio_device *vdev, 
u64 features)
if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
__virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
+
+   if (features & BIT_ULL(VIRTIO_F_RING_RESET))
+   __virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
 }
 
 /* virtio config->finalize_features() implementation */
@@ -199,6 +202,95 @@ static int vp_active_vq(struct virtqueue *vq, u16 msix_vec)
return 0;
 }
 
+static int vp_modern_reset_vq(struct virtqueue *vq)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+   struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+   struct virtio_pci_vq_info *info;
+   unsigned long flags;
+
+   if (!virtio_has_feature(vq->vdev, VIRTIO_F_RING_RESET))
+   return -ENOENT;
+
+   vp_modern_set_queue_reset(mdev, vq->index);
+
+   info = vp_dev->vqs[vq->index];
+
+   /* delete vq from irq handler */
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_del(&info->node);
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   INIT_LIST_HEAD(&info->node);
+
+   /* For the case where vq has an exclusive irq, to prevent the irq from
+* being received again and the pending irq, call synchronize_irq(), and
+* break it.
+*
+* We can't use disable_irq() since it conflicts with the affinity
+* managed IRQ that is used by some drivers. So this is done on top of
+* IRQ hardening.
+*
+* In the scenario based on shared interrupts, vq will be searched from
+* the queue virtqueues. Since the previous list_del() has been deleted
+* from the queue, it is impossible for vq to be called in this case.
+* There is no need to close the corresponding interrupt.
+*/
+   if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR) {
+#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION
+   __virtqueue_break(vq);
+#endif
+   synchronize_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector));
+   }
+
+   vq->reset = true;
+
+   return 0;
+}
+
+static int vp_modern_enable_reset_vq(struct virtqueue *vq)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+   struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+   struct virtio_pci_vq_info *info;
+   unsigned long flags, index;
+   int err;
+
+   if (!vq->reset)
+   return -EBUSY;
+
+   index = vq->index;
+   info = vp_dev->vqs[index];
+
+   if (vp_modern_get_queue_reset(mdev, index))
+   return -EBUSY;
+
+   if (vp_modern_get_queue_enable(mdev, index))
+   return -EBUSY;
+
+   err = vp_active_vq(vq, info->msix_vector);
+   if (err)
+   return err;
+
+   if (vq->callback) {
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_add(&info->node, &vp_dev->virtqueues);
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+   } else {
+   INIT_LIST_HEAD(&info->node);
+   }
+
+#ifdef CONFIG_VIRT

[PATCH v11 31/40] virtio: find_vqs() add arg sizes

2022-06-28 Thread Xuan Zhuo
find_vqs() adds a new parameter sizes to specify the size of each vq
vring.

Passing NULL as sizes means that all queues in find_vqs() use the maximum
size. A value of 0 in the array means that the corresponding queue uses
the maximum size.

In the split scenario, size is an upper bound: since the allocation may be
limited by memory, the virtio core will try smaller sizes. The size is a
power of 2.

Signed-off-by: Xuan Zhuo 
Acked-by: Hans de Goede 
Reviewed-by: Mathieu Poirier 
---
 arch/um/drivers/virtio_uml.c |  2 +-
 drivers/platform/mellanox/mlxbf-tmfifo.c |  1 +
 drivers/remoteproc/remoteproc_virtio.c   |  1 +
 drivers/s390/virtio/virtio_ccw.c |  1 +
 drivers/virtio/virtio_mmio.c |  1 +
 drivers/virtio/virtio_pci_common.c   |  2 +-
 drivers/virtio/virtio_pci_common.h   |  2 +-
 drivers/virtio/virtio_pci_modern.c   |  7 +--
 drivers/virtio/virtio_vdpa.c |  1 +
 include/linux/virtio_config.h| 14 +-
 10 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index e719af8bdf56..79e38afd4b91 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -1011,7 +1011,7 @@ static struct virtqueue *vu_setup_vq(struct virtio_device 
*vdev,
 
 static int vu_find_vqs(struct virtio_device *vdev, unsigned nvqs,
   struct virtqueue *vqs[], vq_callback_t *callbacks[],
-  const char * const names[], const bool *ctx,
+  const char * const names[], u32 sizes[], const bool *ctx,
   struct irq_affinity *desc)
 {
struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev);
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c 
b/drivers/platform/mellanox/mlxbf-tmfifo.c
index 1ae3c56b66b0..8be13d416f48 100644
--- a/drivers/platform/mellanox/mlxbf-tmfifo.c
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -928,6 +928,7 @@ static int mlxbf_tmfifo_virtio_find_vqs(struct 
virtio_device *vdev,
struct virtqueue *vqs[],
vq_callback_t *callbacks[],
const char * const names[],
+   u32 sizes[],
const bool *ctx,
struct irq_affinity *desc)
 {
diff --git a/drivers/remoteproc/remoteproc_virtio.c 
b/drivers/remoteproc/remoteproc_virtio.c
index 0f7706e23eb9..81c4f5776109 100644
--- a/drivers/remoteproc/remoteproc_virtio.c
+++ b/drivers/remoteproc/remoteproc_virtio.c
@@ -158,6 +158,7 @@ static int rproc_virtio_find_vqs(struct virtio_device 
*vdev, unsigned int nvqs,
 struct virtqueue *vqs[],
 vq_callback_t *callbacks[],
 const char * const names[],
+u32 sizes[],
 const bool * ctx,
 struct irq_affinity *desc)
 {
diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c
index 6b86d0280d6b..72500cd2dbf5 100644
--- a/drivers/s390/virtio/virtio_ccw.c
+++ b/drivers/s390/virtio/virtio_ccw.c
@@ -635,6 +635,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, 
unsigned nvqs,
   struct virtqueue *vqs[],
   vq_callback_t *callbacks[],
   const char * const names[],
+  u32 sizes[],
   const bool *ctx,
   struct irq_affinity *desc)
 {
diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index a20d5a6b5819..5e3ba3cc7fd0 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -474,6 +474,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
   struct virtqueue *vqs[],
   vq_callback_t *callbacks[],
   const char * const names[],
+  u32 sizes[],
   const bool *ctx,
   struct irq_affinity *desc)
 {
diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index ad258a9d3b9f..7ad734584823 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -396,7 +396,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
unsigned int nvqs,
 /* the config->find_vqs() implementation */
 int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs,
struct virtqueue *vqs[], vq_callback_t *callbacks[],
-   const char * const names[], const bool *ctx,
+   const char * const names[], u32 sizes[], const bool *ctx,
struct irq_affinity *desc)
 {
int err;
diff --git a/drivers/virti

[PATCH v11 32/40] virtio_pci: support the arg sizes of find_vqs()

2022-06-28 Thread Xuan Zhuo
Virtio PCI now supports the new sizes parameter of find_vqs().

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_pci_common.c | 18 ++
 drivers/virtio/virtio_pci_common.h |  1 +
 drivers/virtio/virtio_pci_legacy.c |  6 +-
 drivers/virtio/virtio_pci_modern.c | 10 +++---
 4 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 7ad734584823..00ad476a815d 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -174,6 +174,7 @@ static int vp_request_msix_vectors(struct virtio_device 
*vdev, int nvectors,
 static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned int 
index,
 void (*callback)(struct virtqueue *vq),
 const char *name,
+u32 size,
 bool ctx,
 u16 msix_vec)
 {
@@ -186,7 +187,7 @@ static struct virtqueue *vp_setup_vq(struct virtio_device 
*vdev, unsigned int in
if (!info)
return ERR_PTR(-ENOMEM);
 
-   vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, ctx,
+   vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, size, ctx,
  msix_vec);
if (IS_ERR(vq))
goto out_info;
@@ -283,7 +284,7 @@ void vp_del_vqs(struct virtio_device *vdev)
 
 static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
struct virtqueue *vqs[], vq_callback_t *callbacks[],
-   const char * const names[], bool per_vq_vectors,
+   const char * const names[], u32 sizes[], bool per_vq_vectors,
const bool *ctx,
struct irq_affinity *desc)
 {
@@ -326,8 +327,8 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned int nvqs,
else
msix_vec = VP_MSIX_VQ_VECTOR;
vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
-ctx ? ctx[i] : false,
-msix_vec);
+sizes ? sizes[i] : 0,
+ctx ? ctx[i] : false, msix_vec);
if (IS_ERR(vqs[i])) {
err = PTR_ERR(vqs[i]);
goto error_find;
@@ -357,7 +358,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, 
unsigned int nvqs,
 
 static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
struct virtqueue *vqs[], vq_callback_t *callbacks[],
-   const char * const names[], const bool *ctx)
+   const char * const names[], u32 sizes[], const bool *ctx)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
int i, err, queue_idx = 0;
@@ -379,6 +380,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, 
unsigned int nvqs,
continue;
}
vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
+sizes ? sizes[i] : 0,
 ctx ? ctx[i] : false,
 VIRTIO_MSI_NO_VECTOR);
if (IS_ERR(vqs[i])) {
@@ -402,15 +404,15 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int 
nvqs,
int err;
 
/* Try MSI-X with one vector per queue. */
-   err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, true, ctx, desc);
+   err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, sizes, true, ctx, desc);
if (!err)
return 0;
/* Fallback: MSI-X with one vector for config, one shared for queues. */
-   err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, false, ctx, desc);
+   err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, sizes, false, ctx, desc);
if (!err)
return 0;
/* Finally fall back to regular interrupts. */
-   return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx);
+   return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, sizes, ctx);
 }
 
 const char *vp_bus_name(struct virtio_device *vdev)
diff --git a/drivers/virtio/virtio_pci_common.h 
b/drivers/virtio/virtio_pci_common.h
index a5ff838b85a5..c0448378b698 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -80,6 +80,7 @@ struct virtio_pci_device {
  unsigned int idx,
  void (*callback)(struct virtqueue *vq),
  const char *name,
+ u32 size,
  bool ctx,
  u16 msix_vec);
void (*del_vq)(struct virtio_pci_vq_

[PATCH v11 33/40] virtio_mmio: support the arg sizes of find_vqs()

2022-06-28 Thread Xuan Zhuo
Virtio MMIO now supports the new sizes parameter of find_vqs().

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_mmio.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 5e3ba3cc7fd0..c888fee18caf 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -360,7 +360,7 @@ static void vm_synchronize_cbs(struct virtio_device *vdev)
 
 static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int 
index,
  void (*callback)(struct virtqueue *vq),
- const char *name, bool ctx)
+ const char *name, u32 size, bool ctx)
 {
struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev);
struct virtio_mmio_vq_info *info;
@@ -395,8 +395,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
*vdev, unsigned int in
goto error_new_virtqueue;
}
 
+   if (!size || size > num)
+   size = num;
+
/* Create the vring */
-   vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, vdev,
+   vq = vring_create_virtqueue(index, size, VIRTIO_MMIO_VRING_ALIGN, vdev,
 true, true, ctx, vm_notify, callback, name);
if (!vq) {
err = -ENOMEM;
@@ -497,6 +500,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned 
int nvqs,
}
 
vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i],
+sizes ? sizes[i] : 0,
 ctx ? ctx[i] : false);
if (IS_ERR(vqs[i])) {
vm_del_vqs(vdev);
-- 
2.31.0



[PATCH v11 34/40] virtio: add helper virtio_find_vqs_ctx_size()

2022-06-28 Thread Xuan Zhuo
Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify
the maximum size of each vq ring.

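A hedged usage sketch (hypothetical driver, not from the patch): request a
256-entry first queue and let the second one use the transport maximum (0 in
the sizes array):

        #include <linux/virtio.h>
        #include <linux/virtio_config.h>

        static void my_cb(struct virtqueue *vq)
        {
                /* handle used buffers */
        }

        static int my_find_vqs(struct virtio_device *vdev, struct virtqueue *vqs[2])
        {
                vq_callback_t *callbacks[] = { my_cb, NULL };
                static const char * const names[] = { "requests", "events" };
                u32 sizes[] = { 256, 0 };       /* 0 = maximum supported */

                return virtio_find_vqs_ctx_size(vdev, 2, vqs, callbacks, names,
                                                sizes, NULL, NULL);
        }
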
Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 include/linux/virtio_config.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 31b04ac8284b..aa6cdf353748 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -239,6 +239,18 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, 
unsigned nvqs,
  ctx, desc);
 }
 
+static inline
+int virtio_find_vqs_ctx_size(struct virtio_device *vdev, u32 nvqs,
+struct virtqueue *vqs[],
+vq_callback_t *callbacks[],
+const char * const names[],
+u32 sizes[],
+const bool *ctx, struct irq_affinity *desc)
+{
+   return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, sizes,
+ ctx, desc);
+}
+
 /**
  * virtio_synchronize_cbs - synchronize with virtqueue callbacks
  * @vdev: the device
-- 
2.31.0



[PATCH v11 35/40] virtio_net: set the default max ring size by find_vqs()

2022-06-28 Thread Xuan Zhuo
Use virtio_find_vqs_ctx_size() to specify the maximum ring size of the tx
and rx queues at the same time.

                          | rx/tx ring size
--------------------------+----------------
speed == UNKNOWN or < 10G | 1024
speed < 40G               | 4096
speed >= 40G              | 8192

Call virtnet_update_settings() once before calling init_vqs() to update
speed.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/net/virtio_net.c | 42 
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8a5810bcb839..40532ecbe7fc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct 
virtnet_info *vi, struct virtqu
   (unsigned int)GOOD_PACKET_LEN);
 }
 
+static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes)
+{
+   u32 i, rx_size, tx_size;
+
+   if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000) {
+   rx_size = 1024;
+   tx_size = 1024;
+
+   } else if (vi->speed < SPEED_40000) {
+   rx_size = 1024 * 4;
+   tx_size = 1024 * 4;
+
+   } else {
+   rx_size = 1024 * 8;
+   tx_size = 1024 * 8;
+   }
+
+   for (i = 0; i < vi->max_queue_pairs; i++) {
+   sizes[rxq2vq(i)] = rx_size;
+   sizes[txq2vq(i)] = tx_size;
+   }
+}
+
 static int virtnet_find_vqs(struct virtnet_info *vi)
 {
vq_callback_t **callbacks;
@@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
int ret = -ENOMEM;
int i, total_vqs;
const char **names;
+   u32 *sizes;
bool *ctx;
 
/* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by
@@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
ctx = NULL;
}
 
+   sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL);
+   if (!sizes)
+   goto err_sizes;
+
/* Parameters for control virtqueue, if any */
if (vi->has_cvq) {
callbacks[total_vqs - 1] = NULL;
names[total_vqs - 1] = "control";
+   sizes[total_vqs - 1] = 64;
}
 
/* Allocate/initialize parameters for send/receive virtqueues */
@@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
ctx[rxq2vq(i)] = true;
}
 
-   ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
- names, ctx, NULL);
+   virtnet_config_sizes(vi, sizes);
+
+   ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks,
+  names, sizes, ctx, NULL);
if (ret)
goto err_find;
 
@@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
 
 err_find:
+   kfree(sizes);
+err_sizes:
kfree(ctx);
 err_ctx:
kfree(names);
@@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev)
vi->curr_queue_pairs = num_online_cpus();
vi->max_queue_pairs = max_queue_pairs;
 
+   virtnet_init_settings(dev);
+   virtnet_update_settings(vi);
+
/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
err = init_vqs(vi);
if (err)
@@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev)
netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs);
netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs);
 
-   virtnet_init_settings(dev);
-
if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) {
vi->failover = net_failover_create(vi->dev);
if (IS_ERR(vi->failover)) {
-- 
2.31.0



[PATCH v11 37/40] virtio_net: split free_unused_bufs()

2022-06-28 Thread Xuan Zhuo
This patch splits two functions out of free_unused_bufs(): one for
freeing sq buffers and one for rq buffers.

When enabling/disabling a tx/rx queue is supported in the future, it will
be necessary to recover the buffers of a single sq or rq separately (see
the note below).

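Note (editorial, not in the patch): the split-out helpers match the
void (*recycle)(struct virtqueue *vq, void *buf) signature expected by
virtqueue_resize(), so the later rx/tx resize patches can pass them directly:

        err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
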
Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/net/virtio_net.c | 41 
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 63f990bdc302..9fe222a3663a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3151,6 +3151,27 @@ static void free_receive_page_frags(struct virtnet_info 
*vi)
put_page(vi->rq[i].alloc_frag.page);
 }
 
+static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
+{
+   if (!is_xdp_frame(buf))
+   dev_kfree_skb(buf);
+   else
+   xdp_return_frame(ptr_to_xdp(buf));
+}
+
+static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf)
+{
+   struct virtnet_info *vi = vq->vdev->priv;
+   int i = vq2rxq(vq);
+
+   if (vi->mergeable_rx_bufs)
+   put_page(virt_to_head_page(buf));
+   else if (vi->big_packets)
+   give_pages(&vi->rq[i], buf);
+   else
+   put_page(virt_to_head_page(buf));
+}
+
 static void free_unused_bufs(struct virtnet_info *vi)
 {
void *buf;
@@ -3158,26 +3179,14 @@ static void free_unused_bufs(struct virtnet_info *vi)
 
for (i = 0; i < vi->max_queue_pairs; i++) {
struct virtqueue *vq = vi->sq[i].vq;
-   while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-   if (!is_xdp_frame(buf))
-   dev_kfree_skb(buf);
-   else
-   xdp_return_frame(ptr_to_xdp(buf));
-   }
+   while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
+   virtnet_sq_free_unused_buf(vq, buf);
}
 
for (i = 0; i < vi->max_queue_pairs; i++) {
struct virtqueue *vq = vi->rq[i].vq;
-
-   while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-   if (vi->mergeable_rx_bufs) {
-   put_page(virt_to_head_page(buf));
-   } else if (vi->big_packets) {
-   give_pages(&vi->rq[i], buf);
-   } else {
-   put_page(virt_to_head_page(buf));
-   }
-   }
+   while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
+   virtnet_rq_free_unused_buf(vq, buf);
}
 }
 
-- 
2.31.0



[PATCH v11 36/40] virtio_net: get ringparam by virtqueue_get_vring_max_size()

2022-06-28 Thread Xuan Zhuo
Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set
tx_max_pending and rx_max_pending.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/net/virtio_net.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 40532ecbe7fc..63f990bdc302 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2254,10 +2254,10 @@ static void virtnet_get_ringparam(struct net_device 
*dev,
 {
struct virtnet_info *vi = netdev_priv(dev);
 
-   ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq);
-   ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq);
-   ring->rx_pending = ring->rx_max_pending;
-   ring->tx_pending = ring->tx_max_pending;
+   ring->rx_max_pending = virtqueue_get_vring_max_size(vi->rq[0].vq);
+   ring->tx_max_pending = virtqueue_get_vring_max_size(vi->sq[0].vq);
+   ring->rx_pending = virtqueue_get_vring_size(vi->rq[0].vq);
+   ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq);
 }
 
 static bool virtnet_commit_rss_command(struct virtnet_info *vi)
-- 
2.31.0



[PATCH v11 38/40] virtio_net: support rx queue resize

2022-06-28 Thread Xuan Zhuo
This patch implements the resize function of the rx queues.
Based on this function, it is possible to modify the ring num of the
queue.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9fe222a3663a..6ab16fd193e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -278,6 +278,8 @@ struct padded_vnet_hdr {
char padding[12];
 };
 
+static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
+
 static bool is_xdp_frame(void *ptr)
 {
return (unsigned long)ptr & VIRTIO_XDP_FLAG;
@@ -1846,6 +1848,26 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, 
struct net_device *dev)
return NETDEV_TX_OK;
 }
 
+static int virtnet_rx_resize(struct virtnet_info *vi,
+struct receive_queue *rq, u32 ring_num)
+{
+   int err, qindex;
+
+   qindex = rq - vi->rq;
+
+   napi_disable(&rq->napi);
+
+   err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
+   if (err)
+   netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
+
+   if (!try_fill_recv(vi, rq, GFP_KERNEL))
+   schedule_delayed_work(&vi->refill, 0);
+
+   virtnet_napi_enable(rq->vq, &rq->napi);
+   return err;
+}
+
 /*
  * Send command via the control virtqueue and check status.  Commands
  * supported by the hypervisor, as indicated by feature bits, should
-- 
2.31.0



[PATCH v11 40/40] virtio_net: support set_ringparam

2022-06-28 Thread Xuan Zhuo
Support set_ringparam based on virtio queue reset.

Users can use ethtool -G eth0 <ring_num> to modify the ring size of
virtio-net.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/net/virtio_net.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fd358462f802..cc554cbac431 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2330,6 +2330,53 @@ static void virtnet_get_ringparam(struct net_device *dev,
ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq);
 }
 
+static int virtnet_set_ringparam(struct net_device *dev,
+struct ethtool_ringparam *ring,
+struct kernel_ethtool_ringparam *kernel_ring,
+struct netlink_ext_ack *extack)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   u32 rx_pending, tx_pending;
+   struct receive_queue *rq;
+   struct send_queue *sq;
+   int i, err;
+
+   if (ring->rx_mini_pending || ring->rx_jumbo_pending)
+   return -EINVAL;
+
+   rx_pending = virtqueue_get_vring_size(vi->rq[0].vq);
+   tx_pending = virtqueue_get_vring_size(vi->sq[0].vq);
+
+   if (ring->rx_pending == rx_pending &&
+   ring->tx_pending == tx_pending)
+   return 0;
+
+   if (ring->rx_pending > virtqueue_get_vring_max_size(vi->rq[0].vq))
+   return -EINVAL;
+
+   if (ring->tx_pending > virtqueue_get_vring_max_size(vi->sq[0].vq))
+   return -EINVAL;
+
+   for (i = 0; i < vi->max_queue_pairs; i++) {
+   rq = vi->rq + i;
+   sq = vi->sq + i;
+
+   if (ring->tx_pending != tx_pending) {
+   err = virtnet_tx_resize(vi, sq, ring->tx_pending);
+   if (err)
+   return err;
+   }
+
+   if (ring->rx_pending != rx_pending) {
+   err = virtnet_rx_resize(vi, rq, ring->rx_pending);
+   if (err)
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 static bool virtnet_commit_rss_command(struct virtnet_info *vi)
 {
struct net_device *dev = vi->dev;
@@ -2817,6 +2864,7 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
.get_drvinfo = virtnet_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_ringparam = virtnet_get_ringparam,
+   .set_ringparam = virtnet_set_ringparam,
.get_strings = virtnet_get_strings,
.get_sset_count = virtnet_get_sset_count,
.get_ethtool_stats = virtnet_get_ethtool_stats,
-- 
2.31.0



[PATCH v11 39/40] virtio_net: support tx queue resize

2022-06-28 Thread Xuan Zhuo
This patch implements the resize function of the tx queues.
Based on this function, it is possible to modify the ring num of the
queue.

Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 48 
 1 file changed, 48 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6ab16fd193e5..fd358462f802 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -135,6 +135,9 @@ struct send_queue {
struct virtnet_sq_stats stats;
 
struct napi_struct napi;
+
+   /* Record whether sq is in reset state. */
+   bool reset;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -279,6 +282,7 @@ struct padded_vnet_hdr {
 };
 
 static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
+static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -1603,6 +1607,11 @@ static void virtnet_poll_cleantx(struct receive_queue 
*rq)
return;
 
if (__netif_tx_trylock(txq)) {
+   if (READ_ONCE(sq->reset)) {
+   __netif_tx_unlock(txq);
+   return;
+   }
+
do {
virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, true);
@@ -1868,6 +1877,45 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
return err;
 }
 
+static int virtnet_tx_resize(struct virtnet_info *vi,
+struct send_queue *sq, u32 ring_num)
+{
+   struct netdev_queue *txq;
+   int err, qindex;
+
+   qindex = sq - vi->sq;
+
+   virtnet_napi_tx_disable(&sq->napi);
+
+   txq = netdev_get_tx_queue(vi->dev, qindex);
+
+   /* 1. wait for all xmit to complete
+* 2. fix the race of netif_stop_subqueue() vs netif_start_subqueue()
+*/
+   __netif_tx_lock_bh(txq);
+
+   /* Prevent rx poll from accessing sq. */
+   WRITE_ONCE(sq->reset, true);
+
+   /* Prevent the upper layer from trying to send packets. */
+   netif_stop_subqueue(vi->dev, qindex);
+
+   __netif_tx_unlock_bh(txq);
+
+   err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+   if (err)
+   netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
+
+   /* Memory barrier before clearing reset and waking the subqueue. */
+   smp_mb();
+
+   WRITE_ONCE(sq->reset, false);
+   netif_tx_wake_queue(txq);
+
+   virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
+   return err;
+}
+
 /*
  * Send command via the control virtqueue and check status.  Commands
  * supported by the hypervisor, as indicated by feature bits, should
-- 
2.31.0
