Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members
Hi Gustavo,

Thanks for your patch!

On Mon, Jun 27, 2022 at 8:04 PM Gustavo A. R. Silva wrote:
> There is a regular need in the kernel to provide a way to declare
> having a dynamically sized set of trailing elements in a structure.
> Kernel code should always use “flexible array members”[1] for these
> cases. The older style of one-element or zero-length arrays should
> no longer be used[2].

These rules apply to the kernel, but uapi is not considered part of the
kernel, so different rules apply. Uapi header files should work with
whatever compiler can be used for compiling userspace.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
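For reference, a minimal illustration of the two declaration styles under
discussion (the struct names are invented, not taken from any uapi header):

    #include <stdint.h>

    /* Older GNU-extension style still common in uapi headers */
    struct old_msg {
            uint32_t len;
            uint8_t  data[0];       /* zero-length array */
    };

    /* C99 flexible array member, the style required for kernel code */
    struct new_msg {
            uint32_t len;
            uint8_t  data[];        /* flexible array member */
    };

Both describe a fixed header followed by a variable number of trailing bytes
and have the same layout (sizeof() is 4 in both cases); the difference is in
what the language standard and the compiler's bounds checking can reason
about, not in the binary format.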
Re: [PATCH 2/3] vdpa_sim_blk: limit the number of request handled per batch
On Tue, Jun 28, 2022 at 6:01 AM Jason Wang wrote: > > On Thu, Jun 23, 2022 at 4:58 PM Stefano Garzarella > wrote: > > > > On Thu, Jun 23, 2022 at 11:50:22AM +0800, Jason Wang wrote: > > >On Wed, Jun 22, 2022 at 12:09 AM Stefano Garzarella > > >wrote: > > >> > > >> Limit the number of requests (4 per queue as for vdpa_sim_net) handled > > >> in a batch to prevent the worker from using the CPU for too long. > > >> > > >> Suggested-by: Eugenio Pérez > > >> Signed-off-by: Stefano Garzarella > > >> --- > > >> drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 15 ++- > > >> 1 file changed, 14 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > >> b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > >> index a83a5c76f620..ac86478845b6 100644 > > >> --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > >> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > > >> @@ -197,6 +197,7 @@ static bool vdpasim_blk_handle_req(struct vdpasim > > >> *vdpasim, > > >> static void vdpasim_blk_work(struct work_struct *work) > > >> { > > >> struct vdpasim *vdpasim = container_of(work, struct vdpasim, > > >> work); > > >> + bool reschedule = false; > > >> int i; > > >> > > >> spin_lock(&vdpasim->lock); > > >> @@ -206,11 +207,15 @@ static void vdpasim_blk_work(struct work_struct > > >> *work) > > >> > > >> for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) { > > >> struct vdpasim_virtqueue *vq = &vdpasim->vqs[i]; > > >> + bool vq_work = true; > > >> + int reqs = 0; > > >> > > >> if (!vq->ready) > > >> continue; > > >> > > >> - while (vdpasim_blk_handle_req(vdpasim, vq)) { > > >> + while (vq_work) { > > >> + vq_work = vdpasim_blk_handle_req(vdpasim, vq); > > >> + > > > > > >Is it better to check and exit the loop early here? > > > > Maybe, but I'm not sure. > > > > In vdpa_sim_net we call vringh_complete_iotlb() and send notification > > also in the error path, > > Looks not? > > read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl, > sizeof(ctrl)); > if (read != sizeof(ctrl)) > break; > > We break the loop. I was looking at vdpasim_net_work(), but I was confused since it handles 2 queues. I'll break the loop as it was before. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
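For context, the batching pattern under discussion looks roughly like the
sketch below (the queue type and request helper are illustrative, not the
actual vdpa_sim_blk code):

    #define REQS_PER_BATCH 4

    struct sim_vq;                          /* illustrative queue type */
    bool handle_one_req(struct sim_vq *vq); /* assumed: returns false when the vring is empty */

    /* Handle at most REQS_PER_BATCH requests; return true if the worker
     * should be rescheduled because more work may remain on this queue.
     */
    static bool handle_vq_batch(struct sim_vq *vq)
    {
            int reqs = 0;

            while (handle_one_req(vq)) {
                    if (++reqs >= REQS_PER_BATCH)
                            return true;    /* budget spent, reschedule the worker */
            }

            return false;                   /* queue drained */
    }

The limit exists for exactly the reason given in the commit message: it keeps
the worker from hogging a CPU when a guest keeps the queue full, at the cost
of having to reschedule itself once the budget runs out.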
[PATCH] virtio-net: fix the race between refill work and close
We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the the source the scheduling of the refill work. This means an NAPI after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] == BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr 88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang --- drivers/net/virtio_net.c | 38 -- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index db05b5e930be..21bf1e5c81ef 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -251,6 +251,12 @@ struct virtnet_info { /* Does the affinity hint is set for virtqueues? */ bool affinity_hint_set; + /* Is refill work enabled? */ + bool refill_work_enabled; + + /* The lock to synchronize the access to refill_work_enabled */ + spinlock_t refill_lock; + /* CPU hotplug instances for online & dead */ struct hlist_node node; struct hlist_node node_dead; @@ -348,6 +354,20 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) return p; } +static void enable_refill_work(struct virtnet_info *vi) +{ + spin_lock(&vi->refill_lock); + vi->refill_work_enabled = true; + spin_unlock(&vi->refill_lock); +} + +static void disable_refill_work(struct virtnet_info *vi) +{ + spin_lock(&vi->refill_lock); + vi->refill_work_enabled = false; + spin_unlock(&vi->refill_lock); +} + static void virtqueue_napi_schedule(struct napi_struct *napi, struct virtqueue *vq) { @@ -1527,8 +1547,12 @@ static int virtnet_receive(struct receive_queue *rq, int budget, } if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { - if (!try_fill_recv(vi, rq, GFP_ATOMIC)) - schedule_delayed_work(&vi->refill, 0); + if (!try_fill_recv(vi, rq, GFP_ATOMIC)) { + spin_lock(&vi->refill_lock); + if (vi->refill_work_enabled) + schedule_delayed_work(&vi->refill, 0); + spin_unlock(&vi->refill_lock); + } } u64_stats_update_begin(&rq->stats.syncp); @@ -1651,6 +1675,8 @@ static int virtnet_open(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); int i, err; + enable_refill_work(vi); + for (i = 0; i < vi->max_queue_pairs; i++) { if (i < vi->curr_queue_pairs) /* Make sure we have some buffers: if oom use wq. */ @@ -2033,6 +2059,8 @@ static int virtnet_close(struct net_device *dev) struct virtnet_info *vi = netdev_priv(dev); int i; + /* Make sure NAPI doesn't schedule refill work */ + disable_refill_work(vi); /* Make sure refill_work doesn't re-enable napi! 
*/ cancel_delayed_work_sync(&vi->refill); @@ -2776,6 +2804,9 @@ static void virtnet_freeze_down(struct virtio_device *vdev) netif_tx_lock_bh(vi->dev); netif_device_detach(vi->dev); netif_tx_unlock_bh(vi->dev); + /* Make sure NAPI doesn't schedule refill work */ + disable_refill_work(vi); + /* Make sure refill_work doesn't re-enable napi! */ cancel_delayed_work_sync(&vi->refill); if (netif_running(vi->dev)) { @@ -2799,6 +2830,8 @@ static int virtnet_restore_up(struct virtio_device *vdev) virtio_device_ready(vdev); + enable_refill_work(vi); + if (netif_running(vi->dev)) { for (i = 0; i < vi->curr_queue_pairs; i++) if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL)) @@ -3548,6 +3581,7 @@ static int virtnet_probe(struct virtio_device *vdev) vdev->priv = vi; INIT_WORK(&vi->config_work, vi
Re: [PATCH -next] vdpa/mlx5: Use eth_zero_addr() to assign zero address
On Tue, Jun 28, 2022 at 09:44:18AM +, Xu Qiang wrote: > Using eth_zero_addr() to assign zero address insetad of typo > memset(). > > Reported-by: Hulk Robot > Signed-off-by: Xu Qiang > --- > drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index e85c1d71f4ed..f738c78ef446 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1457,8 +1457,8 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct > mlx5_vdpa_net *ndev, u8 *mac, > > *ucast = rule; > > - memset(dmac_c, 0, ETH_ALEN); > - memset(dmac_v, 0, ETH_ALEN); > + eth_zero_addr(dmac_c); > + eth_zero_addr(dmac_v); > dmac_c[0] = 1; > dmac_v[0] = 1; > rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); > -- > 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers
On 2022-05-27 00:50, Dmitry Osipenko wrote: Hello, This patchset introduces memory shrinker for the VirtIO-GPU DRM driver and adds memory purging and eviction support to VirtIO-GPU driver. The new dma-buf locking convention is introduced here as well. During OOM, the shrinker will release BOs that are marked as "not needed" by userspace using the new madvise IOCTL, it will also evict idling BOs to SWAP. The userspace in this case is the Mesa VirGL driver, it will mark the cached BOs as "not needed", allowing kernel driver to release memory of the cached shmem BOs on lowmem situations, preventing OOM kills. The Panfrost driver is switched to use generic memory shrinker. I think we still have some outstanding issues here - Alyssa reported some weirdness yesterday, so I just tried provoking a low-memory condition locally with this series applied and a few debug options enabled, and the results as below were... interesting. Thanks, Robin. ->8- [ 68.295951] == [ 68.295956] WARNING: possible circular locking dependency detected [ 68.295963] 5.19.0-rc3+ #400 Not tainted [ 68.295972] -- [ 68.295977] cc1/295 is trying to acquire lock: [ 68.295986] 08d7f1a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198 [ 68.296036] [ 68.296036] but task is already holding lock: [ 68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x4d8/0x1470 [ 68.296080] [ 68.296080] which lock already depends on the new lock. [ 68.296080] [ 68.296085] [ 68.296085] the existing dependency chain (in reverse order) is: [ 68.296090] [ 68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 68.296111]fs_reclaim_acquire+0xb8/0x150 [ 68.296130]dma_resv_lockdep+0x298/0x3fc [ 68.296148]do_one_initcall+0xe4/0x5f8 [ 68.296163]kernel_init_freeable+0x414/0x49c [ 68.296180]kernel_init+0x2c/0x148 [ 68.296195]ret_from_fork+0x10/0x20 [ 68.296207] [ 68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}: [ 68.296229]__lock_acquire+0x1724/0x2398 [ 68.296246]lock_acquire+0x218/0x5b0 [ 68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378 [ 68.296277]ww_mutex_lock+0x7c/0x4d8 [ 68.296291]drm_gem_shmem_free+0x7c/0x198 [ 68.296304]panfrost_gem_free_object+0x118/0x138 [ 68.296318]drm_gem_object_free+0x40/0x68 [ 68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8 [ 68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170 [ 68.296368]do_shrink_slab+0x220/0x808 [ 68.296381]shrink_slab+0x11c/0x408 [ 68.296392]shrink_node+0x6ac/0xb90 [ 68.296403]do_try_to_free_pages+0x1dc/0x8d0 [ 68.296416]try_to_free_pages+0x1ec/0x5b0 [ 68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470 [ 68.296444]__alloc_pages+0x4e0/0x5b8 [ 68.296455]__folio_alloc+0x24/0x60 [ 68.296467]vma_alloc_folio+0xb8/0x2f8 [ 68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68 [ 68.296498]__handle_mm_fault+0x918/0x12a8 [ 68.296513]handle_mm_fault+0x130/0x300 [ 68.296527]do_page_fault+0x1d0/0x568 [ 68.296539]do_translation_fault+0xa0/0xb8 [ 68.296551]do_mem_abort+0x68/0xf8 [ 68.296562]el0_da+0x74/0x100 [ 68.296572]el0t_64_sync_handler+0x68/0xc0 [ 68.296585]el0t_64_sync+0x18c/0x190 [ 68.296596] [ 68.296596] other info that might help us debug this: [ 68.296596] [ 68.296601] Possible unsafe locking scenario: [ 68.296601] [ 68.296604]CPU0CPU1 [ 68.296608] [ 68.296612] lock(fs_reclaim); [ 68.296622] lock(reservation_ww_class_mutex); [ 68.296633]lock(fs_reclaim); [ 68.296644] lock(reservation_ww_class_mutex); [ 68.296654] [ 68.296654] *** DEADLOCK *** [ 68.296654] [ 68.296658] 3 locks held by cc1/295: [ 68.29] #0: 0616e898 
(&mm->mmap_lock){}-{3:3}, at: do_page_fault+0x144/0x568 [ 68.296702] #1: 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x4d8/0x1470 [ 68.296740] #2: 8c1215b0 (shrinker_rwsem){}-{3:3}, at: shrink_slab+0xc0/0x408 [ 68.296774] [ 68.296774] stack backtrace: [ 68.296780] CPU: 2 PID: 295 Comm: cc1 Not tainted 5.19.0-rc3+ #400 [ 68.296794] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Sep 3 2019 [ 68.296803] Call trace: [ 68.296808] dump_backtrace+0x1e4/0x1f0 [ 68.296821] show_stack+0x20/0x70 [ 68.296832] dump_stack_lvl+0x8c/0xb8 [ 68.296849] dump_stack+0x1c/0x38 [ 68.296864] print_circular_bug.isra.0+0x284/0x378 [ 68.296881] check_noncircular+0x1d8/0x1
Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members
On Tue, Jun 28, 2022 at 04:21:29AM +0200, Gustavo A. R. Silva wrote:
> > > Though maybe we could just switch off
> > > -Wgnu-variable-sized-type-not-at-end during configuration ?
> We need to think in a different strategy.

I think we will need to switch off the warning in userspace - this is
doable for rdma-core.

On the other hand, if the goal is to enable the array size check compiler
warning, I would suggest focusing only on those structs that actually hit
that warning in the kernel.

IIRC infiniband doesn't trigger it because it just pointer casts the flex
array to some other struct. It isn't actually an array; it is a placeholder
for a trailing structure, so it is never indexed.

This is also why we hit the warning: the convenient way for userspace to
compose the message is to squash the header and trailer structs together
in a super struct on the stack, then invoke the ioctl.

Jason
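The warning is easy to reproduce in isolation with the pattern Jason
describes (the types here are illustrative, not the rdma-core ones):

    #include <stdint.h>

    struct hdr {
            uint32_t len;
            uint32_t data[];        /* flexible array member */
    };

    /* Squashing the header and a trailer into one on-stack super struct is
     * what clang flags: a variable-sized type is not at the end.
     */
    struct msg {
            struct hdr h;
            uint32_t   trailer;
    };

With the old data[0] spelling the embedding compiles without this particular
warning; after the [0] -> [] conversion clang emits the
-Wgnu-variable-sized-type-not-at-end diagnostic quoted earlier in the thread.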
Re: [PATCH v6 1/4] vdpa: Add suspend operation
On Thu, Jun 23, 2022 at 06:07:35PM +0200, Eugenio Pérez wrote: This operation is optional: It it's not implemented, backend feature bit will not be exposed. Signed-off-by: Eugenio Pérez --- include/linux/vdpa.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 7b4a13d3bd91..d282f464d2f1 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -218,6 +218,9 @@ struct vdpa_map_file { * @reset: Reset device * @vdev: vdpa device * Returns integer: success (0) or error (< 0) + * @suspend: Suspend or resume the device (optional) ^ IIUC we removed the resume operation (that should be done with reset), so should we update this documentation? Thanks, Stefano + * @vdev: vdpa device + * Returns integer: success (0) or error (< 0) * @get_config_size:Get the size of the configuration space includes * fields that are conditional on feature bits. * @vdev: vdpa device @@ -319,6 +322,7 @@ struct vdpa_config_ops { u8 (*get_status)(struct vdpa_device *vdev); void (*set_status)(struct vdpa_device *vdev, u8 status); int (*reset)(struct vdpa_device *vdev); + int (*suspend)(struct vdpa_device *vdev); size_t (*get_config_size)(struct vdpa_device *vdev); void (*get_config)(struct vdpa_device *vdev, unsigned int offset, void *buf, unsigned int len); -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6 2/4] vhost-vdpa: introduce SUSPEND backend feature bit
On Thu, Jun 23, 2022 at 06:07:36PM +0200, Eugenio Pérez wrote: Userland knows if it can suspend the device or not by checking this feature bit. It's only offered if the vdpa driver backend implements the suspend() operation callback, and to offer it or userland to ack it if the backend does not offer that callback is an error. Should we document in the previous patch that the callback must be implemented only if the drive/device support it? The rest LGTM although I have a doubt whether it is better to move this patch after patch 3, or merge it with patch 3, for bisectability since we enable the feature here but if the userspace calls ioctl() with VHOST_VDPA_SUSPEND we reply back that it is not supported. Thanks, Stefano Signed-off-by: Eugenio Pérez --- drivers/vhost/vdpa.c | 16 +++- include/uapi/linux/vhost_types.h | 2 ++ 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 23dcbfdfa13b..3d636e192061 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -347,6 +347,14 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v, return 0; } +static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v) +{ + struct vdpa_device *vdpa = v->vdpa; + const struct vdpa_config_ops *ops = vdpa->config; + + return ops->suspend; +} + static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep) { struct vdpa_device *vdpa = v->vdpa; @@ -577,7 +585,11 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, if (cmd == VHOST_SET_BACKEND_FEATURES) { if (copy_from_user(&features, featurep, sizeof(features))) return -EFAULT; - if (features & ~VHOST_VDPA_BACKEND_FEATURES) + if (features & ~(VHOST_VDPA_BACKEND_FEATURES | +BIT_ULL(VHOST_BACKEND_F_SUSPEND))) + return -EOPNOTSUPP; + if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) && +!vhost_vdpa_can_suspend(v)) return -EOPNOTSUPP; vhost_set_backend_features(&v->vdev, features); return 0; @@ -628,6 +640,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, break; case VHOST_GET_BACKEND_FEATURES: features = VHOST_VDPA_BACKEND_FEATURES; + if (vhost_vdpa_can_suspend(v)) + features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND); if (copy_to_user(featurep, &features, sizeof(features))) r = -EFAULT; break; diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h index 634cee485abb..1bdd6e363f4c 100644 --- a/include/uapi/linux/vhost_types.h +++ b/include/uapi/linux/vhost_types.h @@ -161,5 +161,7 @@ struct vhost_vdpa_iova_range { * message */ #define VHOST_BACKEND_F_IOTLB_ASID 0x3 +/* Device can be suspended */ +#define VHOST_BACKEND_F_SUSPEND 0x4 #endif -- 2.31.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6 3/4] vhost-vdpa: uAPI to suspend the device
On Thu, Jun 23, 2022 at 06:07:37PM +0200, Eugenio Pérez wrote: >The ioctl adds support for suspending the device from userspace. > >This is a must before getting virtqueue indexes (base) for live migration, >since the device could modify them after userland gets them. There are >individual ways to perform that action for some devices >(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no >way to perform it for any vhost device (and, in particular, vhost-vdpa). > >After a successful return of the ioctl call the device must not process >more virtqueue descriptors. The device can answer to read or writes of >config fields as if it were not suspended. In particular, writing to >"queue_enable" with a value of 1 will not make the device start >processing buffers of the virtqueue. > >Signed-off-by: Eugenio Pérez >--- > drivers/vhost/vdpa.c | 19 +++ > include/uapi/linux/vhost.h | 14 ++ > 2 files changed, 33 insertions(+) > >diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c >index 3d636e192061..7fa671ac4bdf 100644 >--- a/drivers/vhost/vdpa.c >+++ b/drivers/vhost/vdpa.c >@@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa >*v, u32 __user *argp) > return 0; > } > >+/* After a successful return of ioctl the device must not process more >+ * virtqueue descriptors. The device can answer to read or writes of config >+ * fields as if it were not suspended. In particular, writing to >"queue_enable" >+ * with a value of 1 will not make the device start processing buffers. >+ */ >+static long vhost_vdpa_suspend(struct vhost_vdpa *v) >+{ >+ struct vdpa_device *vdpa = v->vdpa; >+ const struct vdpa_config_ops *ops = vdpa->config; >+ >+ if (!ops->suspend) >+ return -EOPNOTSUPP; >+ >+ return ops->suspend(vdpa); >+} >+ > static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, > void __user *argp) > { >@@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > case VHOST_VDPA_GET_VQS_COUNT: > r = vhost_vdpa_get_vqs_count(v, argp); > break; >+ case VHOST_VDPA_SUSPEND: >+ r = vhost_vdpa_suspend(v); >+ break; > default: > r = vhost_dev_ioctl(&v->vdev, cmd, argp); > if (r == -ENOIOCTLCMD) >diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h >index cab645d4a645..6d9f45163155 100644 >--- a/include/uapi/linux/vhost.h >+++ b/include/uapi/linux/vhost.h >@@ -171,4 +171,18 @@ > #define VHOST_VDPA_SET_GROUP_ASID _IOW(VHOST_VIRTIO, 0x7C, \ >struct vhost_vring_state) > >+/* Suspend or resume a device so it does not process virtqueue requests >anymore >+ * >+ * After the return of ioctl with suspend != 0, the device must finish any >+ * pending operations like in flight requests. It must also preserve all the >+ * necessary state (the virtqueue vring base plus the possible device specific >+ * states) that is required for restoring in the future. The device must not >+ * change its configuration after that point. >+ * >+ * After the return of ioctl with suspend == 0, the device can continue >+ * processing buffers as long as typical conditions are met (vq is enabled, >+ * DRIVER_OK status bit is enabled, etc). >+ */ >+#define VHOST_VDPA_SUSPEND_IOW(VHOST_VIRTIO, 0x7D, int) ^ IIUC we are not using the argument anymore, so this should be changed in _IO(VHOST_VIRTIO, 0x7D). And we should update a bit the documentation. Thanks, Stefano ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation
On 6/28/22 08:54, Guo Hui wrote: The instructions assigned to the vcpu_is_preempted function parameter in the X86 architecture physical machine are redundant instructions, causing the multi-core performance of Unixbench to drop by about 4% to 5%. The C function is as follows: static bool vcpu_is_preempted(long vcpu); The parameter 'vcpu' in the function osq_lock that calls the function vcpu_is_preempted is assigned as follows: The C code is in the function node_cpu: cpu = node->cpu - 1; The instructions corresponding to the C code are: mov 0x14(%rax),%edi sub $0x1,%edi The above instructions are unnecessary in the X86 Native operating environment, causing high cache-misses and degrading performance. This patch uses static_key to not execute this instruction in the Native runtime environment. The patch effect is as follows two machines, Unixbench runs with full core score: 1. Machine configuration: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz CPU core: 40 Memory: 256G OS Kernel: 5.19-rc3 Before using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 948326591.2 81261.9 Double-Precision Whetstone 55.0 211986.3 38543.0 Execl Throughput 43.0 43453.2 10105.4 File Copy 1024 bufsize 2000 maxblocks 3960.0 438936.2 1108.4 File Copy 256 bufsize 500 maxblocks1655.0 118197.4714.2 File Copy 4096 bufsize 8000 maxblocks 5800.01534674.7 2646.0 Pipe Throughput 12440.0 46482107.6 37365.0 Pipe-based Context Switching 4000.01915094.2 4787.7 Process Creation126.0 85442.2 6781.1 Shell Scripts (1 concurrent) 42.4 69400.7 16368.1 Shell Scripts (8 concurrent) 6.0 8877.2 14795.3 System Call Overhead 15000.04714906.1 3143.3 System Benchmarks Index Score7923.3 After using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 947032915.5 81151.1 Double-Precision Whetstone 55.0 211971.2 38540.2 Execl Throughput 43.0 45054.8 10477.9 File Copy 1024 bufsize 2000 maxblocks 3960.0 515024.9 1300.6 File Copy 256 bufsize 500 maxblocks1655.0 146354.6884.3 File Copy 4096 bufsize 8000 maxblocks 5800.01679995.9 2896.5 Pipe Throughput 12440.0 46466394.2 37352.4 Pipe-based Context Switching 4000.01898221.4 4745.6 Process Creation126.0 85653.1 6797.9 Shell Scripts (1 concurrent) 42.4 69437.3 16376.7 Shell Scripts (8 concurrent) 6.0 8898.9 14831.4 System Call Overhead 15000.04658746.7 3105.8 System Benchmarks Index Score8248.8 2. Machine configuration: Hygon C86 7185 32-core Processor CPU core: 128 Memory: 256G OS Kernel: 5.19-rc3 Before using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 2256644068.3 193371.4 Double-Precision Whetstone 55.0 438969.9 79812.7 Execl Throughput 43.0 10108.6 2350.8 File Copy 1024 bufsize 2000 maxblocks 3960.0 275892.8696.7 File Copy 256 bufsize 500 maxblocks1655.0 72082.7435.5 File Copy 4096 bufsize 8000 maxblocks 5800.0 925043.4 1594.9 Pipe Throughput 12440.0 118905512.5 95583.2 Pipe-based Context Switching 4000.07820945.7 19552.4 Process Creation126.0 31233.3 2478.8 Shell Scripts (1 concurrent) 42.4 49042.8 11566.7 Shell Scripts (8 concurrent) 6.0 6656.0 11093.3 System Call Overhead 15000.06816047.5 4544.0 System Benchmarks Index Score7756.6 After using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 2252272929.4 192996.8 Double-Precision Whetstone 55.0 451847.2 82154.0 Execl Throughput 43.0 10595.1 2464.0 File Copy 1024 bufsize 2000 maxblocks 3960.0
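The diff itself is not quoted above; purely as an illustration of the
static_key approach the changelog describes (a kernel-context sketch with
invented names, not the submitted patch):

    #include <linux/jump_label.h>
    #include <linux/osq_lock.h>
    #include <linux/sched.h>

    /* Invented key name; it would be enabled (presumably from paravirt setup
     * code) only where vcpu_is_preempted() can actually return true.
     */
    DEFINE_STATIC_KEY_FALSE(vcpu_preempt_key);

    static inline bool osq_vcpu_is_preempted(struct optimistic_spin_node *node)
    {
            if (static_branch_unlikely(&vcpu_preempt_key))
                    return vcpu_is_preempted(node->cpu - 1);  /* cpu is stored as CPU#+1 */

            return false;   /* native: the mov/sub argument setup is never executed */
    }

This is only meant to show the shape of the optimization; the submitted diff
may differ in naming and in where the key is flipped.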
Re: [PATCH v6 00/22] Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers
On Tue, Jun 28, 2022 at 5:51 AM Dmitry Osipenko wrote: > > On 6/28/22 15:31, Robin Murphy wrote: > > ->8- > > [ 68.295951] == > > [ 68.295956] WARNING: possible circular locking dependency detected > > [ 68.295963] 5.19.0-rc3+ #400 Not tainted > > [ 68.295972] -- > > [ 68.295977] cc1/295 is trying to acquire lock: > > [ 68.295986] 08d7f1a0 > > (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_free+0x7c/0x198 > > [ 68.296036] > > [ 68.296036] but task is already holding lock: > > [ 68.296041] 8c14b820 (fs_reclaim){+.+.}-{0:0}, at: > > __alloc_pages_slowpath.constprop.0+0x4d8/0x1470 > > [ 68.296080] > > [ 68.296080] which lock already depends on the new lock. > > [ 68.296080] > > [ 68.296085] > > [ 68.296085] the existing dependency chain (in reverse order) is: > > [ 68.296090] > > [ 68.296090] -> #1 (fs_reclaim){+.+.}-{0:0}: > > [ 68.296111]fs_reclaim_acquire+0xb8/0x150 > > [ 68.296130]dma_resv_lockdep+0x298/0x3fc > > [ 68.296148]do_one_initcall+0xe4/0x5f8 > > [ 68.296163]kernel_init_freeable+0x414/0x49c > > [ 68.296180]kernel_init+0x2c/0x148 > > [ 68.296195]ret_from_fork+0x10/0x20 > > [ 68.296207] > > [ 68.296207] -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}: > > [ 68.296229]__lock_acquire+0x1724/0x2398 > > [ 68.296246]lock_acquire+0x218/0x5b0 > > [ 68.296260]__ww_mutex_lock.constprop.0+0x158/0x2378 > > [ 68.296277]ww_mutex_lock+0x7c/0x4d8 > > [ 68.296291]drm_gem_shmem_free+0x7c/0x198 > > [ 68.296304]panfrost_gem_free_object+0x118/0x138 > > [ 68.296318]drm_gem_object_free+0x40/0x68 > > [ 68.296334]drm_gem_shmem_shrinker_run_objects_scan+0x42c/0x5b8 > > [ 68.296352]drm_gem_shmem_shrinker_scan_objects+0xa4/0x170 > > [ 68.296368]do_shrink_slab+0x220/0x808 > > [ 68.296381]shrink_slab+0x11c/0x408 > > [ 68.296392]shrink_node+0x6ac/0xb90 > > [ 68.296403]do_try_to_free_pages+0x1dc/0x8d0 > > [ 68.296416]try_to_free_pages+0x1ec/0x5b0 > > [ 68.296429]__alloc_pages_slowpath.constprop.0+0x528/0x1470 > > [ 68.296444]__alloc_pages+0x4e0/0x5b8 > > [ 68.296455]__folio_alloc+0x24/0x60 > > [ 68.296467]vma_alloc_folio+0xb8/0x2f8 > > [ 68.296483]alloc_zeroed_user_highpage_movable+0x58/0x68 > > [ 68.296498]__handle_mm_fault+0x918/0x12a8 > > [ 68.296513]handle_mm_fault+0x130/0x300 > > [ 68.296527]do_page_fault+0x1d0/0x568 > > [ 68.296539]do_translation_fault+0xa0/0xb8 > > [ 68.296551]do_mem_abort+0x68/0xf8 > > [ 68.296562]el0_da+0x74/0x100 > > [ 68.296572]el0t_64_sync_handler+0x68/0xc0 > > [ 68.296585]el0t_64_sync+0x18c/0x190 > > [ 68.296596] > > [ 68.296596] other info that might help us debug this: > > [ 68.296596] > > [ 68.296601] Possible unsafe locking scenario: > > [ 68.296601] > > [ 68.296604]CPU0CPU1 > > [ 68.296608] > > [ 68.296612] lock(fs_reclaim); > > [ 68.296622] lock(reservation_ww_class_mutex); > > [ 68.296633]lock(fs_reclaim); > > [ 68.296644] lock(reservation_ww_class_mutex); > > [ 68.296654] > > [ 68.296654] *** DEADLOCK *** > > This splat could be ignored for now. I'm aware about it, although > haven't looked closely at how to fix it since it's a kind of a lockdep > misreporting. The lockdep splat could be fixed with something similar to what I've done in msm, ie. basically just not acquire the lock in the finalizer: https://patchwork.freedesktop.org/patch/489364/ There is one gotcha to watch for, as danvet pointed out (scan_objects() could still see the obj in the LRU before the finalizer removes it), but if scan_objects() does the kref_get_unless_zero() trick, it is safe. 
BR, -R
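For readers following the thread, a rough sketch of the kref_get_unless_zero()
trick mentioned above (the object type, helpers and locking are illustrative
only):

    #include <linux/kref.h>
    #include <linux/list.h>

    struct my_obj {                         /* illustrative object type */
            struct kref      refcount;
            struct list_head lru_node;
    };

    unsigned long try_evict(struct my_obj *obj);    /* assumed eviction helper */
    void my_obj_release(struct kref *kref);         /* assumed release callback */

    /* Caller is assumed to hold whatever lock protects the LRU list. */
    static unsigned long scan_lru_sketch(struct list_head *lru)
    {
            struct my_obj *obj, *tmp;
            unsigned long freed = 0;

            list_for_each_entry_safe(obj, tmp, lru, lru_node) {
                    /* The finalizer may already be tearing this object down;
                     * only touch it if we can still take a reference.
                     */
                    if (!kref_get_unless_zero(&obj->refcount))
                            continue;

                    freed += try_evict(obj);
                    kref_put(&obj->refcount, my_obj_release);
            }

            return freed;
    }

Combined with not taking the reservation lock in the finalizer (the approach
in the msm patch linked above), this avoids the fs_reclaim ->
reservation_ww_class_mutex dependency that the lockdep splat complains about.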
Re: [PATCH v7] x86/paravirt: useless assignment instructions cause Unixbench full core performance degradation
On 6/28/22 12:12, Guo Hui wrote: The instructions assigned to the vcpu_is_preempted function parameter in the X86 architecture physical machine are redundant instructions, causing the multi-core performance of Unixbench to drop by about 4% to 5%. The C function is as follows: static bool vcpu_is_preempted(long vcpu); The parameter 'vcpu' in the function osq_lock that calls the function vcpu_is_preempted is assigned as follows: The C code is in the function node_cpu: cpu = node->cpu - 1; The instructions corresponding to the C code are: mov 0x14(%rax),%edi sub $0x1,%edi The above instructions are unnecessary in the X86 Native operating environment, causing high cache-misses and degrading performance. This patch uses static_key to not execute this instruction in the Native runtime environment. The patch effect is as follows two machines, Unixbench runs with full core score: 1. Machine configuration: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz CPU core: 40 Memory: 256G OS Kernel: 5.19-rc3 Before using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 948326591.2 81261.9 Double-Precision Whetstone 55.0 211986.3 38543.0 Execl Throughput 43.0 43453.2 10105.4 File Copy 1024 bufsize 2000 maxblocks 3960.0 438936.2 1108.4 File Copy 256 bufsize 500 maxblocks1655.0 118197.4714.2 File Copy 4096 bufsize 8000 maxblocks 5800.01534674.7 2646.0 Pipe Throughput 12440.0 46482107.6 37365.0 Pipe-based Context Switching 4000.01915094.2 4787.7 Process Creation126.0 85442.2 6781.1 Shell Scripts (1 concurrent) 42.4 69400.7 16368.1 Shell Scripts (8 concurrent) 6.0 8877.2 14795.3 System Call Overhead 15000.04714906.1 3143.3 System Benchmarks Index Score7923.3 After using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 947032915.5 81151.1 Double-Precision Whetstone 55.0 211971.2 38540.2 Execl Throughput 43.0 45054.8 10477.9 File Copy 1024 bufsize 2000 maxblocks 3960.0 515024.9 1300.6 File Copy 256 bufsize 500 maxblocks1655.0 146354.6884.3 File Copy 4096 bufsize 8000 maxblocks 5800.01679995.9 2896.5 Pipe Throughput 12440.0 46466394.2 37352.4 Pipe-based Context Switching 4000.01898221.4 4745.6 Process Creation126.0 85653.1 6797.9 Shell Scripts (1 concurrent) 42.4 69437.3 16376.7 Shell Scripts (8 concurrent) 6.0 8898.9 14831.4 System Call Overhead 15000.04658746.7 3105.8 System Benchmarks Index Score8248.8 2. Machine configuration: Hygon C86 7185 32-core Processor CPU core: 128 Memory: 256G OS Kernel: 5.19-rc3 Before using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 2256644068.3 193371.4 Double-Precision Whetstone 55.0 438969.9 79812.7 Execl Throughput 43.0 10108.6 2350.8 File Copy 1024 bufsize 2000 maxblocks 3960.0 275892.8696.7 File Copy 256 bufsize 500 maxblocks1655.0 72082.7435.5 File Copy 4096 bufsize 8000 maxblocks 5800.0 925043.4 1594.9 Pipe Throughput 12440.0 118905512.5 95583.2 Pipe-based Context Switching 4000.07820945.7 19552.4 Process Creation126.0 31233.3 2478.8 Shell Scripts (1 concurrent) 42.4 49042.8 11566.7 Shell Scripts (8 concurrent) 6.0 6656.0 11093.3 System Call Overhead 15000.06816047.5 4544.0 System Benchmarks Index Score7756.6 After using the patch: System Benchmarks Index Values BASELINE RESULTINDEX Dhrystone 2 using register variables 116700.0 2252272929.4 192996.8 Double-Precision Whetstone 55.0 451847.2 82154.0 Execl Throughput 43.0 10595.1 2464.0 File Copy 1024 bufsize 2000 maxblocks 3960.0
Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members
On Mon, Jun 27, 2022 at 09:40:52PM -0300, Jason Gunthorpe wrote: > On Mon, Jun 27, 2022 at 08:27:37PM +0200, Daniel Borkmann wrote: > > [...] > > Fyi, this breaks BPF CI: > > > > https://github.com/kernel-patches/bpf/runs/7078719372?check_suite_focus=true > > > > [...] > > progs/map_ptr_kern.c:314:26: error: field 'trie_key' with variable sized > > type 'struct bpf_lpm_trie_key' not at the end of a struct or class is a GNU > > extension [-Werror,-Wgnu-variable-sized-type-not-at-end] > > struct bpf_lpm_trie_key trie_key; > > ^ The issue here seems to be a collision between "unknown array size" and known sizes: struct bpf_lpm_trie_key { __u32 prefixlen; /* up to 32 for AF_INET, 128 for AF_INET6 */ __u8data[0];/* Arbitrary size */ }; struct lpm_key { struct bpf_lpm_trie_key trie_key; __u32 data; }; This is treating trie_key as a header, which it's not: it's a complete structure. :) Perhaps: struct lpm_key { __u32 prefixlen; __u32 data; }; I don't see anything else trying to include bpf_lpm_trie_key. > > This will break the rdma-core userspace as well, with a similar > error: > > /usr/bin/clang-13 -DVERBS_DEBUG -Dibverbs_EXPORTS -Iinclude > -I/usr/include/libnl3 -I/usr/include/drm -g -O2 -fdebug-prefix-map=/__w/1/s=. > -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time > -D_FORTIFY_SOURCE=2 -Wmissing-prototypes -Wmissing-declarations > -Wwrite-strings -Wformat=2 -Wcast-function-type -Wformat-nonliteral > -Wdate-time -Wnested-externs -Wshadow -Wstrict-prototypes > -Wold-style-definition -Werror -Wredundant-decls -g -fPIC -std=gnu11 -MD > -MT libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o -MF > libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o.d -o > libibverbs/CMakeFiles/ibverbs.dir/cmd_flow.c.o -c ../libibverbs/cmd_flow.c > In file included from ../libibverbs/cmd_flow.c:33: > In file included from include/infiniband/cmd_write.h:36: > In file included from include/infiniband/cmd_ioctl.h:41: > In file included from include/infiniband/verbs.h:48: > In file included from include/infiniband/verbs_api.h:66: > In file included from include/infiniband/ib_user_ioctl_verbs.h:38: > include/rdma/ib_user_verbs.h:436:34: error: field 'base' with variable sized > type 'struct ib_uverbs_create_cq_resp' not at the end of a struct or class is > a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end] > struct ib_uverbs_create_cq_resp base; > ^ > include/rdma/ib_user_verbs.h:644:34: error: field 'base' with variable sized > type 'struct ib_uverbs_create_qp_resp' not at the end of a struct or class is > a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end] > struct ib_uverbs_create_qp_resp base; This looks very similar, a struct of unknown size is being treated as a header struct: struct ib_uverbs_create_cq_resp { __u32 cq_handle; __u32 cqe; __aligned_u64 driver_data[0]; }; struct ib_uverbs_ex_create_cq_resp { struct ib_uverbs_create_cq_resp base; __u32 comp_mask; __u32 response_length; }; And it only gets used here: DECLARE_UVERBS_WRITE(IB_USER_VERBS_CMD_CREATE_CQ, ib_uverbs_create_cq, UAPI_DEF_WRITE_UDATA_IO( struct ib_uverbs_create_cq, struct ib_uverbs_create_cq_resp), ^^^ UAPI_DEF_METHOD_NEEDS_FN(create_cq)), which must also be assuming it's a header. So probably better to just drop the driver_data field? I don't see anything using it (that I can find) besides as a sanity-check that the field exists and is at the end of the struct. 
-- Kees Cook
Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members
On Tue, Jun 28, 2022 at 09:27:21AM +0200, Geert Uytterhoeven wrote:
> Hi Gustavo,
>
> Thanks for your patch!
>
> On Mon, Jun 27, 2022 at 8:04 PM Gustavo A. R. Silva wrote:
> > There is a regular need in the kernel to provide a way to declare
> > having a dynamically sized set of trailing elements in a structure.
> > Kernel code should always use “flexible array members”[1] for these
> > cases. The older style of one-element or zero-length arrays should
> > no longer be used[2].
>
> These rules apply to the kernel, but uapi is not considered part of the
> kernel, so different rules apply. Uapi header files should work with
> whatever compiler can be used for compiling userspace.

Right, userspace isn't bound by these rules, but the kernel ends up
consuming these structures, so we need to fix them. The [0] -> [] changes
(when they are not erroneously being used within other structures) are
valid for all compilers. Flexible arrays are C99; it's been 23 years. :)

But, yes, where we DO break stuff we need to work around it, etc.

--
Kees Cook
Re: [PATCH][next] treewide: uapi: Replace zero-length arrays with flexible-array members
On Tue, Jun 28, 2022 at 10:54:58AM -0700, Kees Cook wrote:
> which must also be assuming it's a header. So probably better to just
> drop the driver_data field? I don't see anything using it (that I can
> find) besides as a sanity-check that the field exists and is at the end
> of the struct.

The field guarantees alignment of the following structure. IIRC there are a
few cases where we don't have a u64 already to force this.

Jason
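To make the alignment point concrete, a minimal illustration (the struct is
invented, not the real ib_uverbs layout):

    #include <linux/types.h>

    /* The trailing aligned member carries no data but forces 8-byte
     * alignment, so sizeof() stays a multiple of 8 and driver-specific
     * response data written right after the base response remains
     * naturally aligned.
     */
    struct base_resp {
            __u32 handle;
            __u32 cqe;
            __aligned_u64 driver_data[];
    };

    _Static_assert(__alignof__(struct base_resp) == 8,
                   "trailing data stays 8-byte aligned");

Dropping the member would silently drop that guarantee for whatever the
kernel appends after the base response in the udata buffer.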
Re: [PATCH v2 -next] vdpa/mlx5: Use eth_zero_addr() to assign zero address
On Tue, Jun 28, 2022 at 12:34:57PM +, Xu Qiang wrote: > Using eth_zero_addr() to assign zero address instead of memset(). > > Reported-by: Hulk Robot > Signed-off-by: Xu Qiang Acked-by: Michael S. Tsirkin > --- > v2: > - fix typo in commit log > drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c > b/drivers/vdpa/mlx5/net/mlx5_vnet.c > index e85c1d71f4ed..f738c78ef446 100644 > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c > @@ -1457,8 +1457,8 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct > mlx5_vdpa_net *ndev, u8 *mac, > > *ucast = rule; > > - memset(dmac_c, 0, ETH_ALEN); > - memset(dmac_v, 0, ETH_ALEN); > + eth_zero_addr(dmac_c); > + eth_zero_addr(dmac_v); > dmac_c[0] = 1; > dmac_v[0] = 1; > rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); > -- > 2.17.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v3 0/3] virtio: support requiring restricted access per device
On Wed, 22 Jun 2022, Juergen Gross wrote: > Instead of an all or nothing approach add support for requiring > restricted memory access per device. > > Changes in V3: > - new patches 1 + 2 > - basically complete rework of patch 3 > > Juergen Gross (3): > virtio: replace restricted mem access flag with callback > kernel: remove platform_has() infrastructure > xen: don't require virtio with grants for non-PV guests On the whole series: Reviewed-by: Stefano Stabellini > MAINTAINERS| 8 > arch/arm/xen/enlighten.c | 4 +++- > arch/s390/mm/init.c| 4 ++-- > arch/x86/mm/mem_encrypt_amd.c | 4 ++-- > arch/x86/xen/enlighten_hvm.c | 4 +++- > arch/x86/xen/enlighten_pv.c| 5 - > drivers/virtio/Kconfig | 4 > drivers/virtio/Makefile| 1 + > drivers/virtio/virtio.c| 4 ++-- > drivers/virtio/virtio_anchor.c | 18 + > drivers/xen/Kconfig| 9 + > drivers/xen/grant-dma-ops.c| 10 ++ > include/asm-generic/Kbuild | 1 - > include/asm-generic/platform-feature.h | 8 > include/linux/platform-feature.h | 19 -- > include/linux/virtio_anchor.h | 19 ++ > include/xen/xen-ops.h | 6 ++ > include/xen/xen.h | 8 > kernel/Makefile| 2 +- > kernel/platform-feature.c | 27 -- > 20 files changed, 84 insertions(+), 81 deletions(-) > create mode 100644 drivers/virtio/virtio_anchor.c > delete mode 100644 include/asm-generic/platform-feature.h > delete mode 100644 include/linux/platform-feature.h > create mode 100644 include/linux/virtio_anchor.h > delete mode 100644 kernel/platform-feature.c > > -- > 2.35.3 > > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3] virtio: disable notification hardening by default
On Tue, Jun 28, 2022 at 2:17 PM Jason Wang wrote: > > On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin wrote: > > > > On Tue, Jun 28, 2022 at 11:49:12AM +0800, Jason Wang wrote: > > > > Heh. Yea sure. But things work fine for people. What is the chance > > > > your review found and fixed all driver bugs? > > > > > > I don't/can't audit all bugs but the race between open/close against > > > ready/reset. It looks to me a good chance to fix them all but if you > > > think differently, let me know > > > > > > > After two attempts > > > > I don't feel like hoping audit will fix all bugs. > > > > > > I've started the auditing and have 15+ patches in the queue. (only > > > covers bluetooth, console, pmem, virtio-net and caif). Spotting the > > > issue is not hard but the testing, It would take at least the time of > > > one release to finalize I guess. > > > > Absolutely. So I am looking for a way to implement hardening that does > > not break existing drivers. > > I totally agree with you to seek a way without bothering the drivers. > Just wonder if this is possbile. > > > > > > > > > > > > > > > > > > > > > > > > > The reason config was kind of easy is that config interrupt is > > > > > > rarely > > > > > > vital for device function so arbitrarily deferring that does not > > > > > > lead to > > > > > > deadlocks - what you are trying to do with VQ interrupts is > > > > > > fundamentally different. Things are especially bad if we just drop > > > > > > an interrupt but deferring can lead to problems too. > > > > > > > > > > I'm not sure I see the difference, disable_irq() stuffs also delay the > > > > > interrupt processing until enable_irq(). > > > > > > > > > > > > Absolutely. I am not at all sure disable_irq fixes all problems. > > > > > > > > > > > > > > > > Consider as an example > > > > > > virtio-net: fix race between ndo_open() and > > > > > > virtio_device_ready() > > > > > > if you just defer vq interrupts you get deadlocks. > > > > > > > > > > > > > > > > > > > > > > I don't see a deadlock here, maybe you can show more detail on this? > > > > > > > > What I mean is this: if we revert the above commit, things still > > > > work (out of spec, but still). If we revert and defer interrupts until > > > > device ready then ndo_open that triggers before device ready deadlocks. > > > > > > Ok, I guess you meant on a hypervisor that is strictly written with spec. > > > > I mean on hypervisor that starts processing queues after getting a kick > > even without DRIVER_OK. > > Oh right. > > > > > > > > > > > > > > > > > > > > > > > So, thinking about all this, how about a simple per vq flag meaning > > > > > > "this vq was kicked since reset"? > > > > > > > > > > And ignore the notification if vq is not kicked? It sounds like the > > > > > callback needs to be synchronized with the kick. > > > > > > > > Note we only need to synchronize it when it changes, which is > > > > only during initialization and reset. > > > > > > Yes. > > > > > > > > > > > > > > > > > > > > > > > If driver does not kick then it's not ready to get callbacks, right? > > > > > > > > > > > > Sounds quite clean, but we need to think through memory ordering > > > > > > concerns - I guess it's only when we change the value so > > > > > > if (!vq->kicked) { > > > > > > vq->kicked = true; > > > > > > mb(); > > > > > > } > > > > > > > > > > > > will do the trick, right? 
> > > > > > > > > > There's no much difference with the existing approach: > > > > > > > > > > 1) your proposal implicitly makes callbacks ready in virtqueue_kick() > > > > > 2) my proposal explicitly makes callbacks ready via > > > > > virtio_device_ready() > > > > > > > > > > Both require careful auditing of all the existing drivers to make sure > > > > > no kick before DRIVER_OK. > > > > > > > > Jason, kick before DRIVER_OK is out of spec, sure. But it is unrelated > > > > to hardening > > > > > > Yes but with your proposal, it seems to couple kick with DRIVER_OK > > > somehow. > > > > I don't see how - my proposal ignores DRIVER_OK issues. > > Yes, what I meant is, in your proposal, the first kick after rest is a > hint that the driver is ok (but actually it could not). > > > > > > > and in absence of config interrupts is generally easily > > > > fixed just by sticking virtio_device_ready early in initialization. > > > > > > So if the kick is done before the subsystem registration, there's > > > still a window in the middle (assuming we stick virtio_device_ready() > > > early): > > > > > > virtio_device_ready() > > > virtqueue_kick() > > > /* the window */ > > > subsystem_registration() > > > > Absolutely, however, I do not think we really have many such drivers > > since this has been known as a wrong thing to do since the beginning. > > Want to try to find any? > > Yes, let me try and update. This is basically the device that have an RX queue, so I've found the following drivers: scmi, mac80211_hwsim, vsock, bt, ball
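Spelled out slightly more than the pseudo-code quoted above, the idea is
roughly the following kernel-context sketch (structure, field and helper
names are invented, and the memory-ordering details would still need review):

    #include <linux/irqreturn.h>

    /* READ_ONCE/WRITE_ONCE and smp_mb() come from the usual kernel headers. */
    struct sketch_vq {
            bool kicked;    /* set on the first kick after reset */
            /* ... */
    };

    bool notify_device(struct sketch_vq *vq);               /* assumed */
    irqreturn_t handle_used_buffers(struct sketch_vq *vq);  /* assumed */

    /* Kick side: the first kick after reset marks callbacks as allowed. */
    static bool virtqueue_kick_sketch(struct sketch_vq *vq)
    {
            if (!READ_ONCE(vq->kicked)) {
                    WRITE_ONCE(vq->kicked, true);
                    smp_mb();       /* order the flag write before the notification */
            }

            return notify_device(vq);
    }

    /* Interrupt side: ignore notifications for a vq never kicked since reset. */
    static irqreturn_t vring_interrupt_sketch(struct sketch_vq *vq)
    {
            if (!READ_ONCE(vq->kicked))
                    return IRQ_HANDLED;     /* early/spurious notification, drop it */

            return handle_used_buffers(vq);
    }

A vq that has never been kicked can never have its callback run, which is the
hardening property being discussed; the flag only changes around
initialization and reset, so the barrier cost is confined to the first kick.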
Re: [PATCH v6 1/4] vdpa: Add suspend operation
On Fri, Jun 24, 2022 at 12:07 AM Eugenio Pérez wrote: > > This operation is optional: It it's not implemented, backend feature bit > will not be exposed. A question, do we allow suspending a device without DRIVER_OK? Thanks > > Signed-off-by: Eugenio Pérez > --- > include/linux/vdpa.h | 4 > 1 file changed, 4 insertions(+) > > diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h > index 7b4a13d3bd91..d282f464d2f1 100644 > --- a/include/linux/vdpa.h > +++ b/include/linux/vdpa.h > @@ -218,6 +218,9 @@ struct vdpa_map_file { > * @reset: Reset device > * @vdev: vdpa device > * Returns integer: success (0) or error (< 0) > + * @suspend: Suspend or resume the device (optional) > + * @vdev: vdpa device > + * Returns integer: success (0) or error (< 0) > * @get_config_size: Get the size of the configuration space > includes > * fields that are conditional on feature bits. > * @vdev: vdpa device > @@ -319,6 +322,7 @@ struct vdpa_config_ops { > u8 (*get_status)(struct vdpa_device *vdev); > void (*set_status)(struct vdpa_device *vdev, u8 status); > int (*reset)(struct vdpa_device *vdev); > + int (*suspend)(struct vdpa_device *vdev); > size_t (*get_config_size)(struct vdpa_device *vdev); > void (*get_config)(struct vdpa_device *vdev, unsigned int offset, >void *buf, unsigned int len); > -- > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6 2/4] vhost-vdpa: introduce SUSPEND backend feature bit
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez wrote: > > Userland knows if it can suspend the device or not by checking this feature > bit. > > It's only offered if the vdpa driver backend implements the suspend() > operation callback, and to offer it or userland to ack it if the backend > does not offer that callback is an error. > > Signed-off-by: Eugenio Pérez > --- > drivers/vhost/vdpa.c | 16 +++- > include/uapi/linux/vhost_types.h | 2 ++ > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 23dcbfdfa13b..3d636e192061 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -347,6 +347,14 @@ static long vhost_vdpa_set_config(struct vhost_vdpa *v, > return 0; > } > > +static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v) > +{ > + struct vdpa_device *vdpa = v->vdpa; > + const struct vdpa_config_ops *ops = vdpa->config; > + > + return ops->suspend; > +} > + > static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user > *featurep) > { > struct vdpa_device *vdpa = v->vdpa; > @@ -577,7 +585,11 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > if (cmd == VHOST_SET_BACKEND_FEATURES) { > if (copy_from_user(&features, featurep, sizeof(features))) > return -EFAULT; > - if (features & ~VHOST_VDPA_BACKEND_FEATURES) > + if (features & ~(VHOST_VDPA_BACKEND_FEATURES | > +BIT_ULL(VHOST_BACKEND_F_SUSPEND))) > + return -EOPNOTSUPP; > + if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) && > +!vhost_vdpa_can_suspend(v)) Do we need to advertise this to the management? Thanks > return -EOPNOTSUPP; > vhost_set_backend_features(&v->vdev, features); > return 0; > @@ -628,6 +640,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > break; > case VHOST_GET_BACKEND_FEATURES: > features = VHOST_VDPA_BACKEND_FEATURES; > + if (vhost_vdpa_can_suspend(v)) > + features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND); > if (copy_to_user(featurep, &features, sizeof(features))) > r = -EFAULT; > break; > diff --git a/include/uapi/linux/vhost_types.h > b/include/uapi/linux/vhost_types.h > index 634cee485abb..1bdd6e363f4c 100644 > --- a/include/uapi/linux/vhost_types.h > +++ b/include/uapi/linux/vhost_types.h > @@ -161,5 +161,7 @@ struct vhost_vdpa_iova_range { > * message > */ > #define VHOST_BACKEND_F_IOTLB_ASID 0x3 > +/* Device can be suspended */ > +#define VHOST_BACKEND_F_SUSPEND 0x4 > > #endif > -- > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v6 3/4] vhost-vdpa: uAPI to suspend the device
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez wrote: > > The ioctl adds support for suspending the device from userspace. > > This is a must before getting virtqueue indexes (base) for live migration, > since the device could modify them after userland gets them. There are > individual ways to perform that action for some devices > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no > way to perform it for any vhost device (and, in particular, vhost-vdpa). > > After a successful return of the ioctl call the device must not process > more virtqueue descriptors. The device can answer to read or writes of > config fields as if it were not suspended. In particular, writing to > "queue_enable" with a value of 1 will not make the device start > processing buffers of the virtqueue. > > Signed-off-by: Eugenio Pérez > --- > drivers/vhost/vdpa.c | 19 +++ > include/uapi/linux/vhost.h | 14 ++ > 2 files changed, 33 insertions(+) > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > index 3d636e192061..7fa671ac4bdf 100644 > --- a/drivers/vhost/vdpa.c > +++ b/drivers/vhost/vdpa.c > @@ -478,6 +478,22 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa > *v, u32 __user *argp) > return 0; > } > > +/* After a successful return of ioctl the device must not process more > + * virtqueue descriptors. The device can answer to read or writes of config > + * fields as if it were not suspended. In particular, writing to > "queue_enable" > + * with a value of 1 will not make the device start processing buffers. > + */ > +static long vhost_vdpa_suspend(struct vhost_vdpa *v) > +{ > + struct vdpa_device *vdpa = v->vdpa; > + const struct vdpa_config_ops *ops = vdpa->config; > + > + if (!ops->suspend) > + return -EOPNOTSUPP; > + > + return ops->suspend(vdpa); > +} > + > static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, >void __user *argp) > { > @@ -654,6 +670,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > case VHOST_VDPA_GET_VQS_COUNT: > r = vhost_vdpa_get_vqs_count(v, argp); > break; > + case VHOST_VDPA_SUSPEND: > + r = vhost_vdpa_suspend(v); > + break; > default: > r = vhost_dev_ioctl(&v->vdev, cmd, argp); > if (r == -ENOIOCTLCMD) > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h > index cab645d4a645..6d9f45163155 100644 > --- a/include/uapi/linux/vhost.h > +++ b/include/uapi/linux/vhost.h > @@ -171,4 +171,18 @@ > #define VHOST_VDPA_SET_GROUP_ASID _IOW(VHOST_VIRTIO, 0x7C, \ > struct vhost_vring_state) > > +/* Suspend or resume a device so it does not process virtqueue requests > anymore > + * > + * After the return of ioctl with suspend != 0, the device must finish any > + * pending operations like in flight requests. I'm not sure we should mandate the flush here. This probably blocks us from adding inflight descriptor reporting in the future. Thanks It must also preserve all the > + * necessary state (the virtqueue vring base plus the possible device > specific > + * states) that is required for restoring in the future. The device must not > + * change its configuration after that point. > + * > + * After the return of ioctl with suspend == 0, the device can continue > + * processing buffers as long as typical conditions are met (vq is enabled, > + * DRIVER_OK status bit is enabled, etc). > + */ > +#define VHOST_VDPA_SUSPEND _IOW(VHOST_VIRTIO, 0x7D, int) > + > #endif > -- > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
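From the userspace side, the intended use looks roughly like the sketch below
(device path and error handling are illustrative; it assumes the
VHOST_VDPA_SUSPEND definition added by the patch above and a backend that
advertises VHOST_BACKEND_F_SUSPEND):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/vhost.h>

    int main(void)
    {
            int fd = open("/dev/vhost-vdpa-0", O_RDWR);     /* example device node */

            if (fd < 0) {
                    perror("open");
                    return 1;
            }

            /* Stop the device from processing virtqueue descriptors; the
             * kernel ignores the ioctl argument in this version.
             */
            if (ioctl(fd, VHOST_VDPA_SUSPEND)) {
                    perror("VHOST_VDPA_SUSPEND");
                    close(fd);
                    return 1;
            }

            /* ... vring bases read now (VHOST_GET_VRING_BASE) are stable
             * and can be forwarded to the migration destination ...
             */

            close(fd);
            return 0;
    }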
Re: [PATCH v6 4/4] vdpa_sim: Implement suspend vdpa op
On Fri, Jun 24, 2022 at 12:08 AM Eugenio Pérez wrote: > > Implement suspend operation for vdpa_sim devices, so vhost-vdpa will > offer that backend feature and userspace can effectively suspend the > device. > > This is a must before get virtqueue indexes (base) for live migration, > since the device could modify them after userland gets them. There are > individual ways to perform that action for some devices > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no > way to perform it for any vhost device (and, in particular, vhost-vdpa). > > Reviewed-by: Stefano Garzarella > Signed-off-by: Eugenio Pérez > --- > drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 + > drivers/vdpa/vdpa_sim/vdpa_sim.h | 1 + > drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 3 +++ > drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 3 +++ > 4 files changed, 28 insertions(+) > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c > b/drivers/vdpa/vdpa_sim/vdpa_sim.c > index 0f2865899647..213883487f9b 100644 > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c > @@ -107,6 +107,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim) > for (i = 0; i < vdpasim->dev_attr.nas; i++) > vhost_iotlb_reset(&vdpasim->iommu[i]); > > + vdpasim->running = true; > spin_unlock(&vdpasim->iommu_lock); > > vdpasim->features = 0; > @@ -505,6 +506,24 @@ static int vdpasim_reset(struct vdpa_device *vdpa) > return 0; > } > > +static int vdpasim_suspend(struct vdpa_device *vdpa) > +{ > + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); > + int i; > + > + spin_lock(&vdpasim->lock); > + vdpasim->running = false; > + if (vdpasim->running) { > + /* Check for missed buffers */ > + for (i = 0; i < vdpasim->dev_attr.nvqs; ++i) > + vdpasim_kick_vq(vdpa, i); This seems only valid if we allow resuming? 
Thanks > + > + } > + spin_unlock(&vdpasim->lock); > + > + return 0; > +} > + > static size_t vdpasim_get_config_size(struct vdpa_device *vdpa) > { > struct vdpasim *vdpasim = vdpa_to_sim(vdpa); > @@ -694,6 +713,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = { > .get_status = vdpasim_get_status, > .set_status = vdpasim_set_status, > .reset = vdpasim_reset, > + .suspend= vdpasim_suspend, > .get_config_size= vdpasim_get_config_size, > .get_config = vdpasim_get_config, > .set_config = vdpasim_set_config, > @@ -726,6 +746,7 @@ static const struct vdpa_config_ops > vdpasim_batch_config_ops = { > .get_status = vdpasim_get_status, > .set_status = vdpasim_set_status, > .reset = vdpasim_reset, > + .suspend= vdpasim_suspend, > .get_config_size= vdpasim_get_config_size, > .get_config = vdpasim_get_config, > .set_config = vdpasim_set_config, > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h > b/drivers/vdpa/vdpa_sim/vdpa_sim.h > index 622782e92239..061986f30911 100644 > --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h > @@ -66,6 +66,7 @@ struct vdpasim { > u32 generation; > u64 features; > u32 groups; > + bool running; > /* spinlock to synchronize iommu table */ > spinlock_t iommu_lock; > }; > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > index 42d401d43911..bcdb1982c378 100644 > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c > @@ -204,6 +204,9 @@ static void vdpasim_blk_work(struct work_struct *work) > if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK)) > goto out; > > + if (!vdpasim->running) > + goto out; > + > for (i = 0; i < VDPASIM_BLK_VQ_NUM; i++) { > struct vdpasim_virtqueue *vq = &vdpasim->vqs[i]; > > diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > index 5125976a4df8..886449e88502 100644 > --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c > @@ -154,6 +154,9 @@ static void vdpasim_net_work(struct work_struct *work) > > spin_lock(&vdpasim->lock); > > + if (!vdpasim->running) > + goto out; > + > if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK)) > goto out; > > -- > 2.31.1 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH V3] virtio: disable notification hardening by default
On Wed, Jun 29, 2022 at 12:07:11PM +0800, Jason Wang wrote: > On Tue, Jun 28, 2022 at 2:17 PM Jason Wang wrote: > > > > On Tue, Jun 28, 2022 at 1:00 PM Michael S. Tsirkin wrote: > > > > > > On Tue, Jun 28, 2022 at 11:49:12AM +0800, Jason Wang wrote: > > > > > Heh. Yea sure. But things work fine for people. What is the chance > > > > > your review found and fixed all driver bugs? > > > > > > > > I don't/can't audit all bugs but the race between open/close against > > > > ready/reset. It looks to me a good chance to fix them all but if you > > > > think differently, let me know > > > > > > > > > After two attempts > > > > > I don't feel like hoping audit will fix all bugs. > > > > > > > > I've started the auditing and have 15+ patches in the queue. (only > > > > covers bluetooth, console, pmem, virtio-net and caif). Spotting the > > > > issue is not hard but the testing, It would take at least the time of > > > > one release to finalize I guess. > > > > > > Absolutely. So I am looking for a way to implement hardening that does > > > not break existing drivers. > > > > I totally agree with you to seek a way without bothering the drivers. > > Just wonder if this is possbile. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The reason config was kind of easy is that config interrupt is > > > > > > > rarely > > > > > > > vital for device function so arbitrarily deferring that does not > > > > > > > lead to > > > > > > > deadlocks - what you are trying to do with VQ interrupts is > > > > > > > fundamentally different. Things are especially bad if we just drop > > > > > > > an interrupt but deferring can lead to problems too. > > > > > > > > > > > > I'm not sure I see the difference, disable_irq() stuffs also delay > > > > > > the > > > > > > interrupt processing until enable_irq(). > > > > > > > > > > > > > > > Absolutely. I am not at all sure disable_irq fixes all problems. > > > > > > > > > > > > > > > > > > > Consider as an example > > > > > > > virtio-net: fix race between ndo_open() and > > > > > > > virtio_device_ready() > > > > > > > if you just defer vq interrupts you get deadlocks. > > > > > > > > > > > > > > > > > > > > > > > > > > I don't see a deadlock here, maybe you can show more detail on this? > > > > > > > > > > What I mean is this: if we revert the above commit, things still > > > > > work (out of spec, but still). If we revert and defer interrupts until > > > > > device ready then ndo_open that triggers before device ready > > > > > deadlocks. > > > > > > > > Ok, I guess you meant on a hypervisor that is strictly written with > > > > spec. > > > > > > I mean on hypervisor that starts processing queues after getting a kick > > > even without DRIVER_OK. > > > > Oh right. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, thinking about all this, how about a simple per vq flag > > > > > > > meaning > > > > > > > "this vq was kicked since reset"? > > > > > > > > > > > > And ignore the notification if vq is not kicked? It sounds like the > > > > > > callback needs to be synchronized with the kick. > > > > > > > > > > Note we only need to synchronize it when it changes, which is > > > > > only during initialization and reset. > > > > > > > > Yes. > > > > > > > > > > > > > > > > > > > > > > > > > > > > If driver does not kick then it's not ready to get callbacks, > > > > > > > right? 
> > > > > > > > > > > > > > Sounds quite clean, but we need to think through memory ordering > > > > > > > concerns - I guess it's only when we change the value so > > > > > > > if (!vq->kicked) { > > > > > > > vq->kicked = true; > > > > > > > mb(); > > > > > > > } > > > > > > > > > > > > > > will do the trick, right? > > > > > > > > > > > > There's no much difference with the existing approach: > > > > > > > > > > > > 1) your proposal implicitly makes callbacks ready in > > > > > > virtqueue_kick() > > > > > > 2) my proposal explicitly makes callbacks ready via > > > > > > virtio_device_ready() > > > > > > > > > > > > Both require careful auditing of all the existing drivers to make > > > > > > sure > > > > > > no kick before DRIVER_OK. > > > > > > > > > > Jason, kick before DRIVER_OK is out of spec, sure. But it is unrelated > > > > > to hardening > > > > > > > > Yes but with your proposal, it seems to couple kick with DRIVER_OK > > > > somehow. > > > > > > I don't see how - my proposal ignores DRIVER_OK issues. > > > > Yes, what I meant is, in your proposal, the first kick after rest is a > > hint that the driver is ok (but actually it could not). > > > > > > > > > > and in absence of config interrupts is generally easily > > > > > fixed just by sticking virtio_device_ready early in initialization. > > > > > > > > So if the kick is done before the subsystem registration, there's > > > > still a window in the middle (assuming we stick virtio_device_ready() > > > > early): > > > > > > > > virtio_device_ready() > > > > virtqueue_ki
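To make the per-virtqueue "kicked since reset" proposal discussed in this thread concrete, here is a minimal sketch; the kicked_since_reset field is hypothetical and not part of any posted patch. The flag is cleared on reset, set with a barrier on the first kick, and notifications that arrive before that point are ignored rather than delivered to a driver that is not yet ready for callbacks.

    /* In the virtqueue_kick()/virtqueue_notify() path (sketch): */
    if (!vq->kicked_since_reset) {
            vq->kicked_since_reset = true;
            /* Publish the flag before the device can send a notification. */
            virtio_mb(vq->weak_barriers);
    }

    /* In vring_interrupt() (sketch): */
    if (!READ_ONCE(vq->kicked_since_reset))
            return IRQ_HANDLED;     /* driver never kicked: drop the event */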
[PATCH v11 01/40] virtio: add helper virtqueue_get_vring_max_size()
Record the maximum queue num supported by the device. virtio-net can display the maximum (supported by hardware) ring size in ethtool -g eth0. When the subsequent patch implements vring reset, it can judge whether the ring size passed by the driver is legal based on this. Signed-off-by: Xuan Zhuo --- arch/um/drivers/virtio_uml.c | 1 + drivers/platform/mellanox/mlxbf-tmfifo.c | 2 ++ drivers/remoteproc/remoteproc_virtio.c | 2 ++ drivers/s390/virtio/virtio_ccw.c | 3 +++ drivers/virtio/virtio_mmio.c | 2 ++ drivers/virtio/virtio_pci_legacy.c | 2 ++ drivers/virtio/virtio_pci_modern.c | 2 ++ drivers/virtio/virtio_ring.c | 14 ++ drivers/virtio/virtio_vdpa.c | 2 ++ include/linux/virtio.h | 2 ++ 10 files changed, 32 insertions(+) diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c index 82ff3785bf69..e719af8bdf56 100644 --- a/arch/um/drivers/virtio_uml.c +++ b/arch/um/drivers/virtio_uml.c @@ -958,6 +958,7 @@ static struct virtqueue *vu_setup_vq(struct virtio_device *vdev, goto error_create; } vq->priv = info; + vq->num_max = num; num = virtqueue_get_vring_size(vq); if (vu_dev->protocol_features & diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c index 38800e86ed8a..1ae3c56b66b0 100644 --- a/drivers/platform/mellanox/mlxbf-tmfifo.c +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c @@ -959,6 +959,8 @@ static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev, goto error; } + vq->num_max = vring->num; + vqs[i] = vq; vring->vq = vq; vq->priv = vring; diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c index d43d74733f0a..0f7706e23eb9 100644 --- a/drivers/remoteproc/remoteproc_virtio.c +++ b/drivers/remoteproc/remoteproc_virtio.c @@ -125,6 +125,8 @@ static struct virtqueue *rp_find_vq(struct virtio_device *vdev, return ERR_PTR(-ENOMEM); } + vq->num_max = num; + rvring->vq = vq; vq->priv = rvring; diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 161d3b141f0d..6b86d0280d6b 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -530,6 +530,9 @@ static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev, err = -ENOMEM; goto out_err; } + + vq->num_max = info->num; + /* it may have been reduced */ info->num = virtqueue_get_vring_size(vq); diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 083ff1eb743d..a20d5a6b5819 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -403,6 +403,8 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int in goto error_new_virtqueue; } + vq->num_max = num; + /* Activate the queue */ writel(virtqueue_get_vring_size(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NUM); if (vm_dev->version == 1) { diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index a5e5721145c7..2257f1b3d8ae 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -135,6 +135,8 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, if (!vq) return ERR_PTR(-ENOMEM); + vq->num_max = num; + q_pfn = virtqueue_get_desc_addr(vq) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT; if (q_pfn >> 32) { dev_err(&vp_dev->pci_dev->dev, diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 623906b4996c..e7e0b8c850f6 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -218,6 +218,8 @@ static struct virtqueue 
*setup_vq(struct virtio_pci_device *vp_dev, if (!vq) return ERR_PTR(-ENOMEM); + vq->num_max = num; + /* activate the queue */ vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq)); vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq), diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index a5ec724c01d8..4cac600856ad 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2385,6 +2385,20 @@ void vring_transport_features(struct virtio_device *vdev) } EXPORT_SYMBOL_GPL(vring_transport_features); +/** + * virtqueue_get_vring_max_size - return the max size of the virtqueue's vring + * @_vq: the struct virtqueue containing the vring of interest. + * + * Returns the max size of the vring. + * + * Unlike other operations, this need not be serialized. + */ +unsigned int virtqueue_get_vring_ma
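A sketch of how the new helper gets consumed later in this series (the virtio-net ethtool -g path mentioned in the commit message); the field usage is illustrative and may differ slightly from the actual virtio-net patch:

    static void virtnet_get_ringparam_sketch(struct net_device *dev,
                                             struct ethtool_ringparam *ring)
    {
            struct virtnet_info *vi = netdev_priv(dev);

            /* Hardware limit vs. currently configured ring size. */
            ring->rx_max_pending = virtqueue_get_vring_max_size(vi->rq[0].vq);
            ring->tx_max_pending = virtqueue_get_vring_max_size(vi->sq[0].vq);
            ring->rx_pending = virtqueue_get_vring_size(vi->rq[0].vq);
            ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq);
    }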
[PATCH v11 00/40] virtio pci support VIRTIO_F_RING_RESET
The virtio spec already supports the virtio queue reset function. This patch set is to add this function to the kernel. The relevant virtio spec information is here: https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Also regarding MMIO support for queue reset, I plan to support it after this patch is passed. This patch set implements the refactoring of vring. Finally, the virtuque_resize() interface is provided based on the reset function of the transport layer. Test environment: Host: 4.19.91 Qemu: QEMU emulator version 6.2.50 (with vq reset support) Test Cmd: ethtool -G eth1 rx $1 tx $2; ethtool -g eth1 The default is split mode, modify Qemu virtio-net to add PACKED feature to test packed mode. Qemu code: https://github.com/fengidri/qemu/compare/89f3bfa3265554d1d591ee4d7f1197b6e3397e84...master In order to simplify the review of this patch set, the function of reusing the old buffers after resize will be introduced in subsequent patch sets. Please review. Thanks. v11: 1. struct virtio_pci_common_cfg to virtio_pci_modern.h 2. conflict resolution v10: 1. on top of the harden vring IRQ 2. factor out split and packed from struct vring_virtqueue 3. some suggest from @Jason Wang v9: 1. Provide a virtqueue_resize() interface directly 2. A patch set including vring resize, virtio pci reset, virtio-net resize 3. No more separate structs v8: 1. Provide a virtqueue_reset() interface directly 2. Split the two patch sets, this is the first part 3. Add independent allocation helper for allocating state, extra v7: 1. fix #6 subject typo 2. fix #6 ring_size_in_bytes is uninitialized 3. check by: make W=12 v6: 1. virtio_pci: use synchronize_irq(irq) to sync the irq callbacks 2. Introduce virtqueue_reset_vring() to implement the reset of vring during the reset process. May use the old vring if num of the vq not change. 3. find_vqs() support sizes to special the max size of each vq v5: 1. add virtio-net support set_ringparam v4: 1. just the code of virtio, without virtio-net 2. Performing reset on a queue is divided into these steps: 1. reset_vq: reset one vq 2. recycle the buffer from vq by virtqueue_detach_unused_buf() 3. release the ring of the vq by vring_release_virtqueue() 4. enable_reset_vq: re-enable the reset queue 3. Simplify the parameters of enable_reset_vq() 4. add container structures for virtio_pci_common_cfg v3: 1. 
keep vq, irq unreleased Xuan Zhuo (40): virtio: add helper virtqueue_get_vring_max_size() virtio: struct virtio_config_ops add callbacks for queue_reset virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset virtio_ring: extract the logic of freeing vring virtio_ring: split vring_virtqueue virtio_ring: introduce virtqueue_init() virtio_ring: split: introduce vring_free_split() virtio_ring: split: extract the logic of alloc queue virtio_ring: split: extract the logic of alloc state and extra virtio_ring: split: extract the logic of attach vring virtio_ring: split: extract the logic of vring init virtio_ring: split: introduce virtqueue_reinit_split() virtio_ring: split: reserve vring_align, may_reduce_num virtio_ring: split: introduce virtqueue_resize_split() virtio_ring: packed: introduce vring_free_packed virtio_ring: packed: extract the logic of alloc queue virtio_ring: packed: extract the logic of alloc state and extra virtio_ring: packed: extract the logic of attach vring virtio_ring: packed: extract the logic of vring init virtio_ring: packed: introduce virtqueue_reinit_packed() virtio_ring: packed: introduce virtqueue_resize_packed() virtio_ring: introduce virtqueue_resize() virtio_pci: move struct virtio_pci_common_cfg to virtio_pci_modern.h virtio_pci: struct virtio_pci_common_cfg add queue_notify_data virtio: allow to unbreak/break virtqueue individually virtio: queue_reset: add VIRTIO_F_RING_RESET virtio_pci: struct virtio_pci_common_cfg add queue_reset virtio_pci: introduce helper to get/set queue reset virtio_pci: extract the logic of active vq for modern pci virtio_pci: support VIRTIO_F_RING_RESET virtio: find_vqs() add arg sizes virtio_pci: support the arg sizes of find_vqs() virtio_mmio: support the arg sizes of find_vqs() virtio: add helper virtio_find_vqs_ctx_size() virtio_net: set the default max ring size by find_vqs() virtio_net: get ringparam by virtqueue_get_vring_max_size() virtio_net: split free_unused_bufs() virtio_net: support rx queue resize virtio_net: support tx queue resize virtio_net: support set_ringparam arch/um/drivers/virtio_uml.c | 3 +- drivers/net/virtio_net.c | 209 +- drivers/platform/mellanox/mlxbf-tmfifo.c | 3 + drivers/remoteproc/remoteproc_virtio.c | 3 + drivers/s390/virtio/virtio_ccw.c | 4 + drivers/virtio/virtio_mmio
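As a reading aid for the series, a hedged sketch of the per-queue resize flow it builds toward, seen from the driver side; the virtqueue_resize() signature is assumed from patch 22/40 (not quoted here) and recycle_rx_buf() is a hypothetical driver callback:

    /* Hand an unused rx buffer back to the driver's pool (hypothetical). */
    static void recycle_rx_buf(struct virtqueue *vq, void *buf);

    static int virtnet_rx_resize_sketch(struct receive_queue *rq, u32 ring_num)
    {
            /*
             * virtqueue_resize() resets the queue through the transport,
             * passes every outstanding buffer to recycle_rx_buf(),
             * reallocates the vring with ring_num entries and re-enables
             * the queue.  On failure the old vring is kept and
             * re-initialized, so the queue stays usable at its old size.
             */
            return virtqueue_resize(rq->vq, ring_num, recycle_rx_buf);
    }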
[PATCH v11 03/40] virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
Added documentation for virtqueue_detach_unused_buf, allowing it to be called on queue reset. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_ring.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4cac600856ad..4ed51bc05a51 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2130,8 +2130,8 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed); * @_vq: the struct virtqueue we're talking about. * * Returns NULL or the "data" token handed to virtqueue_add_*(). - * This is not valid on an active queue; it is useful only for device - * shutdown. + * This is not valid on an active queue; it is useful for device + * shutdown or the reset queue. */ void *virtqueue_detach_unused_buf(struct virtqueue *_vq) { -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 02/40] virtio: struct virtio_config_ops add callbacks for queue_reset
reset can be divided into the following four steps (example): 1. transport: notify the device to reset the queue 2. vring: recycle the buffer submitted 3. vring: reset/resize the vring (may re-alloc) 4. transport: mmap vring to device, and enable the queue In order to support queue reset, add two callbacks(reset_vq, enable_reset_vq) in struct virtio_config_ops to implement steps 1 and 4. Signed-off-by: Xuan Zhuo --- include/linux/virtio_config.h | 12 1 file changed, 12 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index b47c2e7ed0ee..ded51b0d4823 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -78,6 +78,16 @@ struct virtio_shm_region { * @set_vq_affinity: set the affinity for a virtqueue (optional). * @get_vq_affinity: get the affinity for a virtqueue (optional). * @get_shm_region: get a shared memory region based on the index. + * @reset_vq: reset a queue individually (optional). + * vq: the virtqueue + * Returns 0 on success or error status + * reset_vq will guarantee that the callbacks are disabled and synchronized. + * Except for the callback, the caller should guarantee that the vring is + * not accessed by any functions of virtqueue. + * @enable_reset_vq: enable a reset queue + * vq: the virtqueue + * Returns 0 on success or error status + * If reset_vq is set, then enable_reset_vq must also be set. */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -104,6 +114,8 @@ struct virtio_config_ops { int index); bool (*get_shm_region)(struct virtio_device *vdev, struct virtio_shm_region *region, u8 id); + int (*reset_vq)(struct virtqueue *vq); + int (*enable_reset_vq)(struct virtqueue *vq); }; /* If driver didn't advertise the feature, it will never appear. */ -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
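Putting the two new callbacks in context, a hedged sketch of how the four steps above line up as calls from a core-layer resize helper (the helper itself and the recycle() callback are introduced by later patches; the names here are illustrative):

    static int resize_vq_sketch(struct virtqueue *vq, u32 num,
                                void (*recycle)(struct virtqueue *vq, void *buf))
    {
            struct virtio_device *vdev = vq->vdev;
            void *buf;
            int err;

            /* 1. transport: notify the device to reset the queue */
            err = vdev->config->reset_vq(vq);
            if (err)
                    return err;

            /* 2. vring: recycle the buffers the device did not consume */
            while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
                    recycle(vq, buf);

            /* 3. vring: reset/resize the vring (may re-allocate) */
            /* ... virtqueue_resize_split()/_packed() from later patches ... */

            /* 4. transport: map the vring and re-enable the queue */
            return vdev->config->enable_reset_vq(vq);
    }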
[PATCH v11 04/40] virtio_ring: extract the logic of freeing vring
Introduce vring_free() to free the vring of vq. Subsequent patches will use vring_free() alone. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_ring.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4ed51bc05a51..bb4e8ae09c9b 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2316,14 +2316,10 @@ struct virtqueue *vring_new_virtqueue(unsigned int index, } EXPORT_SYMBOL_GPL(vring_new_virtqueue); -void vring_del_virtqueue(struct virtqueue *_vq) +static void vring_free(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); - spin_lock(&vq->vq.vdev->vqs_list_lock); - list_del(&_vq->list); - spin_unlock(&vq->vq.vdev->vqs_list_lock); - if (vq->we_own_ring) { if (vq->packed_ring) { vring_free_queue(vq->vq.vdev, @@ -2354,6 +2350,18 @@ void vring_del_virtqueue(struct virtqueue *_vq) kfree(vq->split.desc_state); kfree(vq->split.desc_extra); } +} + +void vring_del_virtqueue(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + spin_lock(&vq->vq.vdev->vqs_list_lock); + list_del(&_vq->list); + spin_unlock(&vq->vq.vdev->vqs_list_lock); + + vring_free(_vq); + kfree(vq); } EXPORT_SYMBOL_GPL(vring_del_virtqueue); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 05/40] virtio_ring: split vring_virtqueue
Separate the two inline structures(split and packed) from the structure vring_virtqueue. In this way, we can use these two structures later to pass parameters and retain temporary variables. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 116 ++- 1 file changed, 60 insertions(+), 56 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index bb4e8ae09c9b..2806e033a651 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -85,6 +85,64 @@ struct vring_desc_extra { u16 next; /* The next desc state in a list. */ }; +struct vring_virtqueue_split { + /* Actual memory layout for this queue. */ + struct vring vring; + + /* Last written value to avail->flags */ + u16 avail_flags_shadow; + + /* +* Last written value to avail->idx in +* guest byte order. +*/ + u16 avail_idx_shadow; + + /* Per-descriptor state. */ + struct vring_desc_state_split *desc_state; + struct vring_desc_extra *desc_extra; + + /* DMA address and size information */ + dma_addr_t queue_dma_addr; + size_t queue_size_in_bytes; +}; + +struct vring_virtqueue_packed { + /* Actual memory layout for this queue. */ + struct { + unsigned int num; + struct vring_packed_desc *desc; + struct vring_packed_desc_event *driver; + struct vring_packed_desc_event *device; + } vring; + + /* Driver ring wrap counter. */ + bool avail_wrap_counter; + + /* Avail used flags. */ + u16 avail_used_flags; + + /* Index of the next avail descriptor. */ + u16 next_avail_idx; + + /* +* Last written value to driver->flags in +* guest byte order. +*/ + u16 event_flags_shadow; + + /* Per-descriptor state. */ + struct vring_desc_state_packed *desc_state; + struct vring_desc_extra *desc_extra; + + /* DMA address and size information */ + dma_addr_t ring_dma_addr; + dma_addr_t driver_event_dma_addr; + dma_addr_t device_event_dma_addr; + size_t ring_size_in_bytes; + size_t event_size_in_bytes; +}; + struct vring_virtqueue { struct virtqueue vq; @@ -124,64 +182,10 @@ struct vring_virtqueue { union { /* Available for split ring */ - struct { - /* Actual memory layout for this queue. */ - struct vring vring; - - /* Last written value to avail->flags */ - u16 avail_flags_shadow; - - /* -* Last written value to avail->idx in -* guest byte order. -*/ - u16 avail_idx_shadow; - - /* Per-descriptor state. */ - struct vring_desc_state_split *desc_state; - struct vring_desc_extra *desc_extra; - - /* DMA address and size information */ - dma_addr_t queue_dma_addr; - size_t queue_size_in_bytes; - } split; + struct vring_virtqueue_split split; /* Available for packed ring */ - struct { - /* Actual memory layout for this queue. */ - struct { - unsigned int num; - struct vring_packed_desc *desc; - struct vring_packed_desc_event *driver; - struct vring_packed_desc_event *device; - } vring; - - /* Driver ring wrap counter. */ - bool avail_wrap_counter; - - /* Avail used flags. */ - u16 avail_used_flags; - - /* Index of the next avail descriptor. */ - u16 next_avail_idx; - - /* -* Last written value to driver->flags in -* guest byte order. -*/ - u16 event_flags_shadow; - - /* Per-descriptor state. */ - struct vring_desc_state_packed *desc_state; - struct vring_desc_extra *desc_extra; - - /* DMA address and size information */ - dma_addr_t ring_dma_addr; - dma_addr_t driver_event_dma_addr; - dma_addr_t device_event_dma_addr; - size_t ring_size_in_bytes; - size_t event_size_in_bytes; - } packed; + struct vring_virtqueue_packed packed; }; /* How to notify other side. FIXME: c
[PATCH v11 07/40] virtio_ring: split: introduce vring_free_split()
Introduce vring_free_split() to free the resources held by struct vring_virtqueue_split. Subsequent patches require it. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 986dbd9294d6..49d61e412dc6 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -939,6 +939,16 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq) return NULL; } +static void vring_free_split(struct vring_virtqueue_split *vring, +struct virtio_device *vdev) +{ + vring_free_queue(vdev, vring->queue_size_in_bytes, vring->vring.desc, +vring->queue_dma_addr); + + kfree(vring->desc_state); + kfree(vring->desc_extra); +} + static struct virtqueue *vring_create_virtqueue_split( unsigned int index, unsigned int num, -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 08/40] virtio_ring: split: extract the logic of alloc queue
Separate the logic of split to create vring queue. This feature is required for subsequent virtuqueue reset vring. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 68 ++-- 1 file changed, 42 insertions(+), 26 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 49d61e412dc6..a9ceb9c16c54 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -949,28 +949,19 @@ static void vring_free_split(struct vring_virtqueue_split *vring, kfree(vring->desc_extra); } -static struct virtqueue *vring_create_virtqueue_split( - unsigned int index, - unsigned int num, - unsigned int vring_align, - struct virtio_device *vdev, - bool weak_barriers, - bool may_reduce_num, - bool context, - bool (*notify)(struct virtqueue *), - void (*callback)(struct virtqueue *), - const char *name) +static int vring_alloc_queue_split(struct vring_virtqueue_split *vring, + struct virtio_device *vdev, + u32 num, + unsigned int vring_align, + bool may_reduce_num) { - struct virtqueue *vq; void *queue = NULL; dma_addr_t dma_addr; - size_t queue_size_in_bytes; - struct vring vring; /* We assume num is a power of 2. */ if (num & (num - 1)) { dev_warn(&vdev->dev, "Bad virtqueue length %u\n", num); - return NULL; + return -EINVAL; } /* TODO: allocate each queue chunk individually */ @@ -981,11 +972,11 @@ static struct virtqueue *vring_create_virtqueue_split( if (queue) break; if (!may_reduce_num) - return NULL; + return -ENOMEM; } if (!num) - return NULL; + return -ENOMEM; if (!queue) { /* Try to get a single page. You are my only hope! */ @@ -993,21 +984,46 @@ static struct virtqueue *vring_create_virtqueue_split( &dma_addr, GFP_KERNEL|__GFP_ZERO); } if (!queue) - return NULL; + return -ENOMEM; + + vring_init(&vring->vring, num, queue, vring_align); - queue_size_in_bytes = vring_size(num, vring_align); - vring_init(&vring, num, queue, vring_align); + vring->queue_dma_addr = dma_addr; + vring->queue_size_in_bytes = vring_size(num, vring_align); + + return 0; +} + +static struct virtqueue *vring_create_virtqueue_split( + unsigned int index, + unsigned int num, + unsigned int vring_align, + struct virtio_device *vdev, + bool weak_barriers, + bool may_reduce_num, + bool context, + bool (*notify)(struct virtqueue *), + void (*callback)(struct virtqueue *), + const char *name) +{ + struct vring_virtqueue_split vring = {}; + struct virtqueue *vq; + int err; + + err = vring_alloc_queue_split(&vring, vdev, num, vring_align, + may_reduce_num); + if (err) + return NULL; - vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context, - notify, callback, name); + vq = __vring_new_virtqueue(index, vring.vring, vdev, weak_barriers, + context, notify, callback, name); if (!vq) { - vring_free_queue(vdev, queue_size_in_bytes, queue, -dma_addr); + vring_free_split(&vring, vdev); return NULL; } - to_vvq(vq)->split.queue_dma_addr = dma_addr; - to_vvq(vq)->split.queue_size_in_bytes = queue_size_in_bytes; + to_vvq(vq)->split.queue_dma_addr = vring.queue_dma_addr; + to_vvq(vq)->split.queue_size_in_bytes = vring.queue_size_in_bytes; to_vvq(vq)->we_own_ring = true; return vq; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 06/40] virtio_ring: introduce virtqueue_init()
Separate the logic of virtqueue initialization. This logic is irrelevant to ring layout. This logic can be called independently when implementing resize/reset later. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 61 ++-- 1 file changed, 31 insertions(+), 30 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 2806e033a651..986dbd9294d6 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -368,6 +368,34 @@ static int vring_mapping_error(const struct vring_virtqueue *vq, return dma_mapping_error(vring_dma_dev(vq), addr); } +static void virtqueue_init(struct vring_virtqueue *vq, u32 num) +{ + struct virtio_device *vdev; + + vdev = vq->vq.vdev; + + vq->vq.num_free = num; + if (vq->packed_ring) + vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR); + else + vq->last_used_idx = 0; + vq->event_triggered = false; + vq->num_added = 0; + vq->use_dma_api = vring_use_dma_api(vdev); +#ifdef DEBUG + vq->in_use = false; + vq->last_add_time_valid = false; +#endif + + vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX); + + if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM)) + vq->weak_barriers = false; + + /* Put everything in free lists. */ + vq->free_head = 0; +} + /* * Split ring specific functions - *_split(). @@ -1706,7 +1734,6 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->vq.callback = callback; vq->vq.vdev = vdev; vq->vq.name = name; - vq->vq.num_free = num; vq->vq.index = index; vq->we_own_ring = true; vq->notify = notify; @@ -1716,22 +1743,10 @@ static struct virtqueue *vring_create_virtqueue_packed( #else vq->broken = false; #endif - vq->last_used_idx = 0 | (1 << VRING_PACKED_EVENT_F_WRAP_CTR); - vq->event_triggered = false; - vq->num_added = 0; vq->packed_ring = true; - vq->use_dma_api = vring_use_dma_api(vdev); -#ifdef DEBUG - vq->in_use = false; - vq->last_add_time_valid = false; -#endif vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX); - - if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM)) - vq->weak_barriers = false; vq->packed.ring_dma_addr = ring_dma_addr; vq->packed.driver_event_dma_addr = driver_event_dma_addr; @@ -1759,8 +1774,7 @@ static struct virtqueue *vring_create_virtqueue_packed( memset(vq->packed.desc_state, 0, num * sizeof(struct vring_desc_state_packed)); - /* Put everything in free lists. 
*/ - vq->free_head = 0; + virtqueue_init(vq, num); vq->packed.desc_extra = vring_alloc_desc_extra(num); if (!vq->packed.desc_extra) @@ -2205,7 +2219,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->vq.callback = callback; vq->vq.vdev = vdev; vq->vq.name = name; - vq->vq.num_free = vring.num; vq->vq.index = index; vq->we_own_ring = false; vq->notify = notify; @@ -2215,21 +2228,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, #else vq->broken = false; #endif - vq->last_used_idx = 0; - vq->event_triggered = false; - vq->num_added = 0; - vq->use_dma_api = vring_use_dma_api(vdev); -#ifdef DEBUG - vq->in_use = false; - vq->last_add_time_valid = false; -#endif vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX); - - if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM)) - vq->weak_barriers = false; vq->split.queue_dma_addr = 0; vq->split.queue_size_in_bytes = 0; @@ -2255,11 +2256,11 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, if (!vq->split.desc_extra) goto err_extra; - /* Put everything in free lists. */ - vq->free_head = 0; memset(vq->split.desc_state, 0, vring.num * sizeof(struct vring_desc_state_split)); + virtqueue_init(vq, vq->split.vring.num); + spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); spin_unlock(&vdev->vqs_list_lock); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 10/40] virtio_ring: split: extract the logic of attach vring
Separate the logic of attach vring, subsequent patches will call it separately. Since the "struct vring_virtqueue_split split" is created on the stack and has been initialized to 0. So using split->queue_dma_addr/split->queue_size_in_bytes assignment for queue_dma_addr/queue_size_in_bytes can keep the same as the original code. On the other hand, subsequent patches can use the "struct vring_virtqueue_split split" obtained by vring_alloc_queue_split() to directly complete the attach operation. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index cedd340d6db7..9025bd373d3b 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -940,6 +940,18 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq) return NULL; } +static void virtqueue_vring_attach_split(struct vring_virtqueue *vq, +struct vring_virtqueue_split *vring) +{ + vq->split.queue_dma_addr = vring->queue_dma_addr; + vq->split.queue_size_in_bytes = vring->queue_size_in_bytes; + + vq->split.vring = vring->vring; + + vq->split.desc_state = vring->desc_state; + vq->split.desc_extra = vring->desc_extra; +} + static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring) { struct vring_desc_state_split *state; @@ -2287,10 +2299,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->split.queue_dma_addr = 0; - vq->split.queue_size_in_bytes = 0; - - vq->split.vring = _vring; vq->split.avail_flags_shadow = 0; vq->split.avail_idx_shadow = 0; @@ -2310,10 +2318,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, return NULL; } - vq->split.desc_state = vring.desc_state; - vq->split.desc_extra = vring.desc_extra; - virtqueue_init(vq, vring.vring.num); + virtqueue_vring_attach_split(vq, &vring); spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 09/40] virtio_ring: split: extract the logic of alloc state and extra
Separate the logic of creating desc_state, desc_extra, and subsequent patches will call it independently. Since only the structure vring is passed into __vring_new_virtqueue(), when creating the function vring_alloc_state_extra_split(), we prefer to use vring_virtqueue_split as a parameter, and it will be more convenient to pass vring_virtqueue_split to some subsequent functions. So a new vring_virtqueue_split variable is added in __vring_new_virtqueue(). Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 58 +--- 1 file changed, 40 insertions(+), 18 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index a9ceb9c16c54..cedd340d6db7 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -204,6 +204,7 @@ struct vring_virtqueue { #endif }; +static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num); /* * Helpers. @@ -939,6 +940,32 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq) return NULL; } +static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring) +{ + struct vring_desc_state_split *state; + struct vring_desc_extra *extra; + u32 num = vring->vring.num; + + state = kmalloc_array(num, sizeof(struct vring_desc_state_split), GFP_KERNEL); + if (!state) + goto err_state; + + extra = vring_alloc_desc_extra(num); + if (!extra) + goto err_extra; + + memset(state, 0, num * sizeof(struct vring_desc_state_split)); + + vring->desc_state = state; + vring->desc_extra = extra; + return 0; + +err_extra: + kfree(state); +err_state: + return -ENOMEM; +} + static void vring_free_split(struct vring_virtqueue_split *vring, struct virtio_device *vdev) { @@ -2224,7 +2251,7 @@ EXPORT_SYMBOL_GPL(vring_interrupt); /* Only available for split ring */ struct virtqueue *__vring_new_virtqueue(unsigned int index, - struct vring vring, + struct vring _vring, struct virtio_device *vdev, bool weak_barriers, bool context, @@ -2232,7 +2259,9 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, void (*callback)(struct virtqueue *), const char *name) { + struct vring_virtqueue_split vring = {}; struct vring_virtqueue *vq; + int err; if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED)) return NULL; @@ -2261,7 +2290,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->split.queue_dma_addr = 0; vq->split.queue_size_in_bytes = 0; - vq->split.vring = vring; + vq->split.vring = _vring; vq->split.avail_flags_shadow = 0; vq->split.avail_idx_shadow = 0; @@ -2273,30 +2302,23 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->split.avail_flags_shadow); } - vq->split.desc_state = kmalloc_array(vring.num, - sizeof(struct vring_desc_state_split), GFP_KERNEL); - if (!vq->split.desc_state) - goto err_state; + vring.vring = _vring; - vq->split.desc_extra = vring_alloc_desc_extra(vring.num); - if (!vq->split.desc_extra) - goto err_extra; + err = vring_alloc_state_extra_split(&vring); + if (err) { + kfree(vq); + return NULL; + } - memset(vq->split.desc_state, 0, vring.num * - sizeof(struct vring_desc_state_split)); + vq->split.desc_state = vring.desc_state; + vq->split.desc_extra = vring.desc_extra; - virtqueue_init(vq, vq->split.vring.num); + virtqueue_init(vq, vring.vring.num); spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); spin_unlock(&vdev->vqs_list_lock); return &vq->vq; - -err_extra: - kfree(vq->split.desc_state); -err_state: - kfree(vq); - return NULL; } EXPORT_SYMBOL_GPL(__vring_new_virtqueue); -- 2.31.0 ___ Virtualization mailing list 
Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 11/40] virtio_ring: split: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will call it separately. This function completes the variable initialization of split vring. It together with the logic of atatch constitutes the initialization of vring. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 30 +++--- 1 file changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 9025bd373d3b..35540daaa1e7 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -940,6 +940,24 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq) return NULL; } +static void virtqueue_vring_init_split(struct vring_virtqueue *vq) +{ + struct virtio_device *vdev; + + vdev = vq->vq.vdev; + + vq->split.avail_flags_shadow = 0; + vq->split.avail_idx_shadow = 0; + + /* No callback? Tell other side not to bother us. */ + if (!vq->vq.callback) { + vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; + if (!vq->event) + vq->split.vring.avail->flags = cpu_to_virtio16(vdev, + vq->split.avail_flags_shadow); + } +} + static void virtqueue_vring_attach_split(struct vring_virtqueue *vq, struct vring_virtqueue_split *vring) { @@ -2299,17 +2317,6 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->split.avail_flags_shadow = 0; - vq->split.avail_idx_shadow = 0; - - /* No callback? Tell other side not to bother us. */ - if (!callback) { - vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT; - if (!vq->event) - vq->split.vring.avail->flags = cpu_to_virtio16(vdev, - vq->split.avail_flags_shadow); - } - vring.vring = _vring; err = vring_alloc_state_extra_split(&vring); @@ -2320,6 +2327,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index, virtqueue_init(vq, vring.vring.num); virtqueue_vring_attach_split(vq, &vring); + virtqueue_vring_init_split(vq); spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 12/40] virtio_ring: split: introduce virtqueue_reinit_split()
Introduce a function to initialize vq without allocating new ring, desc_state, desc_extra. Subsequent patches will call this function after reset vq to reinitialize vq. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_ring.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 35540daaa1e7..4c8972da5423 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -958,6 +958,25 @@ static void virtqueue_vring_init_split(struct vring_virtqueue *vq) } } +static void virtqueue_reinit_split(struct vring_virtqueue *vq) +{ + int size, i; + + memset(vq->split.vring.desc, 0, vq->split.queue_size_in_bytes); + + size = sizeof(struct vring_desc_state_split) * vq->split.vring.num; + memset(vq->split.desc_state, 0, size); + + size = sizeof(struct vring_desc_extra) * vq->split.vring.num; + memset(vq->split.desc_extra, 0, size); + + for (i = 0; i < vq->split.vring.num - 1; i++) + vq->split.desc_extra[i].next = i + 1; + + virtqueue_init(vq, vq->split.vring.num); + virtqueue_vring_init_split(vq); +} + static void virtqueue_vring_attach_split(struct vring_virtqueue *vq, struct vring_virtqueue_split *vring) { -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 13/40] virtio_ring: split: reserve vring_align, may_reduce_num
In vring_create_virtqueue_split() save vring_align, may_reduce_num to structure vring_virtqueue_split. Used to create a new vring when implementing resize . Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4c8972da5423..9c83c5e6d5a9 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -105,6 +105,13 @@ struct vring_virtqueue_split { /* DMA address and size information */ dma_addr_t queue_dma_addr; size_t queue_size_in_bytes; + + /* +* The parameters for creating vrings are reserved for creating new +* vring. +*/ + u32 vring_align; + bool may_reduce_num; }; struct vring_virtqueue_packed { @@ -1098,6 +1105,8 @@ static struct virtqueue *vring_create_virtqueue_split( return NULL; } + to_vvq(vq)->split.vring_align = vring_align; + to_vvq(vq)->split.may_reduce_num = may_reduce_num; to_vvq(vq)->split.queue_dma_addr = vring.queue_dma_addr; to_vvq(vq)->split.queue_size_in_bytes = vring.queue_size_in_bytes; to_vvq(vq)->we_own_ring = true; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 15/40] virtio_ring: packed: introduce vring_free_packed
Introduce vring_free_packed() to free the resources held by struct vring_virtqueue_packed. Subsequent patches require it. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 1aaa1e5f9991..4f497b6f2d04 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1830,6 +1830,27 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num) return desc_extra; } +static void vring_free_packed(struct vring_virtqueue_packed *vring, + struct virtio_device *vdev) +{ + if (vring->vring.desc) + vring_free_queue(vdev, vring->ring_size_in_bytes, +vring->vring.desc, vring->ring_dma_addr); + + if (vring->vring.driver) + vring_free_queue(vdev, vring->event_size_in_bytes, +vring->vring.driver, +vring->driver_event_dma_addr); + + if (vring->vring.device) + vring_free_queue(vdev, vring->event_size_in_bytes, +vring->vring.device, +vring->device_event_dma_addr); + + kfree(vring->desc_state); + kfree(vring->desc_extra); +} + static struct virtqueue *vring_create_virtqueue_packed( unsigned int index, unsigned int num, -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 17/40] virtio_ring: packed: extract the logic of alloc state and extra
Separate the logic for alloc desc_state and desc_extra, which will be called separately by subsequent patches. Use struct vring_packed to pass desc_state, desc_extra. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 48 +--- 1 file changed, 34 insertions(+), 14 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 891257d9cdf8..0c4109eb6c6c 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1902,6 +1902,33 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring, return -ENOMEM; } +static int vring_alloc_state_extra_packed(struct vring_virtqueue_packed *vring) +{ + struct vring_desc_state_packed *state; + struct vring_desc_extra *extra; + u32 num = vring->vring.num; + + state = kmalloc_array(num, sizeof(struct vring_desc_state_packed), GFP_KERNEL); + if (!state) + goto err_desc_state; + + memset(state, 0, num * sizeof(struct vring_desc_state_packed)); + + extra = vring_alloc_desc_extra(num); + if (!extra) + goto err_desc_extra; + + vring->desc_state = state; + vring->desc_extra = extra; + + return 0; + +err_desc_extra: + kfree(state); +err_desc_state: + return -ENOMEM; +} + static struct virtqueue *vring_create_virtqueue_packed( unsigned int index, unsigned int num, @@ -1916,6 +1943,7 @@ static struct virtqueue *vring_create_virtqueue_packed( { struct vring_virtqueue_packed vring = {}; struct vring_virtqueue *vq; + int err; if (vring_alloc_queue_packed(&vring, vdev, num)) goto err_ring; @@ -1955,21 +1983,15 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->packed.event_flags_shadow = 0; vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL; - vq->packed.desc_state = kmalloc_array(num, - sizeof(struct vring_desc_state_packed), - GFP_KERNEL); - if (!vq->packed.desc_state) - goto err_desc_state; + err = vring_alloc_state_extra_packed(&vring); + if (err) + goto err_state_extra; - memset(vq->packed.desc_state, 0, - num * sizeof(struct vring_desc_state_packed)); + vq->packed.desc_state = vring.desc_state; + vq->packed.desc_extra = vring.desc_extra; virtqueue_init(vq, num); - vq->packed.desc_extra = vring_alloc_desc_extra(num); - if (!vq->packed.desc_extra) - goto err_desc_extra; - /* No callback? Tell other side not to bother us. */ if (!callback) { vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE; @@ -1982,9 +2004,7 @@ static struct virtqueue *vring_create_virtqueue_packed( spin_unlock(&vdev->vqs_list_lock); return &vq->vq; -err_desc_extra: - kfree(vq->packed.desc_state); -err_desc_state: +err_state_extra: kfree(vq); err_vq: vring_free_packed(&vring, vdev); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 16/40] virtio_ring: packed: extract the logic of alloc queue
Separate the logic of packed to create vring queue. For the convenience of passing parameters, add a structure vring_packed. This feature is required for subsequent virtuqueue reset vring. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 80 +++- 1 file changed, 51 insertions(+), 29 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4f497b6f2d04..891257d9cdf8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1851,19 +1851,10 @@ static void vring_free_packed(struct vring_virtqueue_packed *vring, kfree(vring->desc_extra); } -static struct virtqueue *vring_create_virtqueue_packed( - unsigned int index, - unsigned int num, - unsigned int vring_align, - struct virtio_device *vdev, - bool weak_barriers, - bool may_reduce_num, - bool context, - bool (*notify)(struct virtqueue *), - void (*callback)(struct virtqueue *), - const char *name) +static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring, + struct virtio_device *vdev, + u32 num) { - struct vring_virtqueue *vq; struct vring_packed_desc *ring; struct vring_packed_desc_event *driver, *device; dma_addr_t ring_dma_addr, driver_event_dma_addr, device_event_dma_addr; @@ -1875,7 +1866,11 @@ static struct virtqueue *vring_create_virtqueue_packed( &ring_dma_addr, GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (!ring) - goto err_ring; + goto err; + + vring->vring.desc = ring; + vring->ring_dma_addr = ring_dma_addr; + vring->ring_size_in_bytes = ring_size_in_bytes; event_size_in_bytes = sizeof(struct vring_packed_desc_event); @@ -1883,13 +1878,47 @@ static struct virtqueue *vring_create_virtqueue_packed( &driver_event_dma_addr, GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (!driver) - goto err_driver; + goto err; + + vring->vring.driver = driver; + vring->event_size_in_bytes = event_size_in_bytes; + vring->driver_event_dma_addr = driver_event_dma_addr; device = vring_alloc_queue(vdev, event_size_in_bytes, &device_event_dma_addr, GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO); if (!device) - goto err_device; + goto err; + + vring->vring.device = device; + vring->device_event_dma_addr = device_event_dma_addr; + + vring->vring.num= num; + + return 0; + +err: + vring_free_packed(vring, vdev); + return -ENOMEM; +} + +static struct virtqueue *vring_create_virtqueue_packed( + unsigned int index, + unsigned int num, + unsigned int vring_align, + struct virtio_device *vdev, + bool weak_barriers, + bool may_reduce_num, + bool context, + bool (*notify)(struct virtqueue *), + void (*callback)(struct virtqueue *), + const char *name) +{ + struct vring_virtqueue_packed vring = {}; + struct vring_virtqueue *vq; + + if (vring_alloc_queue_packed(&vring, vdev, num)) + goto err_ring; vq = kmalloc(sizeof(*vq), GFP_KERNEL); if (!vq) @@ -1912,17 +1941,14 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->packed.ring_dma_addr = ring_dma_addr; - vq->packed.driver_event_dma_addr = driver_event_dma_addr; - vq->packed.device_event_dma_addr = device_event_dma_addr; + vq->packed.ring_dma_addr = vring.ring_dma_addr; + vq->packed.driver_event_dma_addr = vring.driver_event_dma_addr; + vq->packed.device_event_dma_addr = vring.device_event_dma_addr; - vq->packed.ring_size_in_bytes = ring_size_in_bytes; - vq->packed.event_size_in_bytes = event_size_in_bytes; + vq->packed.ring_size_in_bytes = vring.ring_size_in_bytes; + vq->packed.event_size_in_bytes = vring.event_size_in_bytes; - vq->packed.vring.num = num; - 
vq->packed.vring.desc = ring; - vq->packed.vring.driver = driver; - vq->packed.vring.device = device; + vq->packed.vring = vring.vring; vq->packed.next_avail_idx = 0; vq->packed.avail_wrap_counter = 1; @@ -1961,11 +1987,7 @@ static struct virtqueue *vring_create_virtqueue_packed( err_desc_state: kfree(vq); err_vq: - vring_free_queue(vdev, event_size_in_bytes, device, device_event_dma_addr); -err_device: - vring_free_queue(vdev, event_size_in_bytes, driver, driver_event_dma_addr); -err_driver: - vring_free_queue(vde
[PATCH v11 14/40] virtio_ring: split: introduce virtqueue_resize_split()
virtio ring split supports resize. Only after the new vring is successfully allocated based on the new num, we will release the old vring. In any case, an error is returned, indicating that the vring still points to the old vring. In the case of an error, re-initialize(virtqueue_reinit_split()) the virtqueue to ensure that the vring can be used. In addition, vring_align, may_reduce_num are necessary for reallocating vring, so they are retained for creating vq. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 32 1 file changed, 32 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 9c83c5e6d5a9..1aaa1e5f9991 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -212,6 +212,7 @@ struct vring_virtqueue { }; static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num); +static void vring_free(struct virtqueue *_vq); /* * Helpers. @@ -1114,6 +1115,37 @@ static struct virtqueue *vring_create_virtqueue_split( return vq; } +static int virtqueue_resize_split(struct virtqueue *_vq, u32 num) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + struct vring_virtqueue_split vring = {}; + struct virtio_device *vdev = _vq->vdev; + int err; + + err = vring_alloc_queue_split(&vring, vdev, num, vq->split.vring_align, + vq->split.may_reduce_num); + if (err) + goto err; + + err = vring_alloc_state_extra_split(&vring); + if (err) { + vring_free_split(&vring, vdev); + goto err; + } + + vring_free(&vq->vq); + + virtqueue_init(vq, vring.vring.num); + virtqueue_vring_attach_split(vq, &vring); + virtqueue_vring_init_split(vq); + + return 0; + +err: + virtqueue_reinit_split(vq); + return -ENOMEM; +} + /* * Packed ring specific functions - *_packed(). -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
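From the caller's perspective (virtqueue_resize() later in the series), the contract described above means a failed resize is not fatal; a short illustrative sketch:

    err = virtqueue_resize_split(vq, new_num);
    if (err)
            /* The old, re-initialized vring is still attached. */
            dev_warn(&vq->vdev->dev,
                     "resize to %u failed, queue keeps %u entries\n",
                     new_num, virtqueue_get_vring_size(vq));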
[PATCH v11 18/40] virtio_ring: packed: extract the logic of attach vring
Separate the logic of attach vring, the subsequent patch will call it separately. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 29 + 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 0c4109eb6c6c..91ac99f99bff 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1929,6 +1929,22 @@ static int vring_alloc_state_extra_packed(struct vring_virtqueue_packed *vring) return -ENOMEM; } +static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq, + struct vring_virtqueue_packed *vring) +{ + vq->packed.ring_dma_addr = vring->ring_dma_addr; + vq->packed.driver_event_dma_addr = vring->driver_event_dma_addr; + vq->packed.device_event_dma_addr = vring->device_event_dma_addr; + + vq->packed.ring_size_in_bytes = vring->ring_size_in_bytes; + vq->packed.event_size_in_bytes = vring->event_size_in_bytes; + + vq->packed.vring = vring->vring; + + vq->packed.desc_state = vring->desc_state; + vq->packed.desc_extra = vring->desc_extra; +} + static struct virtqueue *vring_create_virtqueue_packed( unsigned int index, unsigned int num, @@ -1969,15 +1985,6 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->packed.ring_dma_addr = vring.ring_dma_addr; - vq->packed.driver_event_dma_addr = vring.driver_event_dma_addr; - vq->packed.device_event_dma_addr = vring.device_event_dma_addr; - - vq->packed.ring_size_in_bytes = vring.ring_size_in_bytes; - vq->packed.event_size_in_bytes = vring.event_size_in_bytes; - - vq->packed.vring = vring.vring; - vq->packed.next_avail_idx = 0; vq->packed.avail_wrap_counter = 1; vq->packed.event_flags_shadow = 0; @@ -1987,10 +1994,8 @@ static struct virtqueue *vring_create_virtqueue_packed( if (err) goto err_state_extra; - vq->packed.desc_state = vring.desc_state; - vq->packed.desc_extra = vring.desc_extra; - virtqueue_init(vq, num); + virtqueue_vring_attach_packed(vq, &vring); /* No callback? Tell other side not to bother us. */ if (!callback) { -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 19/40] virtio_ring: packed: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will call it separately. This function completes the variable initialization of packed vring. It together with the logic of atatch constitutes the initialization of vring. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 28 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 91ac99f99bff..2f58266539eb 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1945,6 +1945,21 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq, vq->packed.desc_extra = vring->desc_extra; } +static void virtqueue_vring_init_packed(struct vring_virtqueue *vq) +{ + vq->packed.next_avail_idx = 0; + vq->packed.avail_wrap_counter = 1; + vq->packed.event_flags_shadow = 0; + vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL; + + /* No callback? Tell other side not to bother us. */ + if (!vq->vq.callback) { + vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE; + vq->packed.vring.driver->flags = + cpu_to_le16(vq->packed.event_flags_shadow); + } +} + static struct virtqueue *vring_create_virtqueue_packed( unsigned int index, unsigned int num, @@ -1985,24 +2000,13 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && !context; - vq->packed.next_avail_idx = 0; - vq->packed.avail_wrap_counter = 1; - vq->packed.event_flags_shadow = 0; - vq->packed.avail_used_flags = 1 << VRING_PACKED_DESC_F_AVAIL; - err = vring_alloc_state_extra_packed(&vring); if (err) goto err_state_extra; virtqueue_init(vq, num); virtqueue_vring_attach_packed(vq, &vring); - - /* No callback? Tell other side not to bother us. */ - if (!callback) { - vq->packed.event_flags_shadow = VRING_PACKED_EVENT_FLAG_DISABLE; - vq->packed.vring.driver->flags = - cpu_to_le16(vq->packed.event_flags_shadow); - } + virtqueue_vring_init_packed(vq); spin_lock(&vdev->vqs_list_lock); list_add_tail(&vq->vq.list, &vdev->vqs); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 20/40] virtio_ring: packed: introduce virtqueue_reinit_packed()
Introduce a function to reinitialize a vq without allocating a new ring, desc_state or desc_extra. Subsequent patches will call this function after resetting a vq, to reinitialize it. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 2f58266539eb..650f701a5480 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -1960,6 +1960,27 @@ static void virtqueue_vring_init_packed(struct vring_virtqueue *vq) } } +static void virtqueue_reinit_packed(struct vring_virtqueue *vq) +{ + int size, i; + + memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes); + memset(vq->packed.vring.driver, 0, vq->packed.event_size_in_bytes); + memset(vq->packed.vring.desc, 0, vq->packed.ring_size_in_bytes); + + size = sizeof(struct vring_desc_state_packed) * vq->packed.vring.num; + memset(vq->packed.desc_state, 0, size); + + size = sizeof(struct vring_desc_extra) * vq->packed.vring.num; + memset(vq->packed.desc_extra, 0, size); + + for (i = 0; i < vq->packed.vring.num - 1; i++) + vq->packed.desc_extra[i].next = i + 1; + + virtqueue_init(vq, vq->packed.vring.num); + virtqueue_vring_init_packed(vq); +} + static struct virtqueue *vring_create_virtqueue_packed( unsigned int index, unsigned int num, -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
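The next patch uses this helper on the resize error path: if a replacement ring cannot be allocated, the old ring is kept and simply reinitialized so the queue stays usable. A minimal sketch of that pattern, where new_num is the requested ring size:

struct vring_virtqueue_packed new_vring = {};

if (vring_alloc_queue_packed(&new_vring, vdev, new_num)) {
	/* Keep the old ring; wipe its contents and rebuild the next chain. */
	virtqueue_reinit_packed(vq);
	return -ENOMEM;
}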
[PATCH v11 21/40] virtio_ring: packed: introduce virtqueue_resize_packed()
virtio ring packed supports resize. Only after the new vring is successfully allocated based on the new num, we will release the old vring. In any case, an error is returned, indicating that the vring still points to the old vring. In the case of an error, re-initialize(by virtqueue_reinit_packed()) the virtqueue to ensure that the vring can be used. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 29 + 1 file changed, 29 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 650f701a5480..4860787286db 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2042,6 +2042,35 @@ static struct virtqueue *vring_create_virtqueue_packed( return NULL; } +static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num) +{ + struct vring_virtqueue_packed vring = {}; + struct vring_virtqueue *vq = to_vvq(_vq); + struct virtio_device *vdev = _vq->vdev; + int err; + + if (vring_alloc_queue_packed(&vring, vdev, num)) + goto err_ring; + + err = vring_alloc_state_extra_packed(&vring); + if (err) + goto err_state_extra; + + vring_free(&vq->vq); + + virtqueue_init(vq, vring.vring.num); + virtqueue_vring_attach_packed(vq, &vring); + virtqueue_vring_init_packed(vq); + + return 0; + +err_state_extra: + vring_free_packed(&vring, vdev); +err_ring: + virtqueue_reinit_packed(vq); + return -ENOMEM; +} + /* * Generic functions and exported symbols. -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 23/40] virtio_pci: move struct virtio_pci_common_cfg to virtio_pci_modern.h
In order to facilitate the expansion of virtio_pci_common_cfg in the future, move it from uapi to virtio_pci_modern.h. In this way, we can freely expand virtio_pci_common_cfg in the future. Other projects using virtio_pci_common_cfg in uapi need to maintain a separate virtio_pci_common_cfg or use the offset macro defined in uapi. Signed-off-by: Xuan Zhuo --- include/linux/virtio_pci_modern.h | 26 ++ include/uapi/linux/virtio_pci.h | 26 -- 2 files changed, 26 insertions(+), 26 deletions(-) diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index eb2bd9b4077d..c4f7ffbacb4e 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -5,6 +5,32 @@ #include #include +/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */ +struct virtio_pci_common_cfg { + /* About the whole device. */ + __le32 device_feature_select; /* read-write */ + __le32 device_feature; /* read-only */ + __le32 guest_feature_select;/* read-write */ + __le32 guest_feature; /* read-write */ + __le16 msix_config; /* read-write */ + __le16 num_queues; /* read-only */ + __u8 device_status; /* read-write */ + __u8 config_generation; /* read-only */ + + /* About a specific virtqueue. */ + __le16 queue_select;/* read-write */ + __le16 queue_size; /* read-write, power of 2. */ + __le16 queue_msix_vector; /* read-write */ + __le16 queue_enable;/* read-write */ + __le16 queue_notify_off;/* read-only */ + __le32 queue_desc_lo; /* read-write */ + __le32 queue_desc_hi; /* read-write */ + __le32 queue_avail_lo; /* read-write */ + __le32 queue_avail_hi; /* read-write */ + __le32 queue_used_lo; /* read-write */ + __le32 queue_used_hi; /* read-write */ +}; + struct virtio_pci_modern_device { struct pci_dev *pci_dev; diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 3a86f36d7e3d..247ec42af2c8 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -140,32 +140,6 @@ struct virtio_pci_notify_cap { __le32 notify_off_multiplier; /* Multiplier for queue_notify_off. */ }; -/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */ -struct virtio_pci_common_cfg { - /* About the whole device. */ - __le32 device_feature_select; /* read-write */ - __le32 device_feature; /* read-only */ - __le32 guest_feature_select;/* read-write */ - __le32 guest_feature; /* read-write */ - __le16 msix_config; /* read-write */ - __le16 num_queues; /* read-only */ - __u8 device_status; /* read-write */ - __u8 config_generation; /* read-only */ - - /* About a specific virtqueue. */ - __le16 queue_select;/* read-write */ - __le16 queue_size; /* read-write, power of 2. */ - __le16 queue_msix_vector; /* read-write */ - __le16 queue_enable;/* read-write */ - __le16 queue_notify_off;/* read-only */ - __le32 queue_desc_lo; /* read-write */ - __le32 queue_desc_hi; /* read-write */ - __le32 queue_avail_lo; /* read-write */ - __le32 queue_avail_hi; /* read-write */ - __le32 queue_used_lo; /* read-write */ - __le32 queue_used_hi; /* read-write */ -}; - /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */ struct virtio_pci_cfg_cap { struct virtio_pci_cap cap; -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
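For userspace or other projects that previously pulled this struct from the uapi header, the VIRTIO_PCI_COMMON_* offset macros remain available in the uapi <linux/virtio_pci.h>. A hedged, userspace-style sketch of reading a queue size through the offsets instead of the struct; cfg_base, mmio_write16() and mmio_read16() are assumed helpers for an already-mapped common-config region and are not part of any kernel or uapi API:

#include <stdint.h>
#include <linux/virtio_pci.h>	/* uapi: VIRTIO_PCI_COMMON_Q_* offsets */

/* Assumed: cfg_base maps the common config capability; mmio_*16() perform
 * little-endian 16-bit MMIO accesses. */
static uint16_t read_queue_size(volatile uint8_t *cfg_base, uint16_t queue)
{
	mmio_write16(cfg_base + VIRTIO_PCI_COMMON_Q_SELECT, queue);
	return mmio_read16(cfg_base + VIRTIO_PCI_COMMON_Q_SIZE);
}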
[PATCH v11 24/40] virtio_pci: struct virtio_pci_common_cfg add queue_notify_data
Add queue_notify_data in struct virtio_pci_common_cfg, which comes from here https://github.com/oasis-tcs/virtio-spec/issues/89 Since I want to add queue_reset after queue_notify_data, I submitted this patch first. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- include/linux/virtio_pci_modern.h | 2 ++ include/uapi/linux/virtio_pci.h | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index c4f7ffbacb4e..9f31dde46f57 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -29,6 +29,8 @@ struct virtio_pci_common_cfg { __le32 queue_avail_hi; /* read-write */ __le32 queue_used_lo; /* read-write */ __le32 queue_used_hi; /* read-write */ + __le16 queue_notify_data; /* read-write */ + __le16 padding; }; struct virtio_pci_modern_device { diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 247ec42af2c8..748b3eb62d2f 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -176,6 +176,7 @@ struct virtio_pci_cfg_cap { #define VIRTIO_PCI_COMMON_Q_AVAILHI44 #define VIRTIO_PCI_COMMON_Q_USEDLO 48 #define VIRTIO_PCI_COMMON_Q_USEDHI 52 +#define VIRTIO_PCI_COMMON_Q_NDATA 56 #endif /* VIRTIO_PCI_NO_MODERN */ -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
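Since the struct now lives outside uapi while the offset macros stay, a compile-time cross-check is cheap. This is illustrative only and not part of the patch:

#include <linux/build_bug.h>
#include <linux/stddef.h>
#include <linux/virtio_pci.h>		/* VIRTIO_PCI_COMMON_Q_NDATA */
#include <linux/virtio_pci_modern.h>	/* struct virtio_pci_common_cfg */

/* The kernel-internal layout must keep matching the uapi offsets. */
static_assert(offsetof(struct virtio_pci_common_cfg, queue_notify_data) ==
	      VIRTIO_PCI_COMMON_Q_NDATA);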
[PATCH v11 22/40] virtio_ring: introduce virtqueue_resize()
Introduce virtqueue_resize() to implement the resize of vring. Based on these, the driver can dynamically adjust the size of the vring. For example: ethtool -G. virtqueue_resize() implements resize based on the vq reset function. In case of failure to allocate a new vring, it will give up resize and use the original vring. During this process, if the re-enable reset vq fails, the vq can no longer be used. Although the probability of this situation is not high. The parameter recycle is used to recycle the buffer that is no longer used. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 72 include/linux/virtio.h | 3 ++ 2 files changed, 75 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 4860787286db..5ec43607cc15 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2542,6 +2542,78 @@ struct virtqueue *vring_create_virtqueue( } EXPORT_SYMBOL_GPL(vring_create_virtqueue); +/** + * virtqueue_resize - resize the vring of vq + * @_vq: the struct virtqueue we're talking about. + * @num: new ring num + * @recycle: callback for recycle the useless buffer + * + * When it is really necessary to create a new vring, it will set the current vq + * into the reset state. Then call the passed callback to recycle the buffer + * that is no longer used. Only after the new vring is successfully created, the + * old vring will be released. + * + * Caller must ensure we don't call this with other virtqueue operations + * at the same time (except where noted). + * + * Returns zero or a negative error. + * 0: success. + * -ENOMEM: Failed to allocate a new ring, fall back to the original ring size. + * vq can still work normally + * -EBUSY: Failed to sync with device, vq may not work properly + * -ENOENT: Transport or device not supported + * -E2BIG/-EINVAL: num error + * -EPERM: Operation not permitted + * + */ +int virtqueue_resize(struct virtqueue *_vq, u32 num, +void (*recycle)(struct virtqueue *vq, void *buf)) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + struct virtio_device *vdev = vq->vq.vdev; + bool packed; + void *buf; + int err; + + if (!vq->we_own_ring) + return -EPERM; + + if (num > vq->vq.num_max) + return -E2BIG; + + if (!num) + return -EINVAL; + + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED) ? true : false; + + if ((packed ? 
vq->packed.vring.num : vq->split.vring.num) == num) + return 0; + + if (!vdev->config->reset_vq) + return -ENOENT; + + if (!vdev->config->enable_reset_vq) + return -ENOENT; + + err = vdev->config->reset_vq(_vq); + if (err) + return err; + + while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL) + recycle(_vq, buf); + + if (packed) + err = virtqueue_resize_packed(_vq, num); + else + err = virtqueue_resize_split(_vq, num); + + if (vdev->config->enable_reset_vq(_vq)) + return -EBUSY; + + return err; +} +EXPORT_SYMBOL_GPL(virtqueue_resize); + /* Only available for split ring */ struct virtqueue *vring_new_virtqueue(unsigned int index, unsigned int num, diff --git a/include/linux/virtio.h b/include/linux/virtio.h index a82620032e43..1272566adec6 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -91,6 +91,9 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq); dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); +int virtqueue_resize(struct virtqueue *vq, u32 num, +void (*recycle)(struct virtqueue *vq, void *buf)); + /** * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
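A minimal driver-side sketch of the new API; my_recycle_buf() and my_change_ring_size() are hypothetical names, and virtio-net wires up its real callbacks later in this series:

#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/virtio.h>

static void my_recycle_buf(struct virtqueue *vq, void *buf)
{
	/* Driver-specific: free or repost a buffer that was still queued. */
	kfree(buf);
}

static int my_change_ring_size(struct virtqueue *vq, u32 new_num)
{
	int err;

	err = virtqueue_resize(vq, new_num, my_recycle_buf);
	if (err == -ENOMEM)
		pr_warn("resize failed, keeping the old ring\n");	/* vq still usable */
	else if (err == -EBUSY)
		pr_err("re-enabling the queue failed, vq unusable\n");

	return err;
}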
[PATCH v11 25/40] virtio: allow to unbreak/break virtqueue individually
This patch allows the new introduced __virtqueue_break()/__virtqueue_unbreak() to break/unbreak the virtqueue. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_ring.c | 24 include/linux/virtio.h | 3 +++ 2 files changed, 27 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 5ec43607cc15..7b02be7fce67 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -2744,6 +2744,30 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq) } EXPORT_SYMBOL_GPL(virtqueue_get_vring_size); +/* + * This function should only be called by the core, not directly by the driver. + */ +void __virtqueue_break(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ + WRITE_ONCE(vq->broken, true); +} +EXPORT_SYMBOL_GPL(__virtqueue_break); + +/* + * This function should only be called by the core, not directly by the driver. + */ +void __virtqueue_unbreak(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + /* Pairs with READ_ONCE() in virtqueue_is_broken(). */ + WRITE_ONCE(vq->broken, false); +} +EXPORT_SYMBOL_GPL(__virtqueue_unbreak); + bool virtqueue_is_broken(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 1272566adec6..dc474a0d48d1 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -138,6 +138,9 @@ bool is_virtio_device(struct device *dev); void virtio_break_device(struct virtio_device *dev); void __virtio_unbreak_device(struct virtio_device *dev); +void __virtqueue_break(struct virtqueue *_vq); +void __virtqueue_unbreak(struct virtqueue *_vq); + void virtio_config_changed(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP int virtio_device_freeze(struct virtio_device *dev); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
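As the comments say, these are transport-core helpers rather than a driver API. A rough sketch of the pattern the PCI reset support later in this series builds on, where irq is an assumed, already-known interrupt number:

__virtqueue_break(vq);		/* make the vq refuse further processing */
synchronize_irq(irq);		/* wait out any in-flight interrupt handler */
/* ... reset and reconfigure the queue on the device side ... */
__virtqueue_unbreak(vq);	/* let the vq operate again */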
[PATCH v11 26/40] virtio: queue_reset: add VIRTIO_F_RING_RESET
Add VIRTIO_F_RING_RESET, which comes from: https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 This feature indicates that the driver can reset a queue individually. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- include/uapi/linux/virtio_config.h | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h index f0fb0ae021c0..3c05162bc988 100644 --- a/include/uapi/linux/virtio_config.h +++ b/include/uapi/linux/virtio_config.h @@ -52,7 +52,7 @@ * rest are per-device feature bits. */ #define VIRTIO_TRANSPORT_F_START 28 -#define VIRTIO_TRANSPORT_F_END 38 +#define VIRTIO_TRANSPORT_F_END 41 #ifndef VIRTIO_CONFIG_NO_LEGACY /* Do we get callbacks when the ring is completely used, even if we've @@ -98,4 +98,9 @@ * Does the device support Single Root I/O Virtualization? */ #define VIRTIO_F_SR_IOV 37 + +/* + * This feature indicates that the driver can reset a queue individually. + */ +#define VIRTIO_F_RING_RESET 40 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */ -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
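A driver that wants to rely on per-queue reset (or the ring resize built on top of it) would typically gate the operation on the negotiated feature, for example:

if (!virtio_has_feature(vdev, VIRTIO_F_RING_RESET))
	return -EOPNOTSUPP;	/* device did not offer per-queue reset */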
[PATCH v11 28/40] virtio_pci: introduce helper to get/set queue reset
Introduce new helpers to implement queue reset and get queue reset status. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_pci_modern_dev.c | 35 ++ include/linux/virtio_pci_modern.h | 2 ++ 2 files changed, 37 insertions(+) diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c index fa2a9445bb18..07415654247c 100644 --- a/drivers/virtio/virtio_pci_modern_dev.c +++ b/drivers/virtio/virtio_pci_modern_dev.c @@ -3,6 +3,7 @@ #include #include #include +#include /* * vp_modern_map_capability - map a part of virtio pci capability @@ -474,6 +475,40 @@ void vp_modern_set_status(struct virtio_pci_modern_device *mdev, } EXPORT_SYMBOL_GPL(vp_modern_set_status); +/* + * vp_modern_get_queue_reset - get the queue reset status + * @mdev: the modern virtio-pci device + * @index: queue index + */ +int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) +{ + struct virtio_pci_common_cfg __iomem *cfg = mdev->common; + + vp_iowrite16(index, &cfg->queue_select); + return vp_ioread16(&cfg->queue_reset); +} +EXPORT_SYMBOL_GPL(vp_modern_get_queue_reset); + +/* + * vp_modern_set_queue_reset - reset the queue + * @mdev: the modern virtio-pci device + * @index: queue index + */ +void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index) +{ + struct virtio_pci_common_cfg __iomem *cfg = mdev->common; + + vp_iowrite16(index, &cfg->queue_select); + vp_iowrite16(1, &cfg->queue_reset); + + while (vp_ioread16(&cfg->queue_reset)) + msleep(1); + + while (vp_ioread16(&cfg->queue_enable)) + msleep(1); +} +EXPORT_SYMBOL_GPL(vp_modern_set_queue_reset); + /* * vp_modern_queue_vector - set the MSIX vector for a specific virtqueue * @mdev: the modern virtio-pci device diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index beebc7a4a31d..ded01157f864 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -134,4 +134,6 @@ void __iomem * vp_modern_map_vq_notify(struct virtio_pci_modern_device *mdev, u16 index, resource_size_t *pa); int vp_modern_probe(struct virtio_pci_modern_device *mdev); void vp_modern_remove(struct virtio_pci_modern_device *mdev); +int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index); +void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index); #endif -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
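Note that vp_modern_set_queue_reset() both requests the reset and polls (with msleep()) until the device clears queue_reset and queue_enable, so its return means the reset has completed. vp_modern_get_queue_reset() is the read-back used before re-enabling, as in this sketch mirroring the later PCI patch:

/* Before re-enabling a previously reset queue. */
if (vp_modern_get_queue_reset(mdev, index))
	return -EBUSY;		/* reset still pending on the device */

if (vp_modern_get_queue_enable(mdev, index))
	return -EBUSY;		/* queue unexpectedly already enabled */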
[PATCH v11 27/40] virtio_pci: struct virtio_pci_common_cfg add queue_reset
Add queue_reset in virtio_pci_common_cfg. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo --- include/linux/virtio_pci_modern.h | 2 +- include/uapi/linux/virtio_pci.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index 9f31dde46f57..beebc7a4a31d 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -30,7 +30,7 @@ struct virtio_pci_common_cfg { __le32 queue_used_lo; /* read-write */ __le32 queue_used_hi; /* read-write */ __le16 queue_notify_data; /* read-write */ - __le16 padding; + __le16 queue_reset; /* read-write */ }; struct virtio_pci_modern_device { diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 748b3eb62d2f..4f0a8d86cb11 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -177,6 +177,7 @@ struct virtio_pci_cfg_cap { #define VIRTIO_PCI_COMMON_Q_USEDLO 48 #define VIRTIO_PCI_COMMON_Q_USEDHI 52 #define VIRTIO_PCI_COMMON_Q_NDATA 56 +#define VIRTIO_PCI_COMMON_Q_RESET 58 #endif /* VIRTIO_PCI_NO_MODERN */ -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 29/40] virtio_pci: extract the logic of active vq for modern pci
Introduce vp_active_vq() to configure vring to backend after vq attach vring. And configure vq vector if necessary. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_pci_modern.c | 46 ++ 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index e7e0b8c850f6..9041d9a41b7d 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -176,6 +176,29 @@ static void vp_reset(struct virtio_device *vdev) vp_synchronize_vectors(vdev); } +static int vp_active_vq(struct virtqueue *vq, u16 msix_vec) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); + struct virtio_pci_modern_device *mdev = &vp_dev->mdev; + unsigned long index; + + index = vq->index; + + /* activate the queue */ + vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq)); + vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq), + virtqueue_get_avail_addr(vq), + virtqueue_get_used_addr(vq)); + + if (msix_vec != VIRTIO_MSI_NO_VECTOR) { + msix_vec = vp_modern_queue_vector(mdev, index, msix_vec); + if (msix_vec == VIRTIO_MSI_NO_VECTOR) + return -EBUSY; + } + + return 0; +} + static u16 vp_config_vector(struct virtio_pci_device *vp_dev, u16 vector) { return vp_modern_config_vector(&vp_dev->mdev, vector); @@ -220,32 +243,19 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, vq->num_max = num; - /* activate the queue */ - vp_modern_set_queue_size(mdev, index, virtqueue_get_vring_size(vq)); - vp_modern_queue_address(mdev, index, virtqueue_get_desc_addr(vq), - virtqueue_get_avail_addr(vq), - virtqueue_get_used_addr(vq)); + err = vp_active_vq(vq, msix_vec); + if (err) + goto err; vq->priv = (void __force *)vp_modern_map_vq_notify(mdev, index, NULL); if (!vq->priv) { err = -ENOMEM; - goto err_map_notify; - } - - if (msix_vec != VIRTIO_MSI_NO_VECTOR) { - msix_vec = vp_modern_queue_vector(mdev, index, msix_vec); - if (msix_vec == VIRTIO_MSI_NO_VECTOR) { - err = -EBUSY; - goto err_assign_vector; - } + goto err; } return vq; -err_assign_vector: - if (!mdev->notify_base) - pci_iounmap(mdev->pci_dev, (void __iomem __force *)vq->priv); -err_map_notify: +err: vring_del_virtqueue(vq); return ERR_PTR(err); } -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 30/40] virtio_pci: support VIRTIO_F_RING_RESET
This patch implements virtio pci support for QUEUE RESET. Performing reset on a queue is divided into these steps: 1. notify the device to reset the queue 2. recycle the buffer submitted 3. reset the vring (may re-alloc) 4. mmap vring to device, and enable the queue This patch implements virtio_reset_vq(), virtio_enable_resetq() in the pci scenario. Signed-off-by: Xuan Zhuo --- drivers/virtio/virtio_pci_common.c | 12 +++- drivers/virtio/virtio_pci_modern.c | 96 ++ drivers/virtio/virtio_ring.c | 2 + include/linux/virtio.h | 1 + 4 files changed, 108 insertions(+), 3 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index ca51fcc9daab..ad258a9d3b9f 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -214,9 +214,15 @@ static void vp_del_vq(struct virtqueue *vq) struct virtio_pci_vq_info *info = vp_dev->vqs[vq->index]; unsigned long flags; - spin_lock_irqsave(&vp_dev->lock, flags); - list_del(&info->node); - spin_unlock_irqrestore(&vp_dev->lock, flags); + /* +* If it fails during re-enable reset vq. This way we won't rejoin +* info->node to the queue. Prevent unexpected irqs. +*/ + if (!vq->reset) { + spin_lock_irqsave(&vp_dev->lock, flags); + list_del(&info->node); + spin_unlock_irqrestore(&vp_dev->lock, flags); + } vp_dev->del_vq(info); kfree(info); diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 9041d9a41b7d..754e5e10386b 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -34,6 +34,9 @@ static void vp_transport_features(struct virtio_device *vdev, u64 features) if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) && pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV)) __virtio_set_bit(vdev, VIRTIO_F_SR_IOV); + + if (features & BIT_ULL(VIRTIO_F_RING_RESET)) + __virtio_set_bit(vdev, VIRTIO_F_RING_RESET); } /* virtio config->finalize_features() implementation */ @@ -199,6 +202,95 @@ static int vp_active_vq(struct virtqueue *vq, u16 msix_vec) return 0; } +static int vp_modern_reset_vq(struct virtqueue *vq) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); + struct virtio_pci_modern_device *mdev = &vp_dev->mdev; + struct virtio_pci_vq_info *info; + unsigned long flags; + + if (!virtio_has_feature(vq->vdev, VIRTIO_F_RING_RESET)) + return -ENOENT; + + vp_modern_set_queue_reset(mdev, vq->index); + + info = vp_dev->vqs[vq->index]; + + /* delete vq from irq handler */ + spin_lock_irqsave(&vp_dev->lock, flags); + list_del(&info->node); + spin_unlock_irqrestore(&vp_dev->lock, flags); + + INIT_LIST_HEAD(&info->node); + + /* For the case where vq has an exclusive irq, to prevent the irq from +* being received again and the pending irq, call synchronize_irq(), and +* break it. +* +* We can't use disable_irq() since it conflicts with the affinity +* managed IRQ that is used by some drivers. So this is done on top of +* IRQ hardening. +* +* In the scenario based on shared interrupts, vq will be searched from +* the queue virtqueues. Since the previous list_del() has been deleted +* from the queue, it is impossible for vq to be called in this case. +* There is no need to close the corresponding interrupt. 
+*/ + if (vp_dev->per_vq_vectors && info->msix_vector != VIRTIO_MSI_NO_VECTOR) { +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION + __virtqueue_break(vq); +#endif + synchronize_irq(pci_irq_vector(vp_dev->pci_dev, info->msix_vector)); + } + + vq->reset = true; + + return 0; +} + +static int vp_modern_enable_reset_vq(struct virtqueue *vq) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); + struct virtio_pci_modern_device *mdev = &vp_dev->mdev; + struct virtio_pci_vq_info *info; + unsigned long flags, index; + int err; + + if (!vq->reset) + return -EBUSY; + + index = vq->index; + info = vp_dev->vqs[index]; + + if (vp_modern_get_queue_reset(mdev, index)) + return -EBUSY; + + if (vp_modern_get_queue_enable(mdev, index)) + return -EBUSY; + + err = vp_active_vq(vq, info->msix_vector); + if (err) + return err; + + if (vq->callback) { + spin_lock_irqsave(&vp_dev->lock, flags); + list_add(&info->node, &vp_dev->virtqueues); + spin_unlock_irqrestore(&vp_dev->lock, flags); + } else { + INIT_LIST_HEAD(&info->node); + } + +#ifdef CONFIG_VIRT
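Per the four steps listed in the commit message, the flow implemented by the two new callbacks is roughly the following (comment-only summary, not literal code):

/*
 * vp_modern_reset_vq():
 *   1. write queue_reset and wait for the device (vp_modern_set_queue_reset)
 *   2. unhook the vq from the irq handling list and synchronize_irq()
 *   3. mark vq->reset so the core may recycle buffers and rebuild the vring
 *
 * vp_modern_enable_reset_vq():
 *   4. reprogram size/addresses/vector via vp_active_vq(), re-enable the
 *      queue, then hook the vq back into the irq handling list
 */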
[PATCH v11 31/40] virtio: find_vqs() add arg sizes
find_vqs() adds a new parameter sizes to specify the size of each vq vring. NULL as sizes means that all queues in find_vqs() use the maximum size. A value in the array is 0, which means that the corresponding queue uses the maximum size. In the split scenario, the meaning of size is the largest size, because it may be limited by memory, the virtio core will try a smaller size. And the size is power of 2. Signed-off-by: Xuan Zhuo Acked-by: Hans de Goede Reviewed-by: Mathieu Poirier --- arch/um/drivers/virtio_uml.c | 2 +- drivers/platform/mellanox/mlxbf-tmfifo.c | 1 + drivers/remoteproc/remoteproc_virtio.c | 1 + drivers/s390/virtio/virtio_ccw.c | 1 + drivers/virtio/virtio_mmio.c | 1 + drivers/virtio/virtio_pci_common.c | 2 +- drivers/virtio/virtio_pci_common.h | 2 +- drivers/virtio/virtio_pci_modern.c | 7 +-- drivers/virtio/virtio_vdpa.c | 1 + include/linux/virtio_config.h| 14 +- 10 files changed, 22 insertions(+), 10 deletions(-) diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c index e719af8bdf56..79e38afd4b91 100644 --- a/arch/um/drivers/virtio_uml.c +++ b/arch/um/drivers/virtio_uml.c @@ -1011,7 +1011,7 @@ static struct virtqueue *vu_setup_vq(struct virtio_device *vdev, static int vu_find_vqs(struct virtio_device *vdev, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], const bool *ctx, + const char * const names[], u32 sizes[], const bool *ctx, struct irq_affinity *desc) { struct virtio_uml_device *vu_dev = to_virtio_uml_device(vdev); diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c index 1ae3c56b66b0..8be13d416f48 100644 --- a/drivers/platform/mellanox/mlxbf-tmfifo.c +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c @@ -928,6 +928,7 @@ static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev, struct virtqueue *vqs[], vq_callback_t *callbacks[], const char * const names[], + u32 sizes[], const bool *ctx, struct irq_affinity *desc) { diff --git a/drivers/remoteproc/remoteproc_virtio.c b/drivers/remoteproc/remoteproc_virtio.c index 0f7706e23eb9..81c4f5776109 100644 --- a/drivers/remoteproc/remoteproc_virtio.c +++ b/drivers/remoteproc/remoteproc_virtio.c @@ -158,6 +158,7 @@ static int rproc_virtio_find_vqs(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], const char * const names[], +u32 sizes[], const bool * ctx, struct irq_affinity *desc) { diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 6b86d0280d6b..72500cd2dbf5 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -635,6 +635,7 @@ static int virtio_ccw_find_vqs(struct virtio_device *vdev, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], const char * const names[], + u32 sizes[], const bool *ctx, struct irq_affinity *desc) { diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index a20d5a6b5819..5e3ba3cc7fd0 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -474,6 +474,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], const char * const names[], + u32 sizes[], const bool *ctx, struct irq_affinity *desc) { diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index ad258a9d3b9f..7ad734584823 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -396,7 +396,7 @@ static int 
vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs, /* the config->find_vqs() implementation */ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], const bool *ctx, + const char * const names[], u32 sizes[], const bool *ctx, struct irq_affinity *desc) { int err; diff --git a/drivers/virti
[PATCH v11 32/40] virtio_pci: support the arg sizes of find_vqs()
Virtio PCI supports new parameter sizes of find_vqs(). Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_pci_common.c | 18 ++ drivers/virtio/virtio_pci_common.h | 1 + drivers/virtio/virtio_pci_legacy.c | 6 +- drivers/virtio/virtio_pci_modern.c | 10 +++--- 4 files changed, 23 insertions(+), 12 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c index 7ad734584823..00ad476a815d 100644 --- a/drivers/virtio/virtio_pci_common.c +++ b/drivers/virtio/virtio_pci_common.c @@ -174,6 +174,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors, static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned int index, void (*callback)(struct virtqueue *vq), const char *name, +u32 size, bool ctx, u16 msix_vec) { @@ -186,7 +187,7 @@ static struct virtqueue *vp_setup_vq(struct virtio_device *vdev, unsigned int in if (!info) return ERR_PTR(-ENOMEM); - vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, ctx, + vq = vp_dev->setup_vq(vp_dev, info, index, callback, name, size, ctx, msix_vec); if (IS_ERR(vq)) goto out_info; @@ -283,7 +284,7 @@ void vp_del_vqs(struct virtio_device *vdev) static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], bool per_vq_vectors, + const char * const names[], u32 sizes[], bool per_vq_vectors, const bool *ctx, struct irq_affinity *desc) { @@ -326,8 +327,8 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs, else msix_vec = VP_MSIX_VQ_VECTOR; vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i], -ctx ? ctx[i] : false, -msix_vec); +sizes ? sizes[i] : 0, +ctx ? ctx[i] : false, msix_vec); if (IS_ERR(vqs[i])) { err = PTR_ERR(vqs[i]); goto error_find; @@ -357,7 +358,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs, static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], const bool *ctx) + const char * const names[], u32 sizes[], const bool *ctx) { struct virtio_pci_device *vp_dev = to_vp_device(vdev); int i, err, queue_idx = 0; @@ -379,6 +380,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs, continue; } vqs[i] = vp_setup_vq(vdev, queue_idx++, callbacks[i], names[i], +sizes ? sizes[i] : 0, ctx ? ctx[i] : false, VIRTIO_MSI_NO_VECTOR); if (IS_ERR(vqs[i])) { @@ -402,15 +404,15 @@ int vp_find_vqs(struct virtio_device *vdev, unsigned int nvqs, int err; /* Try MSI-X with one vector per queue. */ - err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, true, ctx, desc); + err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, sizes, true, ctx, desc); if (!err) return 0; /* Fallback: MSI-X with one vector for config, one shared for queues. */ - err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, false, ctx, desc); + err = vp_find_vqs_msix(vdev, nvqs, vqs, callbacks, names, sizes, false, ctx, desc); if (!err) return 0; /* Finally fall back to regular interrupts. 
*/ - return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, ctx); + return vp_find_vqs_intx(vdev, nvqs, vqs, callbacks, names, sizes, ctx); } const char *vp_bus_name(struct virtio_device *vdev) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index a5ff838b85a5..c0448378b698 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -80,6 +80,7 @@ struct virtio_pci_device { unsigned int idx, void (*callback)(struct virtqueue *vq), const char *name, + u32 size, bool ctx, u16 msix_vec); void (*del_vq)(struct virtio_pci_vq_
[PATCH v11 33/40] virtio_mmio: support the arg sizes of find_vqs()
Virtio MMIO support the new parameter sizes of find_vqs(). Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/virtio/virtio_mmio.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 5e3ba3cc7fd0..c888fee18caf 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -360,7 +360,7 @@ static void vm_synchronize_cbs(struct virtio_device *vdev) static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int index, void (*callback)(struct virtqueue *vq), - const char *name, bool ctx) + const char *name, u32 size, bool ctx) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); struct virtio_mmio_vq_info *info; @@ -395,8 +395,11 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned int in goto error_new_virtqueue; } + if (!size || size > num) + size = num; + /* Create the vring */ - vq = vring_create_virtqueue(index, num, VIRTIO_MMIO_VRING_ALIGN, vdev, + vq = vring_create_virtqueue(index, size, VIRTIO_MMIO_VRING_ALIGN, vdev, true, true, ctx, vm_notify, callback, name); if (!vq) { err = -ENOMEM; @@ -497,6 +500,7 @@ static int vm_find_vqs(struct virtio_device *vdev, unsigned int nvqs, } vqs[i] = vm_setup_vq(vdev, queue_idx++, callbacks[i], names[i], +sizes ? sizes[i] : 0, ctx ? ctx[i] : false); if (IS_ERR(vqs[i])) { vm_del_vqs(vdev); -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 34/40] virtio: add helper virtio_find_vqs_ctx_size()
Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify the maximum size of each vq ring. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- include/linux/virtio_config.h | 12 1 file changed, 12 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 31b04ac8284b..aa6cdf353748 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -239,6 +239,18 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, ctx, desc); } +static inline +int virtio_find_vqs_ctx_size(struct virtio_device *vdev, u32 nvqs, +struct virtqueue *vqs[], +vq_callback_t *callbacks[], +const char * const names[], +u32 sizes[], +const bool *ctx, struct irq_affinity *desc) +{ + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, sizes, + ctx, desc); +} + /** * virtio_synchronize_cbs - synchronize with virtqueue callbacks * @vdev: the device -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
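A hedged usage sketch; the queue names, callbacks and counts below are made up for illustration. A NULL sizes array, or a 0 entry, keeps the transport's default maximum:

struct virtqueue *vqs[3];
vq_callback_t *callbacks[3] = { my_rx_done, my_tx_done, NULL };
static const char * const names[3] = { "rx", "tx", "control" };
u32 sizes[3] = { 1024, 1024, 0 };	/* 0: use the device/transport maximum */
int err;

err = virtio_find_vqs_ctx_size(vdev, 3, vqs, callbacks, names, sizes, NULL, NULL);
if (err)
	return err;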
[PATCH v11 35/40] virtio_net: set the default max ring size by find_vqs()
Use virtio_find_vqs_ctx_size() to specify the maximum ring size of tx, rx at the same time. | rx/tx ring size --- speed == UNKNOWN or < 10G| 1024 speed < 40G | 4096 speed >= 40G | 8192 Call virtnet_update_settings() once before calling init_vqs() to update speed. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/net/virtio_net.c | 42 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 8a5810bcb839..40532ecbe7fc 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3208,6 +3208,29 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu (unsigned int)GOOD_PACKET_LEN); } +static void virtnet_config_sizes(struct virtnet_info *vi, u32 *sizes) +{ + u32 i, rx_size, tx_size; + + if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_1) { + rx_size = 1024; + tx_size = 1024; + + } else if (vi->speed < SPEED_4) { + rx_size = 1024 * 4; + tx_size = 1024 * 4; + + } else { + rx_size = 1024 * 8; + tx_size = 1024 * 8; + } + + for (i = 0; i < vi->max_queue_pairs; i++) { + sizes[rxq2vq(i)] = rx_size; + sizes[txq2vq(i)] = tx_size; + } +} + static int virtnet_find_vqs(struct virtnet_info *vi) { vq_callback_t **callbacks; @@ -3215,6 +3238,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) int ret = -ENOMEM; int i, total_vqs; const char **names; + u32 *sizes; bool *ctx; /* We expect 1 RX virtqueue followed by 1 TX virtqueue, followed by @@ -3242,10 +3266,15 @@ static int virtnet_find_vqs(struct virtnet_info *vi) ctx = NULL; } + sizes = kmalloc_array(total_vqs, sizeof(*sizes), GFP_KERNEL); + if (!sizes) + goto err_sizes; + /* Parameters for control virtqueue, if any */ if (vi->has_cvq) { callbacks[total_vqs - 1] = NULL; names[total_vqs - 1] = "control"; + sizes[total_vqs - 1] = 64; } /* Allocate/initialize parameters for send/receive virtqueues */ @@ -3260,8 +3289,10 @@ static int virtnet_find_vqs(struct virtnet_info *vi) ctx[rxq2vq(i)] = true; } - ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks, - names, ctx, NULL); + virtnet_config_sizes(vi, sizes); + + ret = virtio_find_vqs_ctx_size(vi->vdev, total_vqs, vqs, callbacks, + names, sizes, ctx, NULL); if (ret) goto err_find; @@ -3281,6 +3312,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi) err_find: + kfree(sizes); +err_sizes: kfree(ctx); err_ctx: kfree(names); @@ -3630,6 +3663,9 @@ static int virtnet_probe(struct virtio_device *vdev) vi->curr_queue_pairs = num_online_cpus(); vi->max_queue_pairs = max_queue_pairs; + virtnet_init_settings(dev); + virtnet_update_settings(vi); + /* Allocate/initialize the rx/tx queues, and invoke find_vqs */ err = init_vqs(vi); if (err) @@ -3642,8 +3678,6 @@ static int virtnet_probe(struct virtio_device *vdev) netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs); netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs); - virtnet_init_settings(dev); - if (virtio_has_feature(vdev, VIRTIO_NET_F_STANDBY)) { vi->failover = net_failover_create(vi->dev); if (IS_ERR(vi->failover)) { -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
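Restating the speed-to-size mapping from the commit message, and assuming the usual ethtool constants SPEED_10000/SPEED_40000 for the 10G/40G thresholds shown in the table:

/* Sketch of the virtnet_config_sizes() thresholds (constants assumed):
 *   speed unknown or < 10G  ->  rx/tx ring size 1024
 *   speed < 40G             ->  rx/tx ring size 4096 (1024 * 4)
 *   speed >= 40G            ->  rx/tx ring size 8192 (1024 * 8)
 * The control queue, when present, is fixed at 64 entries.
 */
if (vi->speed == SPEED_UNKNOWN || vi->speed < SPEED_10000)
	rx_size = tx_size = 1024;
else if (vi->speed < SPEED_40000)
	rx_size = tx_size = 1024 * 4;
else
	rx_size = tx_size = 1024 * 8;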
[PATCH v11 37/40] virtio_net: split free_unused_bufs()
This patch separates two functions for freeing sq buf and rq buf from free_unused_bufs(). When supporting the enable/disable tx/rq queue in the future, it is necessary to support separate recovery of a sq buf or a rq buf. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/net/virtio_net.c | 41 1 file changed, 25 insertions(+), 16 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 63f990bdc302..9fe222a3663a 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3151,6 +3151,27 @@ static void free_receive_page_frags(struct virtnet_info *vi) put_page(vi->rq[i].alloc_frag.page); } +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf) +{ + if (!is_xdp_frame(buf)) + dev_kfree_skb(buf); + else + xdp_return_frame(ptr_to_xdp(buf)); +} + +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf) +{ + struct virtnet_info *vi = vq->vdev->priv; + int i = vq2rxq(vq); + + if (vi->mergeable_rx_bufs) + put_page(virt_to_head_page(buf)); + else if (vi->big_packets) + give_pages(&vi->rq[i], buf); + else + put_page(virt_to_head_page(buf)); +} + static void free_unused_bufs(struct virtnet_info *vi) { void *buf; @@ -3158,26 +3179,14 @@ static void free_unused_bufs(struct virtnet_info *vi) for (i = 0; i < vi->max_queue_pairs; i++) { struct virtqueue *vq = vi->sq[i].vq; - while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) { - if (!is_xdp_frame(buf)) - dev_kfree_skb(buf); - else - xdp_return_frame(ptr_to_xdp(buf)); - } + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) + virtnet_sq_free_unused_buf(vq, buf); } for (i = 0; i < vi->max_queue_pairs; i++) { struct virtqueue *vq = vi->rq[i].vq; - - while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) { - if (vi->mergeable_rx_bufs) { - put_page(virt_to_head_page(buf)); - } else if (vi->big_packets) { - give_pages(&vi->rq[i], buf); - } else { - put_page(virt_to_head_page(buf)); - } - } + while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) + virtnet_rq_free_unused_buf(vq, buf); } } -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 36/40] virtio_net: get ringparam by virtqueue_get_vring_max_size()
Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set tx,rx_max_pending. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/net/virtio_net.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 40532ecbe7fc..63f990bdc302 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2254,10 +2254,10 @@ static void virtnet_get_ringparam(struct net_device *dev, { struct virtnet_info *vi = netdev_priv(dev); - ring->rx_max_pending = virtqueue_get_vring_size(vi->rq[0].vq); - ring->tx_max_pending = virtqueue_get_vring_size(vi->sq[0].vq); - ring->rx_pending = ring->rx_max_pending; - ring->tx_pending = ring->tx_max_pending; + ring->rx_max_pending = virtqueue_get_vring_max_size(vi->rq[0].vq); + ring->tx_max_pending = virtqueue_get_vring_max_size(vi->sq[0].vq); + ring->rx_pending = virtqueue_get_vring_size(vi->rq[0].vq); + ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq); } static bool virtnet_commit_rss_command(struct virtnet_info *vi) -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 38/40] virtio_net: support rx queue resize
This patch implements the resize function of the rx queues. Based on this function, it is possible to modify the ring num of the queue. Signed-off-by: Xuan Zhuo --- drivers/net/virtio_net.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9fe222a3663a..6ab16fd193e5 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -278,6 +278,8 @@ struct padded_vnet_hdr { char padding[12]; }; +static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf); + static bool is_xdp_frame(void *ptr) { return (unsigned long)ptr & VIRTIO_XDP_FLAG; @@ -1846,6 +1848,26 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } +static int virtnet_rx_resize(struct virtnet_info *vi, +struct receive_queue *rq, u32 ring_num) +{ + int err, qindex; + + qindex = rq - vi->rq; + + napi_disable(&rq->napi); + + err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf); + if (err) + netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err); + + if (!try_fill_recv(vi, rq, GFP_KERNEL)) + schedule_delayed_work(&vi->refill, 0); + + virtnet_napi_enable(rq->vq, &rq->napi); + return err; +} + /* * Send command via the control virtqueue and check status. Commands * supported by the hypervisor, as indicated by feature bits, should -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v11 40/40] virtio_net: support set_ringparam
Support set_ringparam based on virtio queue reset. Users can use ethtool -G eth0 to modify the ring size of virtio-net. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang --- drivers/net/virtio_net.c | 48 1 file changed, 48 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index fd358462f802..cc554cbac431 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2330,6 +2330,53 @@ static void virtnet_get_ringparam(struct net_device *dev, ring->tx_pending = virtqueue_get_vring_size(vi->sq[0].vq); } +static int virtnet_set_ringparam(struct net_device *dev, +struct ethtool_ringparam *ring, +struct kernel_ethtool_ringparam *kernel_ring, +struct netlink_ext_ack *extack) +{ + struct virtnet_info *vi = netdev_priv(dev); + u32 rx_pending, tx_pending; + struct receive_queue *rq; + struct send_queue *sq; + int i, err; + + if (ring->rx_mini_pending || ring->rx_jumbo_pending) + return -EINVAL; + + rx_pending = virtqueue_get_vring_size(vi->rq[0].vq); + tx_pending = virtqueue_get_vring_size(vi->sq[0].vq); + + if (ring->rx_pending == rx_pending && + ring->tx_pending == tx_pending) + return 0; + + if (ring->rx_pending > virtqueue_get_vring_max_size(vi->rq[0].vq)) + return -EINVAL; + + if (ring->tx_pending > virtqueue_get_vring_max_size(vi->sq[0].vq)) + return -EINVAL; + + for (i = 0; i < vi->max_queue_pairs; i++) { + rq = vi->rq + i; + sq = vi->sq + i; + + if (ring->tx_pending != tx_pending) { + err = virtnet_tx_resize(vi, sq, ring->tx_pending); + if (err) + return err; + } + + if (ring->rx_pending != rx_pending) { + err = virtnet_rx_resize(vi, rq, ring->rx_pending); + if (err) + return err; + } + } + + return 0; +} + static bool virtnet_commit_rss_command(struct virtnet_info *vi) { struct net_device *dev = vi->dev; @@ -2817,6 +2864,7 @@ static const struct ethtool_ops virtnet_ethtool_ops = { .get_drvinfo = virtnet_get_drvinfo, .get_link = ethtool_op_get_link, .get_ringparam = virtnet_get_ringparam, + .set_ringparam = virtnet_set_ringparam, .get_strings = virtnet_get_strings, .get_sset_count = virtnet_get_sset_count, .get_ethtool_stats = virtnet_get_ethtool_stats, -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
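For orientation, the rough call chain when a user runs ethtool -G eth0 rx N tx N with this series applied (comment-only summary; the ethtool entry point is the existing core set_ringparam path):

/*
 *   ethtool core (ioctl/netlink set_ringparam path)
 *     -> virtnet_set_ringparam()                          [this patch]
 *          -> virtnet_rx_resize() / virtnet_tx_resize()   [patches 38/39]
 *               -> virtqueue_resize()                     [patch 22]
 *                    -> config->reset_vq(), recycle unused buffers,
 *                       resize the split/packed vring, config->enable_reset_vq()
 */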
[PATCH v11 39/40] virtio_net: support tx queue resize
This patch implements the resize function of the tx queues. Based on this function, it is possible to modify the ring num of the queue. Signed-off-by: Xuan Zhuo --- drivers/net/virtio_net.c | 48 1 file changed, 48 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6ab16fd193e5..fd358462f802 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -135,6 +135,9 @@ struct send_queue { struct virtnet_sq_stats stats; struct napi_struct napi; + + /* Record whether sq is in reset state. */ + bool reset; }; /* Internal representation of a receive virtqueue */ @@ -279,6 +282,7 @@ struct padded_vnet_hdr { }; static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf); +static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf); static bool is_xdp_frame(void *ptr) { @@ -1603,6 +1607,11 @@ static void virtnet_poll_cleantx(struct receive_queue *rq) return; if (__netif_tx_trylock(txq)) { + if (READ_ONCE(sq->reset)) { + __netif_tx_unlock(txq); + return; + } + do { virtqueue_disable_cb(sq->vq); free_old_xmit_skbs(sq, true); @@ -1868,6 +1877,45 @@ static int virtnet_rx_resize(struct virtnet_info *vi, return err; } +static int virtnet_tx_resize(struct virtnet_info *vi, +struct send_queue *sq, u32 ring_num) +{ + struct netdev_queue *txq; + int err, qindex; + + qindex = sq - vi->sq; + + virtnet_napi_tx_disable(&sq->napi); + + txq = netdev_get_tx_queue(vi->dev, qindex); + + /* 1. wait all ximt complete +* 2. fix the race of netif_stop_subqueue() vs netif_start_subqueue() +*/ + __netif_tx_lock_bh(txq); + + /* Prevent rx poll from accessing sq. */ + WRITE_ONCE(sq->reset, true); + + /* Prevent the upper layer from trying to send packets. */ + netif_stop_subqueue(vi->dev, qindex); + + __netif_tx_unlock_bh(txq); + + err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf); + if (err) + netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err); + + /* Memory barrier before set reset and start subqueue. */ + smp_mb(); + + WRITE_ONCE(sq->reset, false); + netif_tx_wake_queue(txq); + + virtnet_napi_tx_enable(vi, sq->vq, &sq->napi); + return err; +} + /* * Send command via the control virtqueue and check status. Commands * supported by the hypervisor, as indicated by feature bits, should -- 2.31.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
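The sq->reset flag added here pairs with the check added to virtnet_poll_cleantx() in the same diff; condensed, the two sides look like this:

/* Writer side: virtnet_tx_resize() */
__netif_tx_lock_bh(txq);
WRITE_ONCE(sq->reset, true);		/* stop the tx-cleaning path */
netif_stop_subqueue(vi->dev, qindex);	/* stop the xmit path */
__netif_tx_unlock_bh(txq);

/* ... virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf) ... */

smp_mb();				/* order the ring rebuild before clearing reset */
WRITE_ONCE(sq->reset, false);
netif_tx_wake_queue(txq);

/* Reader side: virtnet_poll_cleantx(), under __netif_tx_trylock() */
if (READ_ONCE(sq->reset)) {
	__netif_tx_unlock(txq);
	return;				/* sq is being resized; leave it alone */
}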