Re: [EXT] [RFC PATCH 0/4] PCI: endpoint: Introduce a virtio-net EP function

2023-04-04 Thread Shunsuke Mie



On 2023/03/30 1:46, Frank Li wrote:

On 2023/02/08 1:02, Frank Li wrote:

Did you have a chance to improve this?


Yes, I'm working on it. I'd like to submit a new one this week.


Best regards
Frank Li


Best,

Shunsuke



Re: [PATCH v6 01/11] vhost: add vhost_worker pointer to vhost_virtqueue

2023-04-04 Thread Mike Christie
On 4/4/23 1:38 PM, Michael S. Tsirkin wrote:
> On Mon, Mar 27, 2023 at 09:17:07PM -0500, Mike Christie wrote:
>> This patchset allows userspace to map vqs to different workers. This
>> patch adds a worker pointer to the vq so we can store that info.
>>
>> Signed-off-by: Mike Christie 
> Thanks! Conflicts with a bunch of refactorings upstream:
> could you rebase this on my tree and repost?

I will.

> I need to queue this soon so it gets time in -next.
Are you shooting for 6.4?

I think it's OK to do this for 6.5. We are already at rc5, and to
handle Jason's issue I will need to redo testing, which will take
me some time.


Re: [PATCH v6 11/11] vhost: allow userspace to create workers

2023-04-04 Thread Mike Christie
On 4/4/23 3:00 AM, Jason Wang wrote:
>>
>> -static void vhost_worker_free(struct vhost_dev *dev)
>> +static void vhost_worker_put(struct vhost_dev *dev, struct vhost_worker *worker)
>>  {
>> -   struct vhost_worker *worker = dev->worker;
>> -
>> if (!worker)
>> return;
>>
>> -   dev->worker = NULL;
>> +   if (!refcount_dec_and_test(&worker->refcount))
>> +   return;
>> +
>> WARN_ON(!llist_empty(&worker->work_list));
>> vhost_task_stop(worker->vtsk);
>> kfree(worker);
>>  }
>>
>> +static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
>> +{
>> +   if (vq->worker)
> 
> What happens to the pending work that queues for the old worker?

I didn't think there would be work queued at this time. I see your comment
below about my assumption that the backend is set being wrong. Will
discuss down there.


>>
>> +/* Caller must have device and virtqueue mutex */
>> +static void __vhost_vq_attach_worker(struct vhost_virtqueue *vq,
>> +struct vhost_worker *worker)
>> +{
>> +   refcount_inc(&worker->refcount);
>> +   vhost_vq_detach_worker(vq);
>> +   vq->worker = worker;
> 
> What happens if there's a pending flush in a specific device (e.g in
> vhost_scsi_tmf_resp_work())?

We wouldn't hit that specific case, where we are running the above code and
vhost_scsi_tmf_resp_work at the same time.

Either:

1. The backend is NULL and has never been set, and so we would never have
called vhost_scsi_tmf_resp_work, because it can only be called after the
backend is set.

2. If the backend has been set and vhost_scsi_tmf_resp_work has
run or is running, then we would not have called __vhost_vq_attach_worker
from vhost_vq_attach_worker, because it would see vhost_vq_get_backend
returning a non-NULL value.

If vhost_scsi later sets the backend to NULL, then vhost_scsi_clear_endpoint
will have made sure the flush has completed when the clear function returns.
It does that with the device mutex held, so when we run __vhost_vq_attach_worker
it will only see a vq/worker with no flushes in progress.

For the general case of whether we can be doing a vhost_dev_flush/vhost_work_flush_on
concurrently with __vhost_vq_attach_worker, I thought we are OK as well, because
we currently have to hold the device mutex when we flush, so flushes can't race
with ioctl calls to vhost_vq_attach_worker: we hold the dev mutex during that
ioctl. Alternatively, we call flush from the file_operations release function,
so the device is closed and can't race with ioctls.
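
To be clear, the serialization I'm assuming looks roughly like this
(a rough sketch with illustrative names, not the actual vhost code):

/* Both paths require dev->mutex, so an attach cannot overlap a flush. */
static long worker_ioctl_sketch(struct vhost_dev *dev)
{
	mutex_lock(&dev->mutex);
	/* __vhost_vq_attach_worker() only ever runs in here */
	mutex_unlock(&dev->mutex);
	return 0;
}

static void flush_sketch(struct vhost_dev *dev)
{
	/* the flush paths also run with dev->mutex held */
	lockdep_assert_held(&dev->mutex);
	vhost_dev_flush(dev);
}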

> 
> Does this mean we need to hold vq mutex when doing the flush? (which
> seems not the case of vhost_scsi_tmf_resp_work()).
> 
>> +}
>> +
>> +/* Caller must have device and virtqueue mutex */
>> +static int vhost_vq_attach_worker(struct vhost_virtqueue *vq,
>> + struct vhost_vring_worker *info)
>> +{
>> +   unsigned long index = info->worker_id;
>> +   struct vhost_dev *dev = vq->dev;
>> +   struct vhost_worker *worker;
>> +
>> +   if (!dev->use_worker)
>> +   return -EINVAL;
>> +
>> +   /*
>> +* We don't support setting a worker on an active vq to make flushing
>> +* and removal simple.
>> +*/
>> +   if (vhost_vq_get_backend(vq))
>> +   return -EBUSY;
> 
> This assumes the worker won't be started until the backend is set
> which is not true.

I can see how we get flushes before setting the backend, but I thought we are
OK because we have the device mutex held.

What are the other possible cases? Is one case that we could get a
VHOST_SET_VRING_KICK before setting the backend, so vhost_poll_start starts
to allow vhost_poll_queue before the backend is set? Is there anything else?

I'll fix this issue.


>> +
>> +/* Caller must have device mutex */
>> +static int vhost_free_worker(struct vhost_dev *dev,
>> +struct vhost_worker_state *info)
>> +{
>> +   unsigned long index = info->worker_id;
>> +   struct vhost_worker *worker;
>> +
>> +   if (!dev->use_worker)
>> +   return -EINVAL;
>> +
>> +   worker = xa_find(&dev->worker_xa, &index, UINT_MAX, XA_PRESENT);
> 
> So we use int for worker_id which conflicts with UINT_MAX here?

I switched from idr to xa at the last second in the previous versions and
introduced this mistake. Will fix.
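
The fix will probably look something like this (a sketch, not the final
code): use an exact-match lookup instead of xa_find() and reject ids
outside the valid range:

	if (info->worker_id < 0)
		return -EINVAL;

	worker = xa_load(&dev->worker_xa, info->worker_id);
	if (!worker)
		return -ENODEV;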


> 
> struct vhost_worker_state {
> /*
>  * For VHOST_NEW_WORKER the kernel will return the new vhost_worker id.
>  * For VHOST_FREE_WORKER this must be set to the id of the vhost_worker
>  * to free.
>  */
> int worker_id;
> };
> 
> A side effect of using xa_index directly is that userspace can guess
> the xa_index of the default worker and free it here, is this intended?
I don't need the functionality. It was only something that I didn't
think mattered much, because you can only free the worker if it's not in
use by any virtqueues, so I didn't add any special code to handle it.
The user would 

Re: [PATCH v6 01/11] vhost: add vhost_worker pointer to vhost_virtqueue

2023-04-04 Thread Michael S. Tsirkin
On Mon, Mar 27, 2023 at 09:17:07PM -0500, Mike Christie wrote:
> This patchset allows userspace to map vqs to different workers. This
> patch adds a worker pointer to the vq so we can store that info.
> 
> Signed-off-by: Mike Christie 

Thanks! Conflicts with a bunch of refactorings upstream:
could you rebase this on my tree and repost?
I need to queue this soon so it gets time in -next.

> ---
>  drivers/vhost/vhost.c | 24 +---
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 4368ee9b999c..e041e116afee 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -486,6 +486,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>   vq->log = NULL;
>   vq->indirect = NULL;
>   vq->heads = NULL;
> + vq->worker = NULL;
>   vq->dev = dev;
>   mutex_init(&vq->mutex);
>   vhost_vq_reset(dev, vq);
> @@ -554,16 +555,15 @@ static void vhost_worker_free(struct vhost_dev *dev)
>   kfree(worker);
>  }
>  
> -static int vhost_worker_create(struct vhost_dev *dev)
> +static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  {
>   struct vhost_worker *worker;
>   struct vhost_task *vtsk;
>   char name[TASK_COMM_LEN];
> - int ret;
>  
>   worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
>   if (!worker)
> - return -ENOMEM;
> + return NULL;
>  
>   dev->worker = worker;
>   worker->kcov_handle = kcov_common_handle();
> @@ -571,25 +571,24 @@ static int vhost_worker_create(struct vhost_dev *dev)
>   snprintf(name, sizeof(name), "vhost-%d", current->pid);
>  
>   vtsk = vhost_task_create(vhost_worker, worker, name);
> - if (!vtsk) {
> - ret = -ENOMEM;
> + if (!vtsk)
>   goto free_worker;
> - }
>  
>   worker->vtsk = vtsk;
>   vhost_task_start(vtsk);
> - return 0;
> + return worker;
>  
>  free_worker:
>   kfree(worker);
>   dev->worker = NULL;
> - return ret;
> + return NULL;
>  }
>  
>  /* Caller should have device mutex */
>  long vhost_dev_set_owner(struct vhost_dev *dev)
>  {
> - int err;
> + struct vhost_worker *worker;
> + int err, i;
>  
>   /* Is there an owner already? */
>   if (vhost_dev_has_owner(dev)) {
> @@ -600,9 +599,12 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>   vhost_attach_mm(dev);
>  
>   if (dev->use_worker) {
> - err = vhost_worker_create(dev);
> - if (err)
> + worker = vhost_worker_create(dev);
> + if (!worker)
>   goto err_worker;
> +
> + for (i = 0; i < dev->nvqs; i++)
> + dev->vqs[i]->worker = worker;
>   }
>  
>   err = vhost_dev_alloc_iovecs(dev);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 0308638cdeee..e72b665ba3a5 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -74,6 +74,7 @@ struct vhost_vring_call {
>  /* The virtqueue structure describes a queue attached to a device. */
>  struct vhost_virtqueue {
>   struct vhost_dev *dev;
> + struct vhost_worker *worker;
>  
>   /* The actual ring of buffers. */
>   struct mutex mutex;
> -- 
> 2.25.1

On Mon, Mar 27, 2023 at 09:17:08PM -0500, Mike Christie wrote:
> In the next patches each vq might have different workers so one could
> have work but others do not. For net, we only want to check specific vqs,
> so this adds a helper to check if a vq has work pending and converts
> vhost-net to use it.
> 
> Signed-off-by: Mike Christie 
> ---
>  drivers/vhost/net.c   | 2 +-
>  drivers/vhost/vhost.c | 6 +++---
>  drivers/vhost/vhost.h | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 07181cd8d52e..8ed63651b9eb 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -546,7 +546,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
>   endtime = busy_clock() + busyloop_timeout;
>  
>   while (vhost_can_busy_poll(endtime)) {
> - if (vhost_has_work(>dev)) {
> + if (vhost_vq_has_work(vq)) {
>   *busyloop_intr = true;
>   break;
>   }
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e041e116afee..6567aed69ebb 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -262,11 +262,11 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>  
>  /* A lockless hint for busy polling code to exit the loop */
> -bool vhost_has_work(struct vhost_dev *dev)
> +bool vhost_vq_has_work(struct vhost_virtqueue *vq)
>  {
> - return dev->worker && !llist_empty(&dev->worker->work_list);
> + return vq->worker && !llist_empty(&vq->worker->work_list);

Re: [PATCH v4 01/11] lib/group_cpus: Export group_cpus_evenly()

2023-04-04 Thread Michael S. Tsirkin
On Thu, Mar 23, 2023 at 01:30:33PM +0800, Xie Yongji wrote:
> Export group_cpus_evenly() so that some modules
> can make use of it to group CPUs evenly according
> to NUMA and CPU locality.
> 
> Signed-off-by: Xie Yongji 
> Acked-by: Jason Wang 


Hi Thomas, do you ack merging this through my tree?
Thanks!

> ---
>  lib/group_cpus.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/lib/group_cpus.c b/lib/group_cpus.c
> index 9c837a35fef7..aa3f6815bb12 100644
> --- a/lib/group_cpus.c
> +++ b/lib/group_cpus.c
> @@ -426,3 +426,4 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
>   return masks;
>  }
>  #endif /* CONFIG_SMP */
> +EXPORT_SYMBOL_GPL(group_cpus_evenly);
> -- 
> 2.20.1
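
For reference, a module would consume the newly exported helper roughly
like this (a minimal sketch; the queue-assignment step is left as a
comment):

#include <linux/group_cpus.h>
#include <linux/slab.h>

/* Spread nr_queues queues across CPUs, NUMA/locality aware. */
static int assign_queue_cpus(unsigned int nr_queues)
{
	struct cpumask *masks;
	unsigned int i;

	masks = group_cpus_evenly(nr_queues);
	if (!masks)
		return -ENOMEM;

	for (i = 0; i < nr_queues; i++) {
		/* use masks[i] as the affinity of queue i */
	}

	kfree(masks);	/* caller owns the array */
	return 0;
}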



Re: [PATCH v6] virtio: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-04 Thread Michael S. Tsirkin
On Sun, Apr 02, 2023 at 08:17:49AM +, Alvaro Karsz wrote:
> Hi Viktor,
> 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 4c3bb0ddeb9b..f9c6604352b4 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2752,6 +2752,23 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >  }
> >  EXPORT_SYMBOL_GPL(vring_del_virtqueue);
> > 
> > +u32 vring_notification_data(struct virtqueue *_vq)
> > +{
> > +   struct vring_virtqueue *vq = to_vvq(_vq);
> > +   u16 next;
> > +
> > +   if (vq->packed_ring)
> > +   next = (vq->packed.next_avail_idx &
> > +   ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR))) |
> > +   vq->packed.avail_wrap_counter <<
> > +   VRING_PACKED_EVENT_F_WRAP_CTR;
> > +   else
> > +   next = vq->split.avail_idx_shadow;
> > +
> > +   return next << 16 | _vq->index;
> > +}
> > +EXPORT_SYMBOL_GPL(vring_notification_data);
> > +
> >  /* Manipulates transport-specific feature bits. */
> >  void vring_transport_features(struct virtio_device *vdev)
> >  {
> > @@ -2771,6 +2788,8 @@ void vring_transport_features(struct virtio_device 
> > *vdev)
> > break;
> > case VIRTIO_F_ORDER_PLATFORM:
> > break;
> > +   case VIRTIO_F_NOTIFICATION_DATA:
> > +   break;
> 
> This function is used by virtio_vdpa as well 
> (drivers/virtio/virtio_vdpa.c:virtio_vdpa_finalize_features).
> A vDPA device can offer this feature and it will be accepted, even though 
> VIRTIO_F_NOTIFICATION_DATA is not a thing for the vDPA transport at the 
> moment.
> 
> I don't know if this is bad, since offering VIRTIO_F_NOTIFICATION_DATA is 
> meaningless for a vDPA device at the moment.
> 
> I submitted a patch adding support for vDPA transport.
> https://lore.kernel.org/virtualization/20230402081034.1021886-1-alvaro.ka...@solid-run.com/T/#u

Hmm.  So it seems we need to first apply yours and then this patch,
is that right? Or the other way around? What is the right way to avoid
breaking bisect?
Do you mind including this patch with yours in a patchset
in the correct order?




> > default:
> > /* We don't understand this bit. */
> > __virtio_clear_bit(vdev, i);
> 
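
For reference, the 32-bit value built by vring_notification_data() above
decodes as follows (illustrative helpers, not part of the patch):

static inline u16 notif_vqn(u32 data)
{
	return data & 0xffff;	/* low 16 bits: virtqueue index */
}

static inline u16 notif_next(u32 data)
{
	return data >> 16;	/* split: next avail index; packed:
				 * 15-bit index plus wrap counter bit */
}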



Re: [PATCH v2] vdpa/mlx5: Verify wq is a valid pointer in mlx5_vdpa_suspend

2023-04-04 Thread Michael S. Tsirkin
On Tue, Mar 28, 2023 at 10:18:10AM +0300, Eli Cohen wrote:
> mlx5_vdpa_suspend() flushes the workqueue as part of its logic. However,
> if the device has been deleted while a VM was running, the workqueue
> will be destroyed first and wq will become null. After the VM is destroyed,
> suspend can be called and will access a null pointer.
> 
> Fix it by verifying wq is not NULL.
> 
> Fixes: cae15c2ed8e6 ("vdpa/mlx5: Implement susupend virtqueue callback")
> Signed-off-by: Eli Cohen 


This conflicts with the patch for not losing link state updates.
How do you want to handle this?

> ---
> v1 -> v2:
> Fix spelling errors
> 
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 85866ace0061..b73c5943aefd 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2929,7 +2929,8 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
>   down_write(&ndev->reslock);
>   ndev->nb_registered = false;
>   mlx5_notifier_unregister(mvdev->mdev, &ndev->nb);
> - flush_workqueue(ndev->mvdev.wq);
> + if (ndev->mvdev.wq)
> + flush_workqueue(ndev->mvdev.wq);
>   for (i = 0; i < ndev->cur_num_vqs; i++) {
>   mvq = &ndev->vqs[i];
>   suspend_vq(ndev, mvq);
> -- 
> 2.38.1



[PATCH v5 9/9] vdpa_sim: add support for user VA

2023-04-04 Thread Stefano Garzarella
The new "use_va" module parameter (default: true) is used in
vdpa_alloc_device() to inform the vDPA framework that the device
supports VA.

vringh is initialized to use VA only when "use_va" is true and the
user's mm has been bound, i.e. only when the bus supports user VA
(e.g. vhost-vdpa).

vdpasim_mm_work_fn work is used to serialize the binding to a new
address space when the .bind_mm callback is invoked, and unbinding
when the .unbind_mm callback is invoked.

Call mmget_not_zero()/kthread_use_mm() inside the worker function
to pin the address space only as long as needed, following the
documentation of mmget() in include/linux/sched/mm.h:

  * Never use this function to pin this address space for an
  * unbounded/indefinite amount of time.

Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v4:
- checked `use_va` in vdpasim_work_fn() [Jason]
- removed `va_enabled` variable now used only in the if condition
v3:
- called mmget_not_zero() before kthread_use_mm() [Jason]
  As the documentation of mmget() in include/linux/sched/mm.h says:

  * Never use this function to pin this address space for an
  * unbounded/indefinite amount of time.

  I moved mmget_not_zero/kthread_use_mm inside the worker function,
  this way we pin the address space only as long as needed.
  This is similar to what vfio_iommu_type1_dma_rw_chunk() does in
  drivers/vfio/vfio_iommu_type1.c
- simplified the mm bind/unbind [Jason]
- renamed vdpasim_worker_change_mm_sync() [Jason]
- fix commit message (s/default: false/default: true)
v2:
- `use_va` set to true by default [Eugenio]
- supported the new unbind_mm callback [Jason]
- removed the unbind_mm call in vdpasim_do_reset() [Jason]
- avoided to release the lock while call kthread_flush_work() since we
  are now using a mutex to protect the device state

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 97 +---
 2 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 4774292fba8c..3a42887d05d9 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -59,6 +59,7 @@ struct vdpasim {
struct vdpasim_virtqueue *vqs;
struct kthread_worker *worker;
struct kthread_work work;
+   struct mm_struct *mm_bound;
struct vdpasim_dev_attr dev_attr;
/* mutex to synchronize virtqueue state */
struct mutex mutex;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 2b2e439a66f7..2c706bb18897 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -35,10 +35,44 @@ module_param(max_iotlb_entries, int, 0444);
 MODULE_PARM_DESC(max_iotlb_entries,
 "Maximum number of iotlb entries for each address space. 0 
means unlimited. (default: 2048)");
 
+static bool use_va = true;
+module_param(use_va, bool, 0444);
+MODULE_PARM_DESC(use_va, "Enable/disable the device's ability to use VA");
+
 #define VDPASIM_QUEUE_ALIGN PAGE_SIZE
 #define VDPASIM_QUEUE_MAX 256
 #define VDPASIM_VENDOR_ID 0
 
+struct vdpasim_mm_work {
+   struct kthread_work work;
+   struct vdpasim *vdpasim;
+   struct mm_struct *mm_to_bind;
+   int ret;
+};
+
+static void vdpasim_mm_work_fn(struct kthread_work *work)
+{
+   struct vdpasim_mm_work *mm_work =
+   container_of(work, struct vdpasim_mm_work, work);
+   struct vdpasim *vdpasim = mm_work->vdpasim;
+
+   mm_work->ret = 0;
+
+   //TODO: should we attach the cgroup of the mm owner?
+   vdpasim->mm_bound = mm_work->mm_to_bind;
+}
+
+static void vdpasim_worker_change_mm_sync(struct vdpasim *vdpasim,
+ struct vdpasim_mm_work *mm_work)
+{
+   struct kthread_work *work = &mm_work->work;
+
+   kthread_init_work(work, vdpasim_mm_work_fn);
+   kthread_queue_work(vdpasim->worker, work);
+
+   kthread_flush_work(work);
+}
+
 static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 {
return container_of(vdpa, struct vdpasim, vdpa);
@@ -59,13 +93,20 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
 {
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
uint16_t last_avail_idx = vq->vring.last_avail_idx;
-
-   vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
- (struct vring_desc *)(uintptr_t)vq->desc_addr,
- (struct vring_avail *)
- (uintptr_t)vq->driver_addr,
- (struct vring_used *)
- (uintptr_t)vq->device_addr);
+   struct vring_desc *desc = (struct vring_desc *)
+ (uintptr_t)vq->desc_addr;
+   struct vring_avail *avail = (struct vring_avail *)
+   
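
For reference, the pin-only-while-working pattern described in the
commit message looks roughly like this (a sketch, not the actual
vdpa_sim code):

static void work_fn_sketch(struct vdpasim *vdpasim)
{
	struct mm_struct *mm = vdpasim->mm_bound;

	if (mm) {
		if (!mmget_not_zero(mm))	/* owner may be exiting */
			return;
		kthread_use_mm(mm);		/* adopt the user VA */
	}

	/* ... process the virtqueues, dereferencing user VA ... */

	if (mm) {
		kthread_unuse_mm(mm);
		mmput(mm);			/* unpin as soon as done */
	}
}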

[PATCH v5 7/9] vdpa_sim: use kthread worker

2023-04-04 Thread Stefano Garzarella
Let's use our own kthread to run device jobs.
This allows us more flexibility, especially we can attach the kthread
to the user address space when vDPA uses user's VA.

Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- fix `dev` not initialized in the error path [Simon Horman]

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 19 +--
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index acee20faaf6a..ce83f9130a5d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -57,7 +57,8 @@ struct vdpasim_dev_attr {
 struct vdpasim {
struct vdpa_device vdpa;
struct vdpasim_virtqueue *vqs;
-   struct work_struct work;
+   struct kthread_worker *worker;
+   struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
/* spinlock to synchronize virtqueue state */
spinlock_t lock;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 2df5227e0b62..bd9f9054de94 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -11,8 +11,8 @@
 #include 
 #include 
 #include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -127,7 +127,7 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
-static void vdpasim_work_fn(struct work_struct *work)
+static void vdpasim_work_fn(struct kthread_work *work)
 {
struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
 
@@ -170,11 +170,17 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
-   INIT_WORK(&vdpasim->work, vdpasim_work_fn);
+   dev = &vdpasim->vdpa.dev;
+
+   kthread_init_work(&vdpasim->work, vdpasim_work_fn);
+   vdpasim->worker = kthread_create_worker(0, "vDPA sim worker: %s",
+   dev_attr->name);
+   if (IS_ERR(vdpasim->worker))
+   goto err_iommu;
+
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
 
-   dev = &vdpasim->vdpa.dev;
dev->dma_mask = &dev->coherent_dma_mask;
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
goto err_iommu;
@@ -223,7 +229,7 @@ EXPORT_SYMBOL_GPL(vdpasim_create);
 
 void vdpasim_schedule_work(struct vdpasim *vdpasim)
 {
-   schedule_work(&vdpasim->work);
+   kthread_queue_work(vdpasim->worker, &vdpasim->work);
 }
 EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
 
@@ -623,7 +629,8 @@ static void vdpasim_free(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
 
-   cancel_work_sync(&vdpasim->work);
+   kthread_cancel_work_sync(&vdpasim->work);
+   kthread_destroy_worker(vdpasim->worker);
 
for (i = 0; i < vdpasim->dev_attr.nvqs; i++) {
vringh_kiov_cleanup(&vdpasim->vqs[i].out_iov);
vringh_kiov_cleanup(>vqs[i].out_iov);
-- 
2.39.2



[PATCH v5 8/9] vdpa_sim: replace the spinlock with a mutex to protect the state

2023-04-04 Thread Stefano Garzarella
The spinlock we use to protect the state of the simulator is sometimes
held for a long time (for example, when devices handle requests).

This also prevents us from calling functions that might sleep (such as
kthread_flush_work() in the next patch), and thus having to release
and retake the lock.

For these reasons, let's replace the spinlock with a mutex that gives
us more flexibility.

Suggested-by: Jason Wang 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 34 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  4 ++--
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  4 ++--
 4 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index ce83f9130a5d..4774292fba8c 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -60,8 +60,8 @@ struct vdpasim {
struct kthread_worker *worker;
struct kthread_work work;
struct vdpasim_dev_attr dev_attr;
-   /* spinlock to synchronize virtqueue state */
-   spinlock_t lock;
+   /* mutex to synchronize virtqueue state */
+   struct mutex mutex;
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index bd9f9054de94..2b2e439a66f7 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -178,7 +178,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
if (IS_ERR(vdpasim->worker))
goto err_iommu;
 
-   spin_lock_init(&vdpasim->lock);
+   mutex_init(&vdpasim->mutex);
spin_lock_init(&vdpasim->iommu_lock);
 
dev->dma_mask = &dev->coherent_dma_mask;
@@ -286,13 +286,13 @@ static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
bool old_ready;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
old_ready = vq->ready;
vq->ready = ready;
if (vq->ready && !old_ready) {
vdpasim_queue_ready(vdpasim, idx);
}
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 }
 
 static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
@@ -310,9 +310,9 @@ static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
struct vringh *vrh = &vq->vring;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vrh->last_avail_idx = state->split.avail_index;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -409,9 +409,9 @@ static u8 vdpasim_get_status(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
u8 status;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
status = vdpasim->status;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return status;
 }
@@ -420,19 +420,19 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->status = status;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 }
 
 static int vdpasim_reset(struct vdpa_device *vdpa)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->status = 0;
vdpasim_do_reset(vdpasim);
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -441,9 +441,9 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->running = false;
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -453,7 +453,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int i;
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
vdpasim->running = true;
 
if (vdpasim->pending_kick) {
@@ -464,7 +464,7 @@ static int vdpasim_resume(struct vdpa_device *vdpa)
vdpasim->pending_kick = false;
}
 
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
@@ -536,14 +536,14 @@ static int vdpasim_set_group_asid(struct vdpa_device *vdpa, unsigned int group,
 
iommu = &vdpasim->iommu[asid];
 
-   spin_lock(&vdpasim->lock);
+   mutex_lock(&vdpasim->mutex);
 
for (i = 0; i < vdpasim->dev_attr.nvqs; i++)
if (vdpasim_get_vq_group(vdpa, i) == group)
vringh_set_iotlb(&vdpasim->vqs[i].vring, iommu,
 &vdpasim->iommu_lock);
 
-   spin_unlock(&vdpasim->lock);
+   mutex_unlock(&vdpasim->mutex);
 
return 0;
 }
diff --git 

[PATCH v5 6/9] vdpa_sim: make devices agnostic for work management

2023-04-04 Thread Stefano Garzarella
Let's move work management inside the vdpa_sim core.
This way we can easily change how we manage the works, without
having to change the devices each time.

Acked-by: Eugenio Pérez Martin 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |  3 ++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 17 +++--
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  6 ++
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  6 ++
 4 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 144858636c10..acee20faaf6a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -45,7 +45,7 @@ struct vdpasim_dev_attr {
u32 ngroups;
u32 nas;
 
-   work_func_t work_fn;
+   void (*work_fn)(struct vdpasim *vdpasim);
void (*get_config)(struct vdpasim *vdpasim, void *config);
void (*set_config)(struct vdpasim *vdpasim, const void *config);
int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
@@ -78,6 +78,7 @@ struct vdpasim {
 
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *attr,
   const struct vdpa_dev_set_config *config);
+void vdpasim_schedule_work(struct vdpasim *vdpasim);
 
 /* TODO: cross-endian support */
 static inline bool vdpasim_is_little_endian(struct vdpasim *vdpasim)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index eea23c630f7c..2df5227e0b62 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -127,6 +127,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
 
+static void vdpasim_work_fn(struct work_struct *work)
+{
+   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
+
+   vdpasim->dev_attr.work_fn(vdpasim);
+}
+
 struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
   const struct vdpa_dev_set_config *config)
 {
@@ -163,7 +170,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 
vdpasim = vdpa_to_sim(vdpa);
vdpasim->dev_attr = *dev_attr;
-   INIT_WORK(&vdpasim->work, dev_attr->work_fn);
+   INIT_WORK(&vdpasim->work, vdpasim_work_fn);
spin_lock_init(&vdpasim->lock);
spin_lock_init(&vdpasim->iommu_lock);
 
@@ -214,6 +221,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 }
 EXPORT_SYMBOL_GPL(vdpasim_create);
 
+void vdpasim_schedule_work(struct vdpasim *vdpasim)
+{
+   schedule_work(&vdpasim->work);
+}
+EXPORT_SYMBOL_GPL(vdpasim_schedule_work);
+
 static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
  u64 desc_area, u64 driver_area,
  u64 device_area)
@@ -248,7 +261,7 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
}
 
if (vq->ready)
-   schedule_work(&vdpasim->work);
+   vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index 5117959bed8a..eb4897c8541e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -286,9 +285,8 @@ static bool vdpasim_blk_handle_req(struct vdpasim *vdpasim,
return handled;
 }
 
-static void vdpasim_blk_work(struct work_struct *work)
+static void vdpasim_blk_work(struct vdpasim *vdpasim)
 {
-   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
bool reschedule = false;
int i;
 
@@ -326,7 +324,7 @@ static void vdpasim_blk_work(struct work_struct *work)
spin_unlock(&vdpasim->lock);
 
if (reschedule)
-   schedule_work(&vdpasim->work);
+   vdpasim_schedule_work(vdpasim);
 }
 
 static void vdpasim_blk_get_config(struct vdpasim *vdpasim, void *config)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 862f405362de..e61a9ecbfafe 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -192,9 +191,8 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim)
u64_stats_update_end(>cq_stats.syncp);
 }
 
-static void vdpasim_net_work(struct work_struct *work)
+static void vdpasim_net_work(struct vdpasim *vdpasim)
 {
-   struct vdpasim *vdpasim = container_of(work, struct vdpasim, work);
struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
struct vdpasim_net *net = sim_to_net(vdpasim);
@@ -260,7 +258,7 @@ static void vdpasim_net_work(struct work_struct *work)
   

[PATCH v5 5/9] vringh: support VA with iotlb

2023-04-04 Thread Stefano Garzarella
vDPA supports the possibility to use user VA in the iotlb messages.
So, let's add support for user VA in vringh to use it in the vDPA
simulators.

Acked-by: Eugenio Pérez 
Signed-off-by: Stefano Garzarella 
---

Notes:
v5:
- s/userpace/userspace/ in the vringh_init_iotlb_va doc [Simon]
v4:
- used uintptr_t for `io_addr` [Eugenio]
- added `io_addr` and `io_len` variables in iotlb_translate
- avoided overflow doing `map->addr - map->start + addr` [Jason]
- removed `is_iovec` field from struct iotlb_vec [Jason]
- added vringh_init_iotlb_va() [Jason]
v3:
- refactored avoiding code duplication [Eugenio]
v2:
- replace kmap_atomic() with kmap_local_page() [see previous patch]
- fix cast warnings when build with W=1 C=1

 include/linux/vringh.h |   9 +++
 drivers/vhost/vringh.c | 171 +
 2 files changed, 148 insertions(+), 32 deletions(-)

diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 1991a02c6431..b4edfadf5479 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -32,6 +32,9 @@ struct vringh {
/* Can we get away with weak barriers? */
bool weak_barriers;
 
+   /* Use user's VA */
+   bool use_va;
+
/* Last available index we saw (ie. where we're up to). */
u16 last_avail_idx;
 
@@ -284,6 +287,12 @@ int vringh_init_iotlb(struct vringh *vrh, u64 features,
  struct vring_avail *avail,
  struct vring_used *used);
 
+int vringh_init_iotlb_va(struct vringh *vrh, u64 features,
+unsigned int num, bool weak_barriers,
+struct vring_desc *desc,
+struct vring_avail *avail,
+struct vring_used *used);
+
 int vringh_getdesc_iotlb(struct vringh *vrh,
 struct vringh_kiov *riov,
 struct vringh_kiov *wiov,
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 4aee230f7622..ab95160dcdd9 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1094,10 +1094,17 @@ EXPORT_SYMBOL(vringh_need_notify_kern);
 
 #if IS_REACHABLE(CONFIG_VHOST_IOTLB)
 
+struct iotlb_vec {
+   union {
+   struct iovec *iovec;
+   struct bio_vec *bvec;
+   } iov;
+   size_t count;
+};
+
 static int iotlb_translate(const struct vringh *vrh,
   u64 addr, u64 len, u64 *translated,
-  struct bio_vec iov[],
-  int iov_size, u32 perm)
+  struct iotlb_vec *ivec, u32 perm)
 {
struct vhost_iotlb_map *map;
struct vhost_iotlb *iotlb = vrh->iotlb;
@@ -1107,9 +1114,11 @@ static int iotlb_translate(const struct vringh *vrh,
spin_lock(vrh->iotlb_lock);
 
while (len > s) {
-   u64 size, pa, pfn;
+   uintptr_t io_addr;
+   size_t io_len;
+   u64 size;
 
-   if (unlikely(ret >= iov_size)) {
+   if (unlikely(ret >= ivec->count)) {
ret = -ENOBUFS;
break;
}
@@ -1124,10 +1133,22 @@ static int iotlb_translate(const struct vringh *vrh,
}
 
size = map->size - addr + map->start;
-   pa = map->addr + addr - map->start;
-   pfn = pa >> PAGE_SHIFT;
-   bvec_set_page(&iov[ret], pfn_to_page(pfn), min(len - s, size),
- pa & (PAGE_SIZE - 1));
+   io_len = min(len - s, size);
+   io_addr = map->addr - map->start + addr;
+
+   if (vrh->use_va) {
+   struct iovec *iovec = ivec->iov.iovec;
+
+   iovec[ret].iov_len = io_len;
+   iovec[ret].iov_base = (void __user *)io_addr;
+   } else {
+   u64 pfn = io_addr >> PAGE_SHIFT;
+   struct bio_vec *bvec = ivec->iov.bvec;
+
+   bvec_set_page(&bvec[ret], pfn_to_page(pfn), io_len,
+ io_addr & (PAGE_SIZE - 1));
+   }
+
s += size;
addr += size;
++ret;
@@ -1146,23 +1167,36 @@ static int iotlb_translate(const struct vringh *vrh,
 static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
  void *src, size_t len)
 {
+   struct iotlb_vec ivec;
+   union {
+   struct iovec iovec[IOTLB_IOV_STRIDE];
+   struct bio_vec bvec[IOTLB_IOV_STRIDE];
+   } iov;
u64 total_translated = 0;
 
+   ivec.iov.iovec = iov.iovec;
+   ivec.count = IOTLB_IOV_STRIDE;
+
while (total_translated < len) {
-   struct bio_vec iov[IOTLB_IOV_STRIDE];
struct iov_iter iter;
u64 translated;
int ret;
 
ret = 
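
For reference, a caller would initialize a user-VA vringh roughly like
this (a sketch; the *_uva values stand in for user virtual addresses
received via iotlb messages):

	err = vringh_init_iotlb_va(&vrh, features, num,
				   false /* weak_barriers */,
				   (struct vring_desc *)desc_uva,
				   (struct vring_avail *)avail_uva,
				   (struct vring_used *)used_uva);
	if (err)
		return err;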

[PATCH v5 4/9] vringh: define the stride used for translation

2023-04-04 Thread Stefano Garzarella
Define a macro to be reused in the different parts of the code.

Useful for the next patches where we add more arrays to manage also
translations with user VA.

Suggested-by: Eugenio Perez Martin 
Signed-off-by: Stefano Garzarella 
---

Notes:
v4:
- added this patch with the changes extracted from the next patch [Eugenio]
- used _STRIDE instead of _SIZE [Eugenio]

 drivers/vhost/vringh.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 0ba3ef809e48..4aee230f7622 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1141,13 +1141,15 @@ static int iotlb_translate(const struct vringh *vrh,
return ret;
 }
 
+#define IOTLB_IOV_STRIDE 16
+
 static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
  void *src, size_t len)
 {
u64 total_translated = 0;
 
while (total_translated < len) {
-   struct bio_vec iov[16];
+   struct bio_vec iov[IOTLB_IOV_STRIDE];
struct iov_iter iter;
u64 translated;
int ret;
@@ -1180,7 +1182,7 @@ static inline int copy_to_iotlb(const struct vringh *vrh, 
void *dst,
u64 total_translated = 0;
 
while (total_translated < len) {
-   struct bio_vec iov[16];
+   struct bio_vec iov[IOTLB_IOV_STRIDE];
struct iov_iter iter;
u64 translated;
int ret;
-- 
2.39.2



[PATCH v5 3/9] vringh: replace kmap_atomic() with kmap_local_page()

2023-04-04 Thread Stefano Garzarella
kmap_atomic() is deprecated in favor of kmap_local_page() since commit
f3ba3c710ac5 ("mm/highmem: Provide kmap_local*").

With kmap_local_page() the mappings are per thread, CPU local, can take
page-faults, and can be called from any context (including interrupts).
Furthermore, the tasks can be preempted and, when they are scheduled to
run again, the kernel virtual addresses are restored and still valid.

kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels,
otherwise it only disables migration).

The code within the mappings/un-mappings in getu16_iotlb() and
putu16_iotlb() don't depend on the above-mentioned side effects of
kmap_atomic(), so that mere replacements of the old API with the new one
is all that is required (i.e., there is no need to explicitly add calls
to pagefault_disable() and/or preempt_disable()).

This commit reuses a "boiler plate" commit message from Fabio, who has
already done this change in several places.

Cc: "Fabio M. De Francesco" 
Reviewed-by: Fabio M. De Francesco 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v3:
- credited Fabio for the commit message
- added reference to the commit that deprecated kmap_atomic() [Jason]
v2:
- added this patch since checkpatch.pl complained about deprecation
  of kmap_atomic() touched by next patch

 drivers/vhost/vringh.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a1e27da54481..0ba3ef809e48 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1220,10 +1220,10 @@ static inline int getu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
+   kaddr = kmap_local_page(iov.bv_page);
from = kaddr + iov.bv_offset;
*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
-   kunmap_atomic(kaddr);
+   kunmap_local(kaddr);
 
return 0;
 }
@@ -1241,10 +1241,10 @@ static inline int putu16_iotlb(const struct vringh *vrh,
if (ret < 0)
return ret;
 
-   kaddr = kmap_atomic(iov.bv_page);
+   kaddr = kmap_local_page(iov.bv_page);
to = kaddr + iov.bv_offset;
WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
-   kunmap_atomic(kaddr);
+   kunmap_local(kaddr);
 
return 0;
 }
-- 
2.39.2



[PATCH v5 2/9] vhost-vdpa: use bind_mm/unbind_mm device callbacks

2023-04-04 Thread Stefano Garzarella
When the user calls the VHOST_SET_OWNER ioctl and the vDPA device
has `use_va` set to true, let's call the bind_mm callback.
In this way we can bind the device to the user address space
and directly use the user VA.

The unbind_mm callback is called during the release after
stopping the device.

Signed-off-by: Stefano Garzarella 
---

Notes:
v4:
- added new switch after vhost_dev_ioctl() [Jason]
v3:
- added `case VHOST_SET_OWNER` in vhost_vdpa_unlocked_ioctl() [Jason]
v2:
- call the new unbind_mm callback during the release [Jason]
- avoid to call bind_mm callback after the reset, since the device
  is not detaching it now during the reset

 drivers/vhost/vdpa.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..3824c249612f 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -219,6 +219,28 @@ static int vhost_vdpa_reset(struct vhost_vdpa *v)
return vdpa_reset(vdpa);
 }
 
+static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (!vdpa->use_va || !ops->bind_mm)
+   return 0;
+
+   return ops->bind_mm(vdpa, v->vdev.mm);
+}
+
+static void vhost_vdpa_unbind_mm(struct vhost_vdpa *v)
+{
+   struct vdpa_device *vdpa = v->vdpa;
+   const struct vdpa_config_ops *ops = vdpa->config;
+
+   if (!vdpa->use_va || !ops->unbind_mm)
+   return;
+
+   ops->unbind_mm(vdpa);
+}
+
 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp)
 {
struct vdpa_device *vdpa = v->vdpa;
@@ -716,6 +738,17 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
break;
}
 
+   if (r)
+   goto out;
+
+   switch (cmd) {
+   case VHOST_SET_OWNER:
+   r = vhost_vdpa_bind_mm(v);
+   if (r)
+   vhost_dev_reset_owner(d, NULL);
+   break;
+   }
+out:
mutex_unlock(&d->mutex);
return r;
 }
@@ -1287,6 +1320,7 @@ static int vhost_vdpa_release(struct inode *inode, struct file *filep)
vhost_vdpa_clean_irq(v);
vhost_vdpa_reset(v);
vhost_dev_stop(&v->vdev);
+   vhost_vdpa_unbind_mm(v);
vhost_vdpa_config_put(v);
vhost_vdpa_cleanup(v);
mutex_unlock(&d->mutex);
-- 
2.39.2



[PATCH v5 0/9] vdpa_sim: add support for user VA

2023-04-04 Thread Stefano Garzarella
This series adds support for the use of user virtual addresses in the
vDPA simulator devices.

The main reason for this change is to lift the pinning of all guest
memory, especially with virtio devices implemented in software.

The next step would be to generalize the code in vdpa-sim to allow the
implementation of in-kernel software devices. Similar to vhost, but using vDPA
so we can reuse the same software stack (e.g. in QEMU) for both HW and SW
devices.

For example, we have never merged vhost-blk, and lately there has been interest.
So it would be nice to do it directly with vDPA to reuse the same code in the
VMM for both HW and SW vDPA block devices.

The main problem (addressed by this series) was due to the pinning of all
guest memory, which thus prevented the overcommit of guest memory.

Thanks,
Stefano

Changelog listed in each patch.
v4: https://lore.kernel.org/lkml/20230324153607.46836-1-sgarz...@redhat.com/
v3: https://lore.kernel.org/lkml/20230321154228.182769-1-sgarz...@redhat.com/
v2: https://lore.kernel.org/lkml/20230302113421.174582-1-sgarz...@redhat.com/
RFC v1: 
https://lore.kernel.org/lkml/20221214163025.103075-1-sgarz...@redhat.com/

Stefano Garzarella (9):
  vdpa: add bind_mm/unbind_mm callbacks
  vhost-vdpa: use bind_mm/unbind_mm device callbacks
  vringh: replace kmap_atomic() with kmap_local_page()
  vringh: define the stride used for translation
  vringh: support VA with iotlb
  vdpa_sim: make devices agnostic for work management
  vdpa_sim: use kthread worker
  vdpa_sim: replace the spinlock with a mutex to protect the state
  vdpa_sim: add support for user VA

 drivers/vdpa/vdpa_sim/vdpa_sim.h |  11 +-
 include/linux/vdpa.h |  10 ++
 include/linux/vringh.h   |   9 ++
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 161 -
 drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  10 +-
 drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  10 +-
 drivers/vhost/vdpa.c |  34 ++
 drivers/vhost/vringh.c   | 173 ++-
 8 files changed, 340 insertions(+), 78 deletions(-)

-- 
2.39.2



[PATCH v5 1/9] vdpa: add bind_mm/unbind_mm callbacks

2023-04-04 Thread Stefano Garzarella
These new optional callbacks are used to bind/unbind the device to/from
a specific address space, so that the vDPA framework can use VA when
they are implemented.

Suggested-by: Jason Wang 
Acked-by: Jason Wang 
Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- removed `struct task_struct *owner` param (unused for now, maybe
  useful to support cgroups) [Jason]
- add unbind_mm callback [Jason]

 include/linux/vdpa.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 43f59ef10cc9..369c21394284 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -290,6 +290,14 @@ struct vdpa_map_file {
  * @vdev: vdpa device
  * @idx: virtqueue index
  * Returns pointer to structure device or error (NULL)
+ * @bind_mm:   Bind the device to a specific address space
+ * so the vDPA framework can use VA when this
+ * callback is implemented. (optional)
+ * @vdev: vdpa device
+ * @mm: address space to bind
+ * @unbind_mm: Unbind the device from the address space
+ * bound using the bind_mm callback. (optional)
+ * @vdev: vdpa device
  * @free:  Free resources that belongs to vDPA (optional)
  * @vdev: vdpa device
  */
@@ -351,6 +359,8 @@ struct vdpa_config_ops {
int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
  unsigned int asid);
struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
+   int (*bind_mm)(struct vdpa_device *vdev, struct mm_struct *mm);
+   void (*unbind_mm)(struct vdpa_device *vdev);
 
/* Free device resources */
void (*free)(struct vdpa_device *vdev);
-- 
2.39.2
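
For reference, a user-VA-capable driver would wire these up roughly like
this (a sketch with made-up my_vdpa names):

static int my_vdpa_bind_mm(struct vdpa_device *vdev, struct mm_struct *mm)
{
	struct my_vdpa *d = container_of(vdev, struct my_vdpa, vdpa);

	d->bound_mm = mm;	/* address space to use for user VA */
	return 0;
}

static void my_vdpa_unbind_mm(struct vdpa_device *vdev)
{
	struct my_vdpa *d = container_of(vdev, struct my_vdpa, vdpa);

	d->bound_mm = NULL;
}

static const struct vdpa_config_ops my_vdpa_ops = {
	/* ... the other ops ... */
	.bind_mm   = my_vdpa_bind_mm,
	.unbind_mm = my_vdpa_unbind_mm,
};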



Re: [PATCH] virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-04 Thread Alvaro Karsz
> > I meant that next_wrap is for packed VQs, but I see your point, it's not
> > clear from the comment.
> > I'll fix it in v2.
> 
> next_wrap works for split as well, spec said
> 
> "
> Without VIRTIO_F_RING_PACKED this is the most significant bit (bit 15)
> of the available index.
> "

You're right, so we can write:

 @kick_vq_with_data:Kick the virtqueue and supply extra data
(only if VIRTIO_F_NOTIFICATION_DATA is 
negotiated)
@vdev: vdpa device
@data for split virtqueue:
16 bits vqn and 16 bits next available index.
@data for packed virtqueue:
16 bits vqn, 15 least significant bits of
next available index and 1 bit next_wrap.
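
In code form, the two layouts would be (illustrative, not from the
patch; vqn, next_idx and wrap are assumed inputs):

	u32 split_data  = ((u32)next_idx << 16) | vqn;
	u32 packed_data = ((u32)((wrap << 15) |
				 (next_idx & 0x7fff)) << 16) | vqn;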


Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-04 Thread Xuan Zhuo
On Tue, 4 Apr 2023 16:03:49 +0800, Jason Wang  wrote:
> On Tue, Apr 4, 2023 at 3:12 PM Xuan Zhuo  wrote:
> >
> > On Tue, 4 Apr 2023 15:01:36 +0800, Jason Wang  wrote:
> > > On Tue, Apr 4, 2023 at 2:55 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  
> > > > > > wrote:
> > > > > > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang 
> > > > > > > >  wrote:
> > > > > > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > At present, we have two similar logic to perform the XDP 
> > > > > > > > > > prog.
> > > > > > > > > >
> > > > > > > > > > Therefore, this PATCH separates the code of executing XDP, 
> > > > > > > > > > which is
> > > > > > > > > > conducive to later maintenance.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > > > > > +--
> > > > > > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c 
> > > > > > > > > > b/drivers/net/virtio_net.c
> > > > > > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > > > > > char padding[12];
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > > +enum {
> > > > > > > > > > +   /* xdp pass */
> > > > > > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > > > > > +   /* drop packet. the caller needs to release the 
> > > > > > > > > > page. */
> > > > > > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > > > > > +   /* packet is consumed by xdp. the caller needs to 
> > > > > > > > > > do nothing. */
> > > > > > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > > > > > +};
> > > > > > > > >
> > > > > > > > > I'd prefer this to be done on top unless it is a must. But I 
> > > > > > > > > don't see
> > > > > > > > > any advantage of introducing this, it's partial mapping of 
> > > > > > > > > XDP action
> > > > > > > > > and it needs to be extended when XDP action is extended. (And 
> > > > > > > > > we've
> > > > > > > > > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > > > > > > >
> > > > > > > > No, these are the three states of buffer after XDP processing.
> > > > > > > >
> > > > > > > > * PASS: goto make skb
> > > > > > >
> > > > > > > XDP_PASS goes for this.
> > > > > > >
> > > > > > > > * DROP: we should release buffer
> > > > > > >
> > > > > > > XDP_DROP and error conditions go with this.
> > > > > > >
> > > > > > > > * CUNSUMED: xdp prog used the buffer, we do nothing
> > > > > > >
> > > > > > > XDP_TX/XDP_REDIRECTION goes for this.
> > > > > > >
> > > > > > > So t virtnet_xdp_handler() just maps XDP ACTION plus the error
> > > > > > > conditions to the above three states.
> > > > > > >
> > > > > > > We can simply map error to XDP_DROP like:
> > > > > > >
> > > > > > >case XDP_TX:
> > > > > > >   stats->xdp_tx++;
> > > > > > >xdpf = xdp_convert_buff_to_frame(xdp);
> > > > > > >if (unlikely(!xdpf))
> > > > > > >return XDP_DROP;
> > > > > > >
> > > > > > > A good side effect is to avoid the xdp_xmit pointer to be passed 
> > > > > > > to
> > > > > > > the function.
> > > > > >
> > > > > >
> > > > > > So, I guess you mean this:
> > > > > >
> > > > > > switch (act) {
> > > > > > case XDP_PASS:
> > > > > > /* handle pass */
> > > > > > return skb;
> > > > > >
> > > > > > case XDP_TX:
> > > > > > *xdp_xmit |= VIRTIO_XDP_TX;
> > > > > > goto xmit;
> > > > > >
> > > > > > case XDP_REDIRECT:
> > > > > > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > > > > > goto xmit;
> > > > > >
> > > > > > case XDP_DROP:
> > > > > > default:
> > > > > > goto err_xdp;
> > > > > > }
> > > > > >
> > > > > > I have to say there is no problem from the perspective of code 
> > > > > > implementation.
> > > > >
> > > > > Note that this is the current logic where it is determined in
> > > > > receive_small() and receive_mergeable().
> > > >
> > > > Yes, but the purpose of this patches is to simplify the call.
> > >
> > > You mean simplify the receive_small()/mergeable()?
> >
> > YES.
> >
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > But if the a new ACTION liking XDP_TX,XDP_REDIRECT is added in the 
> > > > > > future, then
> > > > > > we must modify all the callers.
> 
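
To summarize the mapping under discussion, a sketch (illustrative only,
not the posted patch):

static int virtnet_xdp_state_sketch(u32 act)
{
	switch (act) {
	case XDP_PASS:
		return VIRTNET_XDP_RES_PASS;	 /* caller builds an skb */
	case XDP_TX:
	case XDP_REDIRECT:
		return VIRTNET_XDP_RES_CONSUMED; /* buffer now owned by XDP */
	case XDP_DROP:
	default:
		return VIRTNET_XDP_RES_DROP;	 /* caller releases the page */
	}
}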

Re: [PATCH] virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-04 Thread Jason Wang
On Tue, Apr 4, 2023 at 3:20 PM Alvaro Karsz  wrote:
>
> > > + * @kick_vq_with_data: Kick the virtqueue and supply extra data
> > > + * (only if VIRTIO_F_NOTIFICATION_DATA is 
> > > negotiated)
> > > + * @vdev: vdpa device
> > > + * @data: includes vqn, next_off and 
> > > next_wrap for
> > > + * packed virtqueues
> >
> > This needs some tweaking, VIRTIO_F_NOTIFICATION_DATA works for split
> > virtqueue as well.
> >
>
> I meant that next_wrap is for packed VQs, but I see your point, it's not clear
> from the comment.
> I'll fix it in v2.

next_wrap works for split as well, spec said

"
Without VIRTIO_F_RING_PACKED this is the most significant bit (bit 15)
of the available index.
"

Thanks

>
> Thanks
>


Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-04 Thread Jason Wang
On Tue, Apr 4, 2023 at 3:12 PM Xuan Zhuo  wrote:
>
> On Tue, 4 Apr 2023 15:01:36 +0800, Jason Wang  wrote:
> > On Tue, Apr 4, 2023 at 2:55 PM Xuan Zhuo  wrote:
> > >
> > > On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  wrote:
> > > > On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang 
> > > > > > >  wrote:
> > > > > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > At present, we have two similar logic to perform the XDP prog.
> > > > > > > > >
> > > > > > > > > Therefore, this PATCH separates the code of executing XDP, 
> > > > > > > > > which is
> > > > > > > > > conducive to later maintenance.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > > > > +--
> > > > > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c 
> > > > > > > > > b/drivers/net/virtio_net.c
> > > > > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > > > > char padding[12];
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > > +enum {
> > > > > > > > > +   /* xdp pass */
> > > > > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > > > > +   /* drop packet. the caller needs to release the page. 
> > > > > > > > > */
> > > > > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > > > > +   /* packet is consumed by xdp. the caller needs to do 
> > > > > > > > > nothing. */
> > > > > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > > > > +};
> > > > > > > >
> > > > > > > > I'd prefer this to be done on top unless it is a must. But I 
> > > > > > > > don't see
> > > > > > > > any advantage of introducing this, it's partial mapping of XDP 
> > > > > > > > action
> > > > > > > > and it needs to be extended when XDP action is extended. (And 
> > > > > > > > we've
> > > > > > > > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > > > > > >
> > > > > > > No, these are the three states of buffer after XDP processing.
> > > > > > >
> > > > > > > * PASS: goto make skb
> > > > > >
> > > > > > XDP_PASS goes for this.
> > > > > >
> > > > > > > * DROP: we should release buffer
> > > > > >
> > > > > > XDP_DROP and error conditions go with this.
> > > > > >
> > > > > > > * CUNSUMED: xdp prog used the buffer, we do nothing
> > > > > >
> > > > > > XDP_TX/XDP_REDIRECTION goes for this.
> > > > > >
> > > > > > So t virtnet_xdp_handler() just maps XDP ACTION plus the error
> > > > > > conditions to the above three states.
> > > > > >
> > > > > > We can simply map error to XDP_DROP like:
> > > > > >
> > > > > >case XDP_TX:
> > > > > >   stats->xdp_tx++;
> > > > > >xdpf = xdp_convert_buff_to_frame(xdp);
> > > > > >if (unlikely(!xdpf))
> > > > > >return XDP_DROP;
> > > > > >
> > > > > > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > > > > > the function.
> > > > >
> > > > >
> > > > > So, I guess you mean this:
> > > > >
> > > > > switch (act) {
> > > > > case XDP_PASS:
> > > > > /* handle pass */
> > > > > return skb;
> > > > >
> > > > > case XDP_TX:
> > > > > *xdp_xmit |= VIRTIO_XDP_TX;
> > > > > goto xmit;
> > > > >
> > > > > case XDP_REDIRECT:
> > > > > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > > > > goto xmit;
> > > > >
> > > > > case XDP_DROP:
> > > > > default:
> > > > > goto err_xdp;
> > > > > }
> > > > >
> > > > > I have to say there is no problem from the perspective of code 
> > > > > implementation.
> > > >
> > > > Note that this is the current logic where it is determined in
> > > > receive_small() and receive_mergeable().
> > >
> > > Yes, but the purpose of these patches is to simplify the callers.
> >
> > You mean simplify the receive_small()/mergeable()?
>
> YES.
>
>
> >
> > >
> > > >
> > > > >
> > > > > But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the
> > > > > future, then
> > > > > we must modify all the callers.
> > > >
> > > > This is fine since we only use a single type for XDP action.
> > >
> > > a single type?
> >
> > Instead of (partially) duplicating the XDP actions in the new enums.
>
>
> I think there's really a misunderstanding here. So your thought is these?
>
>VIRTNET_XDP_RES_PASS,
>VIRTNET_XDP_RES_TX_REDIRECT,
>VIRTNET_XDP_RES_DROP,

No, I meant the enum you introduced in this patch.

Re: [PATCH v6 11/11] vhost: allow userspace to create workers

2023-04-04 Thread Jason Wang
On Tue, Mar 28, 2023 at 10:17 AM Mike Christie
 wrote:
>
> For vhost-scsi with 3 vqs and a workload that tries to use those vqs
> like:
>
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=128  --numjobs=3
>
> the single vhost worker thread will become a bottleneck and we are stuck
> at around 500K IOPs no matter how many jobs, virtqueues, and CPUs are
> used.
>
> To better utilize virtqueues and available CPUs, this patch allows
> userspace to create workers and bind them to vqs. You can have N workers
> per dev and also share N workers with M vqs.
>
> With the patches and doing a worker per vq, we can scale to at least
> 16 vCPUs/vqs (that's my system limit) with the same fio command
> above with numjobs=16:
>
> fio --filename=/dev/sdb  --direct=1 --rw=randrw --bs=4k \
> --ioengine=libaio --iodepth=64  --numjobs=16
>
> which gives around 2326K IOPs.
>
> Note that for testing I dropped depth to 64 above because the vhost/virt
> layer supports only 1024 total commands per device. And the only tuning I
> did was set LIO's emulate_pr to 0 to avoid LIO's PR lock in the main IO
> path which becomes an issue at around 12 jobs/virtqueues.
>
> Signed-off-by: Mike Christie 
> ---
>  drivers/vhost/vhost.c| 177 ---
>  drivers/vhost/vhost.h|   4 +-
>  include/uapi/linux/vhost.h   |  22 
>  include/uapi/linux/vhost_types.h |  15 +++
>  4 files changed, 204 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 1fa5e9a49092..e40699e83c6d 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -271,7 +271,11 @@ EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
>
>  void vhost_dev_flush(struct vhost_dev *dev)
>  {
> -   vhost_work_flush_on(dev->worker);
> +   struct vhost_worker *worker;
> +   unsigned long i;
> +
> +   xa_for_each(>worker_xa, i, worker)
> +   vhost_work_flush_on(worker);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_flush);
>
> @@ -489,7 +493,6 @@ void vhost_dev_init(struct vhost_dev *dev,
> dev->umem = NULL;
> dev->iotlb = NULL;
> dev->mm = NULL;
> -   dev->worker = NULL;
> dev->iov_limit = iov_limit;
> dev->weight = weight;
> dev->byte_weight = byte_weight;
> @@ -499,7 +502,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> INIT_LIST_HEAD(>read_list);
> INIT_LIST_HEAD(>pending_list);
> spin_lock_init(>iotlb_lock);
> -
> +   xa_init_flags(>worker_xa, XA_FLAGS_ALLOC);
>
> for (i = 0; i < dev->nvqs; ++i) {
> vq = dev->vqs[i];
> @@ -562,32 +565,67 @@ static void vhost_detach_mm(struct vhost_dev *dev)
> dev->mm = NULL;
>  }
>
> -static void vhost_worker_free(struct vhost_dev *dev)
> +static void vhost_worker_put(struct vhost_dev *dev, struct vhost_worker 
> *worker)
>  {
> -   struct vhost_worker *worker = dev->worker;
> -
> if (!worker)
> return;
>
> -   dev->worker = NULL;
> +   if (!refcount_dec_and_test(>refcount))
> +   return;
> +
> WARN_ON(!llist_empty(>work_list));
> vhost_task_stop(worker->vtsk);
> kfree(worker);
>  }
>
> +static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
> +{
> +   if (vq->worker)

What happens to the pending work that is queued for the old worker?

> +   vhost_worker_put(vq->dev, vq->worker);
> +   vq->worker = NULL;
> +}
> +
> +static void vhost_workers_free(struct vhost_dev *dev)
> +{
> +   struct vhost_worker *worker;
> +   unsigned long i;
> +
> +   if (!dev->use_worker)
> +   return;
> +
> +   for (i = 0; i < dev->nvqs; i++)
> +   vhost_vq_detach_worker(dev->vqs[i]);
> +   /*
> +* Drop the refcount taken during allocation, and handle the default
> +* worker and the cases where userspace might have crashed or was lazy
> +* and did a VHOST_NEW_WORKER but not a VHOST_FREE_WORKER.
> +*/
> +   xa_for_each(>worker_xa, i, worker) {
> +   xa_erase(>worker_xa, worker->id);
> +   vhost_worker_put(dev, worker);
> +   }
> +   xa_destroy(>worker_xa);
> +}
> +
>  static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  {
> struct vhost_worker *worker;
> struct vhost_task *vtsk;
> char name[TASK_COMM_LEN];
> +   int ret;
> +   u32 id;
>
> worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
> if (!worker)
> return NULL;
>
> -   dev->worker = worker;
> worker->kcov_handle = kcov_common_handle();
> init_llist_head(>work_list);
> +   /*
> +* We increase the refcount for the initial creation and then
> +* later each time it's attached to a virtqueue.
> +*/
> +   refcount_set(>refcount, 1);
> snprintf(name, sizeof(name), "vhost-%d", current->pid);
>
> 
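One way to make the pending-work question above concrete: flush the old
worker before the vq drops its reference, so queued work cannot outlive the
detach. A sketch only, built on vhost_work_flush_on() from patch 04 of this
series; whether the flush belongs here is exactly what is being discussed:

static void vhost_vq_detach_worker(struct vhost_virtqueue *vq)
{
        if (!vq->worker)
                return;

        /* Drain anything already queued to the old worker (sketch). */
        vhost_work_flush_on(vq->worker);
        vhost_worker_put(vq->dev, vq->worker);
        vq->worker = NULL;
}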

Re: [PATCH resend 1/2] vdpa/snet: support getting and setting VQ state

2023-04-04 Thread Alvaro Karsz
> > So it will arrive at the PCI subsystem in program order, but the PCI
> > subsystem may reorder, right?
> 
> This is not what I read from the above doc. It said "to a particular
> device will arrive in program order".

In this case, I can just remove the wmb and that's it, right?

Thanks
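For the archive, the ordering in question looks roughly like the sketch
below; the SNET_* offsets and the function name are hypothetical:

/*
 * writel() orders prior normal memory stores before the MMIO store, and
 * posted writes to a single PCI function arrive in program order, so no
 * explicit wmb() is needed between the two MMIO writes.
 */
static void snet_kick_vq_sketch(void __iomem *bar, u16 idx, u16 state)
{
        writel(state, bar + SNET_VQ_STATE);     /* hypothetical offset */
        writel(idx, bar + SNET_VQ_KICK);        /* hypothetical offset */
}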


Re: [PATCH] virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-04 Thread Alvaro Karsz
> > + * @kick_vq_with_data: Kick the virtqueue and supply extra data
> > + * (only if VIRTIO_F_NOTIFICATION_DATA is 
> > negotiated)
> > + * @vdev: vdpa device
> > + * @data: includes vqn, next_off and next_wrap 
> > for
> > + * packed virtqueues
> 
> This needs some tweaking, VIRTIO_F_NOTIFICATION_DATA works for split
> virtqueues as well.
> 

I meant that next_wrap is for packed VQs, but I see your point, it's not clear
from the comment.
I'll fix it in v2.

Thanks
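One possible rewording for v2, sketching only the comment text rather than
the final wording:

 * @kick_vq_with_data:  Kick the virtqueue and supply extra data
 *                      (only if VIRTIO_F_NOTIFICATION_DATA is negotiated)
 *                      @vdev: vdpa device
 *                      @data: includes vqn, next_off and next_wrap;
 *                             next_wrap is only meaningful for packed
 *                             virtqueues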


Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of running XDP

2023-04-04 Thread Xuan Zhuo
On Tue, 4 Apr 2023 15:01:36 +0800, Jason Wang  wrote:
> On Tue, Apr 4, 2023 at 2:55 PM Xuan Zhuo  wrote:
> >
> > On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  wrote:
> > > On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  
> > > > wrote:
> > > > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  
> > > > > > wrote:
> > > > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > At present, we have two similar pieces of logic to perform the XDP prog.
> > > > > > > >
> > > > > > > > Therefore, this PATCH separates the code of executing XDP, 
> > > > > > > > which is
> > > > > > > > conducive to later maintenance.
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > > > +--
> > > > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > > > char padding[12];
> > > > > > > >  };
> > > > > > > >
> > > > > > > > +enum {
> > > > > > > > +   /* xdp pass */
> > > > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > > > +   /* packet is consumed by xdp. the caller needs to do 
> > > > > > > > nothing. */
> > > > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > > > +};
> > > > > > >
> > > > > > > I'd prefer this to be done on top unless it is a must. But I 
> > > > > > > don't see
> > > > > > > any advantage of introducing this; it's a partial mapping of
> > > > > > > the XDP actions and it needs to be extended when the XDP
> > > > > > > actions are extended. (And we've already had: VIRTIO_XDP_REDIR
> > > > > > > and VIRTIO_XDP_TX ...)
> > > > > >
> > > > > > No, these are the three states of the buffer after XDP processing.
> > > > > >
> > > > > > * PASS: goto make skb
> > > > >
> > > > > XDP_PASS goes for this.
> > > > >
> > > > > > * DROP: we should release buffer
> > > > >
> > > > > XDP_DROP and error conditions go with this.
> > > > >
> > > > > > * CONSUMED: xdp prog used the buffer, we do nothing
> > > > >
> > > > > XDP_TX/XDP_REDIRECT goes for this.
> > > > >
> > > > > So the virtnet_xdp_handler() just maps the XDP action plus the error
> > > > > conditions to the above three states.
> > > > >
> > > > > We can simply map error to XDP_DROP like:
> > > > >
> > > > >case XDP_TX:
> > > > >   stats->xdp_tx++;
> > > > >xdpf = xdp_convert_buff_to_frame(xdp);
> > > > >if (unlikely(!xdpf))
> > > > >return XDP_DROP;
> > > > >
> > > > > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > > > > the function.
> > > >
> > > >
> > > > So, I guess you mean this:
> > > >
> > > > switch (act) {
> > > > case XDP_PASS:
> > > > /* handle pass */
> > > > return skb;
> > > >
> > > > case XDP_TX:
> > > > *xdp_xmit |= VIRTIO_XDP_TX;
> > > > goto xmit;
> > > >
> > > > case XDP_REDIRECT:
> > > > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > > > goto xmit;
> > > >
> > > > case XDP_DROP:
> > > > default:
> > > > goto err_xdp;
> > > > }
> > > >
> > > > I have to say there is no problem from the perspective of code 
> > > > implementation.
> > >
> > > Note that this is the current logic where it is determined in
> > > receive_small() and receive_mergeable().
> >
> > Yes, but the purpose of these patches is to simplify the callers.
>
> You mean simplify the receive_small()/mergeable()?

YES.


>
> >
> > >
> > > >
> > > > But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the
> > > > future, then
> > > > we must modify all the callers.
> > >
> > > This is fine since we only use a single type for XDP action.
> >
> > a single type?
>
> Instead of (partially) duplicating the XDP actions in the new enums.


I think there's really a misunderstanding here. So your thought is these?

   VIRTNET_XDP_RES_PASS,
   VIRTNET_XDP_RES_TX_REDIRECT,
   VIRTNET_XDP_RES_DROP,



>
> >
> > >
> > > > This is the benefit of using CONSUMED.
> > >
> > > It's very hard to say, e.g. if we want to support cloning in the future.
> >
> > Cloning? You mean cloning a new buffer.
> >
> > It is true that no matter the implementation, the logic must be modified.
>
> Yes.
>
> >
> > >
> > > >
> > > > I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> > > > which makes the caller not care too much about these details.

Re: [PATCH v6 04/11] vhost: take worker or vq instead of dev for flushing

2023-04-04 Thread Jason Wang
On Tue, Mar 28, 2023 at 10:17 AM Mike Christie
 wrote:
>
> This patch has the core work flush function take a worker. When we
> support multiple workers we can then flush each worker during device
> removal, stoppage, etc.
>
> Signed-off-by: Mike Christie 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vhost/vhost.c | 24 +++-
>  1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cc2628ba9a77..6160aa1cc922 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -247,6 +247,20 @@ static void vhost_work_queue_on(struct vhost_worker 
> *worker,
> }
>  }
>
> +static void vhost_work_flush_on(struct vhost_worker *worker)
> +{
> +   struct vhost_flush_struct flush;
> +
> +   if (!worker)
> +   return;
> +
> +   init_completion(_event);
> +   vhost_work_init(, vhost_flush_work);
> +
> +   vhost_work_queue_on(worker, );
> +   wait_for_completion(_event);
> +}
> +
>  void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>  {
> vhost_work_queue_on(dev->worker, work);
> @@ -261,15 +275,7 @@ EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
>
>  void vhost_dev_flush(struct vhost_dev *dev)
>  {
> -   struct vhost_flush_struct flush;
> -
> -   if (dev->worker) {
> -   init_completion(_event);
> -   vhost_work_init(, vhost_flush_work);
> -
> -   vhost_work_queue(dev, );
> -   wait_for_completion(_event);
> -   }
> +   vhost_work_flush_on(dev->worker);
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_flush);
>
> --
> 2.25.1
>
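As a usage note, with the flush now taking a worker, a per-vq flush falls
out naturally; a sketch of a possible follow-on helper (the name
vhost_vq_flush is hypothetical, relying on the vq->worker pointer from
patch 01):

static void vhost_vq_flush(struct vhost_virtqueue *vq)
{
        /* Flush only the worker bound to this virtqueue (sketch). */
        vhost_work_flush_on(vq->worker);
}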


Re: [PATCH v6 03/11] vhost: take worker or vq instead of dev for queueing

2023-04-04 Thread Jason Wang
On Tue, Mar 28, 2023 at 10:17 AM Mike Christie
 wrote:
>
> This patch has the core work queueing function take a worker for when we
> support multiple workers. It also adds a helper that takes a vq during
> queueing so modules can control which vq/worker to queue work on.
>
> This temporarily leaves vhost_work_queue. It will be removed when the drivers
> are converted in the next patches.
>
> Signed-off-by: Mike Christie 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vhost/vhost.c | 44 +++
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 29 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 6567aed69ebb..cc2628ba9a77 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -231,6 +231,34 @@ void vhost_poll_stop(struct vhost_poll *poll)
>  }
>  EXPORT_SYMBOL_GPL(vhost_poll_stop);
>
> +static void vhost_work_queue_on(struct vhost_worker *worker,
> +   struct vhost_work *work)
> +{
> +   if (!worker)
> +   return;
> +
> +   if (!test_and_set_bit(VHOST_WORK_QUEUED, >flags)) {
> +   /* We can only add the work to the list after we're
> +* sure it was not in the list.
> +* test_and_set_bit() implies a memory barrier.
> +*/
> +   llist_add(>node, >work_list);
> +   wake_up_process(worker->vtsk->task);
> +   }
> +}
> +
> +void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> +{
> +   vhost_work_queue_on(dev->worker, work);
> +}
> +EXPORT_SYMBOL_GPL(vhost_work_queue);
> +
> +void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
> +{
> +   vhost_work_queue_on(vq->worker, work);
> +}
> +EXPORT_SYMBOL_GPL(vhost_vq_work_queue);
> +
>  void vhost_dev_flush(struct vhost_dev *dev)
>  {
> struct vhost_flush_struct flush;
> @@ -245,22 +273,6 @@ void vhost_dev_flush(struct vhost_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(vhost_dev_flush);
>
> -void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> -{
> -   if (!dev->worker)
> -   return;
> -
> -   if (!test_and_set_bit(VHOST_WORK_QUEUED, >flags)) {
> -   /* We can only add the work to the list after we're
> -* sure it was not in the list.
> -* test_and_set_bit() implies a memory barrier.
> -*/
> -   llist_add(>node, >worker->work_list);
> -   wake_up_process(dev->worker->vtsk->task);
> -   }
> -}
> -EXPORT_SYMBOL_GPL(vhost_work_queue);
> -
>  /* A lockless hint for busy polling code to exit the loop */
>  bool vhost_vq_has_work(struct vhost_virtqueue *vq)
>  {
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 0dde119fb0ee..b64ee4ef387d 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -194,6 +194,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
>   struct vhost_log *log, unsigned int *log_num);
>  void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>
> +void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work 
> *work);
>  bool vhost_vq_has_work(struct vhost_virtqueue *vq);
>  bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
>  int vhost_vq_init_access(struct vhost_virtqueue *);
> --
> 2.25.1
>
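To illustrate what the new helper enables, a driver can now direct work at
the worker bound to a specific vq. A sketch with made-up driver types;
my_cmd and my_complete_fn are illustrative, not from the series:

struct my_cmd {
        struct vhost_virtqueue *vq;
        struct vhost_work comp_work;
};

static void my_complete_fn(struct vhost_work *work);

static void my_queue_completion(struct my_cmd *cmd)
{
        /* Runs my_complete_fn on the worker servicing cmd->vq (sketch). */
        vhost_work_init(&cmd->comp_work, my_complete_fn);
        vhost_vq_work_queue(cmd->vq, &cmd->comp_work);
}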


Re: [PATCH v6 02/11] vhost, vhost-net: add helper to check if vq has work

2023-04-04 Thread Jason Wang
On Tue, Mar 28, 2023 at 10:17 AM Mike Christie
 wrote:
>
> In the next patches each vq might have a different worker, so one could
> have work while others do not. For net, we only want to check specific vqs,
> so this adds a helper to check if a vq has work pending and converts
> vhost-net to use it.
>
> Signed-off-by: Mike Christie 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vhost/net.c   | 2 +-
>  drivers/vhost/vhost.c | 6 +++---
>  drivers/vhost/vhost.h | 2 +-
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 07181cd8d52e..8ed63651b9eb 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -546,7 +546,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
> endtime = busy_clock() + busyloop_timeout;
>
> while (vhost_can_busy_poll(endtime)) {
> -   if (vhost_has_work(>dev)) {
> +   if (vhost_vq_has_work(vq)) {
> *busyloop_intr = true;
> break;
> }
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e041e116afee..6567aed69ebb 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -262,11 +262,11 @@ void vhost_work_queue(struct vhost_dev *dev, struct 
> vhost_work *work)
>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>
>  /* A lockless hint for busy polling code to exit the loop */
> -bool vhost_has_work(struct vhost_dev *dev)
> +bool vhost_vq_has_work(struct vhost_virtqueue *vq)
>  {
> -   return dev->worker && !llist_empty(>worker->work_list);
> +   return vq->worker && !llist_empty(>worker->work_list);
>  }
> -EXPORT_SYMBOL_GPL(vhost_has_work);
> +EXPORT_SYMBOL_GPL(vhost_vq_has_work);
>
>  void vhost_poll_queue(struct vhost_poll *poll)
>  {
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index e72b665ba3a5..0dde119fb0ee 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -45,7 +45,6 @@ struct vhost_poll {
>
>  void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
>  void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
> -bool vhost_has_work(struct vhost_dev *dev);
>
>  void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
>  __poll_t mask, struct vhost_dev *dev);
> @@ -195,6 +194,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
>   struct vhost_log *log, unsigned int *log_num);
>  void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
>
> +bool vhost_vq_has_work(struct vhost_virtqueue *vq);
>  bool vhost_vq_is_setup(struct vhost_virtqueue *vq);
>  int vhost_vq_init_access(struct vhost_virtqueue *);
>  int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
> --
> 2.25.1
>


Re: [PATCH v6 01/11] vhost: add vhost_worker pointer to vhost_virtqueue

2023-04-04 Thread Jason Wang
On Tue, Mar 28, 2023 at 10:17 AM Mike Christie
 wrote:
>
> This patchset allows userspace to map vqs to different workers. This
> patch adds a worker pointer to the vq so we can store that info.
>
> Signed-off-by: Mike Christie 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vhost/vhost.c | 24 +---
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 14 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 4368ee9b999c..e041e116afee 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -486,6 +486,7 @@ void vhost_dev_init(struct vhost_dev *dev,
> vq->log = NULL;
> vq->indirect = NULL;
> vq->heads = NULL;
> +   vq->worker = NULL;
> vq->dev = dev;
> mutex_init(>mutex);
> vhost_vq_reset(dev, vq);
> @@ -554,16 +555,15 @@ static void vhost_worker_free(struct vhost_dev *dev)
> kfree(worker);
>  }
>
> -static int vhost_worker_create(struct vhost_dev *dev)
> +static struct vhost_worker *vhost_worker_create(struct vhost_dev *dev)
>  {
> struct vhost_worker *worker;
> struct vhost_task *vtsk;
> char name[TASK_COMM_LEN];
> -   int ret;
>
> worker = kzalloc(sizeof(*worker), GFP_KERNEL_ACCOUNT);
> if (!worker)
> -   return -ENOMEM;
> +   return NULL;
>
> dev->worker = worker;
> worker->kcov_handle = kcov_common_handle();
> @@ -571,25 +571,24 @@ static int vhost_worker_create(struct vhost_dev *dev)
> snprintf(name, sizeof(name), "vhost-%d", current->pid);
>
> vtsk = vhost_task_create(vhost_worker, worker, name);
> -   if (!vtsk) {
> -   ret = -ENOMEM;
> +   if (!vtsk)
> goto free_worker;
> -   }
>
> worker->vtsk = vtsk;
> vhost_task_start(vtsk);
> -   return 0;
> +   return worker;
>
>  free_worker:
> kfree(worker);
> dev->worker = NULL;
> -   return ret;
> +   return NULL;
>  }
>
>  /* Caller should have device mutex */
>  long vhost_dev_set_owner(struct vhost_dev *dev)
>  {
> -   int err;
> +   struct vhost_worker *worker;
> +   int err, i;
>
> /* Is there an owner already? */
> if (vhost_dev_has_owner(dev)) {
> @@ -600,9 +599,12 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
> vhost_attach_mm(dev);
>
> if (dev->use_worker) {
> -   err = vhost_worker_create(dev);
> -   if (err)
> +   worker = vhost_worker_create(dev);
> +   if (!worker)
> goto err_worker;
> +
> +   for (i = 0; i < dev->nvqs; i++)
> +   dev->vqs[i]->worker = worker;
> }
>
> err = vhost_dev_alloc_iovecs(dev);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 0308638cdeee..e72b665ba3a5 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -74,6 +74,7 @@ struct vhost_vring_call {
>  /* The virtqueue structure describes a queue attached to a device. */
>  struct vhost_virtqueue {
> struct vhost_dev *dev;
> +   struct vhost_worker *worker;
>
> /* The actual ring of buffers. */
> struct mutex mutex;
> --
> 2.25.1
>
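For context, the userspace flow the full series aims at looks roughly like
this; the ioctl names and struct layout are assumptions based on the uapi
additions in patch 11, not a merged API:

static int bind_new_worker(int vhost_fd, unsigned int vq_index)
{
        struct vhost_worker_state w = { 0 };
        struct vhost_vring_worker vw = { 0 };

        /* Create a worker; the kernel fills in w.worker_id (assumed). */
        if (ioctl(vhost_fd, VHOST_NEW_WORKER, &w))
                return -1;

        vw.index = vq_index;
        vw.worker_id = w.worker_id;
        return ioctl(vhost_fd, VHOST_ATTACH_VRING_WORKER, &vw);
}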


Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of running XDP

2023-04-04 Thread Jason Wang
On Tue, Apr 4, 2023 at 2:55 PM Xuan Zhuo  wrote:
>
> On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  wrote:
> > On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  wrote:
> > >
> > > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> > > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  
> > > > > wrote:
> > > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > > >  wrote:
> > > > > > >
> > > > > > > At present, we have two similar pieces of logic to perform the XDP prog.
> > > > > > >
> > > > > > > Therefore, this PATCH separates the code of executing XDP, which 
> > > > > > > is
> > > > > > > conducive to later maintenance.
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > > +--
> > > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > > char padding[12];
> > > > > > >  };
> > > > > > >
> > > > > > > +enum {
> > > > > > > +   /* xdp pass */
> > > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > > +   /* packet is consumed by xdp. the caller needs to do 
> > > > > > > nothing. */
> > > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > > +};
> > > > > >
> > > > > > I'd prefer this to be done on top unless it is a must. But I don't 
> > > > > > see
> > > > > > any advantage of introducing this; it's a partial mapping of the
> > > > > > XDP actions and it needs to be extended when the XDP actions are
> > > > > > extended. (And we've already had: VIRTIO_XDP_REDIR and
> > > > > > VIRTIO_XDP_TX ...)
> > > > >
> > > > > No, these are the three states of the buffer after XDP processing.
> > > > >
> > > > > * PASS: goto make skb
> > > >
> > > > XDP_PASS goes for this.
> > > >
> > > > > * DROP: we should release buffer
> > > >
> > > > XDP_DROP and error conditions go with this.
> > > >
> > > > > * CONSUMED: xdp prog used the buffer, we do nothing
> > > >
> > > > XDP_TX/XDP_REDIRECT goes for this.
> > > >
> > > > So the virtnet_xdp_handler() just maps the XDP action plus the error
> > > > conditions to the above three states.
> > > >
> > > > We can simply map error to XDP_DROP like:
> > > >
> > > >case XDP_TX:
> > > >   stats->xdp_tx++;
> > > >xdpf = xdp_convert_buff_to_frame(xdp);
> > > >if (unlikely(!xdpf))
> > > >return XDP_DROP;
> > > >
> > > > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > > > the function.
> > >
> > >
> > > So, I guess you mean this:
> > >
> > > switch (act) {
> > > case XDP_PASS:
> > > /* handle pass */
> > > return skb;
> > >
> > > case XDP_TX:
> > > *xdp_xmit |= VIRTIO_XDP_TX;
> > > goto xmit;
> > >
> > > case XDP_REDIRECT:
> > > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > > goto xmit;
> > >
> > > case XDP_DROP:
> > > default:
> > > goto err_xdp;
> > > }
> > >
> > > I have to say there is no problem from the perspective of code 
> > > implementation.
> >
> > Note that this is the current logic where it is determined in
> > receive_small() and receive_mergeable().
>
> Yes, but the purpose of these patches is to simplify the callers.

You mean simplify the receive_small()/mergeable()?

>
> >
> > >
> > > But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the
> > > future, then
> > > we must modify all the callers.
> >
> > This is fine since we only use a single type for XDP action.
>
> a single type?

Instead of (partially) duplicating the XDP actions in the new enums.

>
> >
> > > This is the benefit of using CONSUMED.
> >
> > It's very hard to say, e.g. if we want to support cloning in the future.
>
> Cloning? You mean cloning a new buffer.
>
> It is true that no matter the implementation, the logic must be modified.

Yes.

>
> >
> > >
> > > I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> > > which makes the caller not care too much about these details.
> >
> > This part I don't understand: having xdp_xmit means the caller needs to
> > know whether it was xmited or redirected. The point of the enum is to
> > hide the XDP actions, but that conflicts with what xdp_xmit wants to
> > expose: (part of) the XDP actions.
>
> I mean, no matter what virtnet_xdp_handler() returns (an XDP action or one
> I defined), I want to hide the modification of xdp_xmit inside
> virtnet_xdp_handler().
>
> Even if virtnet_xdp_handler() returns XDP_TX, we can still complete the
> modification of xdp_xmit within virtnet_xdp_handler().

Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of running XDP

2023-04-04 Thread Xuan Zhuo
On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  wrote:
> On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  wrote:
> >
> > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > At present, we have two similar pieces of logic to perform the XDP prog.
> > > > > >
> > > > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > > > conducive to later maintenance.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > +--
> > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > char padding[12];
> > > > > >  };
> > > > > >
> > > > > > +enum {
> > > > > > +   /* xdp pass */
> > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > +   /* packet is consumed by xdp. the caller needs to do 
> > > > > > nothing. */
> > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > +};
> > > > >
> > > > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > > > any advantage of introducing this; it's a partial mapping of the
> > > > > XDP actions and it needs to be extended when the XDP actions are
> > > > > extended. (And we've already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > > >
> > > > No, these are the three states of the buffer after XDP processing.
> > > >
> > > > * PASS: goto make skb
> > >
> > > XDP_PASS goes for this.
> > >
> > > > * DROP: we should release buffer
> > >
> > > XDP_DROP and error conditions go with this.
> > >
> > > > * CONSUMED: xdp prog used the buffer, we do nothing
> > >
> > > XDP_TX/XDP_REDIRECT goes for this.
> > >
> > > So the virtnet_xdp_handler() just maps the XDP action plus the error
> > > conditions to the above three states.
> > >
> > > We can simply map error to XDP_DROP like:
> > >
> > >case XDP_TX:
> > >   stats->xdp_tx++;
> > >xdpf = xdp_convert_buff_to_frame(xdp);
> > >if (unlikely(!xdpf))
> > >return XDP_DROP;
> > >
> > > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > > the function.
> >
> >
> > So, I guess you mean this:
> >
> > switch (act) {
> > case XDP_PASS:
> > /* handle pass */
> > return skb;
> >
> > case XDP_TX:
> > *xdp_xmit |= VIRTIO_XDP_TX;
> > goto xmit;
> >
> > case XDP_REDIRECT:
> > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > goto xmit;
> >
> > case XDP_DROP:
> > default:
> > goto err_xdp;
> > }
> >
> > I have to say there is no problem from the perspective of code 
> > implementation.
>
> Note that this is the current logic where it is determined in
> receive_small() and receive_mergeable().

Yes, but the purpose of these patches is to simplify the callers.

>
> >
> > But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the future,
> > then
> > we must modify all the callers.
>
> This is fine since we only use a single type for XDP action.

a single type?

>
> > This is the benefit of using CONSUMED.
>
> It's very hard to say, e.g. if we want to support cloning in the future.

Cloning? You mean cloning a new buffer.

It is true that no matter the implementation, the logic must be modified.

>
> >
> > I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> > which makes the caller not care too much about these details.
>
> This part I don't understand: having xdp_xmit means the caller needs to
> know whether it was xmited or redirected. The point of the enum is to
> hide the XDP actions, but that conflicts with what xdp_xmit wants to
> expose: (part of) the XDP actions.

I mean, no matter what virtnet_xdp_handler() returns (an XDP action or one I
defined), I want to hide the modification of xdp_xmit inside virtnet_xdp_handler().

Even if virtnet_xdp_handler() returns XDP_TX, we can still complete the
modification of xdp_xmit within virtnet_xdp_handler().


>
> > If you take into
> > account the problem of increasing the number of parameters, I advise to put 
> > it
> > in rq.
>
> I don't have a strong opinion on introducing the enum,

OK, I will drop these new enums.

> what I want to say
> is, use a separate patch to do that.

Does this part refer to putting xdp_xmit in rq?

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
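To make the trade-off in this thread concrete, the caller side under the
three-state enum would look roughly like this sketch; the labels follow the
existing receive_small() structure and are assumptions, not code from the
patch:

        switch (virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats)) {
        case VIRTNET_XDP_RES_PASS:
                break;          /* fall through and build an skb */

        case VIRTNET_XDP_RES_DROP:
                goto err_xdp;   /* caller releases the page */

        case VIRTNET_XDP_RES_CONSUMED:
                goto xdp_xmit;  /* handler already updated *xdp_xmit */
        }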

Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of running XDP

2023-04-04 Thread Jason Wang
On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  wrote:
>
> On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  wrote:
> > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > At present, we have two similar pieces of logic to perform the XDP prog.
> > > > >
> > > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > > conducive to later maintenance.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 142 
> > > > > +--
> > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > char padding[12];
> > > > >  };
> > > > >
> > > > > +enum {
> > > > > +   /* xdp pass */
> > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > +   /* packet is consumed by xdp. the caller needs to do nothing. 
> > > > > */
> > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > +};
> > > >
> > > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > > any advantage of introducing this; it's a partial mapping of the
> > > > XDP actions and it needs to be extended when the XDP actions are
> > > > extended. (And we've already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > >
> > > No, these are the three states of the buffer after XDP processing.
> > >
> > > * PASS: goto make skb
> >
> > XDP_PASS goes for this.
> >
> > > * DROP: we should release buffer
> >
> > XDP_DROP and error conditions go with this.
> >
> > > * CONSUMED: xdp prog used the buffer, we do nothing
> >
> > XDP_TX/XDP_REDIRECT goes for this.
> >
> > So the virtnet_xdp_handler() just maps the XDP action plus the error
> > conditions to the above three states.
> >
> > We can simply map error to XDP_DROP like:
> >
> >case XDP_TX:
> >   stats->xdp_tx++;
> >xdpf = xdp_convert_buff_to_frame(xdp);
> >if (unlikely(!xdpf))
> >return XDP_DROP;
> >
> > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > the function.
>
>
> So, I guess you mean this:
>
> switch (act) {
> case XDP_PASS:
> /* handle pass */
> return skb;
>
> case XDP_TX:
> *xdp_xmit |= VIRTIO_XDP_TX;
> goto xmit;
>
> case XDP_REDIRECT:
> *xdp_xmit |= VIRTIO_XDP_REDIR;
> goto xmit;
>
> case XDP_DROP:
> default:
> goto err_xdp;
> }
>
> I have to say there is no problem from the perspective of code implementation.

Note that this is the current logic where it is determined in
receive_small() and receive_mergeable().

>
> But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the future,
> then
> we must modify all the callers.

This is fine since we only use a single type for XDP action.

> This is the benefit of using CONSUMED.

It's very hard to say, e.g. if we want to support cloning in the future.

>
> I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> which makes the caller not care too much about these details.

This part I don't understand: having xdp_xmit means the caller needs to
know whether it was xmited or redirected. The point of the enum is to
hide the XDP actions, but that conflicts with what xdp_xmit wants to
expose: (part of) the XDP actions.

> If you take into
> account the problem of increasing the number of parameters, I advise to put it
> in rq.

I don't have a strong opinion on introducing the enum, what I want to say
is, use a separate patch to do that.

Thanks

>
> Thanks.
>
>
>
> >
> > >
> > > The latter two are not particularly related to the XDP actions. And they
> > > do not need to be extended when the XDP actions are extended. At least I
> > > have not thought of this situation.
> >
> > What are the advantages of such an indirection compared to using the XDP
> > actions directly?
> >
> > Thanks
> >
> > >
> > >
> > > >
> > > > > +
> > > > >  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void 
> > > > > *buf);
> > > > >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void 
> > > > > *buf);
> > > > >
> > > > > @@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device 
> > > > > *dev,
> > > > > return ret;
> > > > >  }
> > > > >
> > > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct 
> > > > > xdp_buff *xdp,
> > > > > +  struct net_device *dev,
> > > > > +  unsigned int *xdp_xmit,
> > > > > +  struct virtnet_rq_stats *stats)

Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to separate the logic of running XDP

2023-04-04 Thread Xuan Zhuo
On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  wrote:
> >
> > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  wrote:
> > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > At present, we have two similar pieces of logic to perform the XDP prog.
> > > >
> > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > conducive to later maintenance.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio_net.c | 142 +--
> > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > char padding[12];
> > > >  };
> > > >
> > > > +enum {
> > > > +   /* xdp pass */
> > > > +   VIRTNET_XDP_RES_PASS,
> > > > +   /* drop packet. the caller needs to release the page. */
> > > > +   VIRTNET_XDP_RES_DROP,
> > > > +   /* packet is consumed by xdp. the caller needs to do nothing. */
> > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > +};
> > >
> > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > any advantage of introducing this; it's a partial mapping of the
> > > XDP actions and it needs to be extended when the XDP actions are
> > > extended. (And we've already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> >
> > No, these are the three states of the buffer after XDP processing.
> >
> > * PASS: goto make skb
>
> XDP_PASS goes for this.
>
> > * DROP: we should release buffer
>
> XDP_DROP and error conditions go with this.
>
> > * CONSUMED: xdp prog used the buffer, we do nothing
>
> XDP_TX/XDP_REDIRECT goes for this.
>
> So the virtnet_xdp_handler() just maps the XDP action plus the error
> conditions to the above three states.
>
> We can simply map error to XDP_DROP like:
>
>case XDP_TX:
>   stats->xdp_tx++;
>xdpf = xdp_convert_buff_to_frame(xdp);
>if (unlikely(!xdpf))
>return XDP_DROP;
>
> A good side effect is to avoid the xdp_xmit pointer to be passed to
> the function.


So, I guess you mean this:

switch (act) {
case XDP_PASS:
/* handle pass */
return skb;

case XDP_TX:
*xdp_xmit |= VIRTIO_XDP_TX;
goto xmit;

case XDP_REDIRECT:
*xdp_xmit |= VIRTIO_XDP_REDIR;
goto xmit;

case XDP_DROP:
default:
goto err_xdp;
}

I have to say there is no problem from the perspective of code implementation.

But if a new ACTION like XDP_TX or XDP_REDIRECT is added in the future, then
we must modify all the callers. This is the benefit of using CONSUMED.

I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
which makes the caller not care too much about these details. If you take into
account the problem of increasing the number of parameters, I advise putting it
in rq.

Thanks.



>
> >
> > The latter two are not particularly related to the XDP actions. And they
> > do not need to be extended when the XDP actions are extended. At least I
> > have not thought of this situation.
>
> What are the advantages of such an indirection compared to using the XDP
> actions directly?
>
> Thanks
>
> >
> >
> > >
> > > > +
> > > >  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void 
> > > > *buf);
> > > >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void 
> > > > *buf);
> > > >
> > > > @@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > return ret;
> > > >  }
> > > >
> > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct 
> > > > xdp_buff *xdp,
> > > > +  struct net_device *dev,
> > > > +  unsigned int *xdp_xmit,
> > > > +  struct virtnet_rq_stats *stats)
> > > > +{
> > > > +   struct xdp_frame *xdpf;
> > > > +   int err;
> > > > +   u32 act;
> > > > +
> > > > +   act = bpf_prog_run_xdp(xdp_prog, xdp);
> > > > +   stats->xdp_packets++;
> > > > +
> > > > +   switch (act) {
> > > > +   case XDP_PASS:
> > > > +   return VIRTNET_XDP_RES_PASS;
> > > > +
> > > > +   case XDP_TX:
> > > > +   stats->xdp_tx++;
> > > > +   xdpf = xdp_convert_buff_to_frame(xdp);
> > > > +   if (unlikely(!xdpf))
> > > > +   return VIRTNET_XDP_RES_DROP;
> > > > +
> > > > +   err = virtnet_xdp_xmit(dev, 1, , 0);
> > > > +   if (unlikely(!err)) {
> > > > +   xdp_return_frame_rx_napi(xdpf);
> > > > +   } else if (unlikely(err < 0)) {
> > > > +