Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-03 Thread Xuan Zhuo
On Tue, 4 Apr 2023 14:35:05 +0800, Jason Wang  wrote:
> On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  wrote:
> >
> > On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> > > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  
> > > > wrote:
> > > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo 
> > > > >  wrote:
> > > > > >
> > > > > > At present, we have two similar logic to perform the XDP prog.
> > > > > >
> > > > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > > > conducive to later maintenance.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 142 
> > > > > > +--
> > > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > > char padding[12];
> > > > > >  };
> > > > > >
> > > > > > +enum {
> > > > > > +   /* xdp pass */
> > > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > > +   /* packet is consumed by xdp. the caller needs to do 
> > > > > > nothing. */
> > > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > > +};
> > > > >
> > > > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > > > any advantage of introducing this, it's partial mapping of XDP action
> > > > > and it needs to be extended when XDP action is extended. (And we've
> > > > > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > > >
> > > > No, these are the three states of buffer after XDP processing.
> > > >
> > > > * PASS: goto make skb
> > >
> > > XDP_PASS goes for this.
> > >
> > > > * DROP: we should release buffer
> > >
> > > XDP_DROP and error conditions go with this.
> > >
> > > > * CUNSUMED: xdp prog used the buffer, we do nothing
> > >
> > > XDP_TX/XDP_REDIRECTION goes for this.
> > >
> > > So t virtnet_xdp_handler() just maps XDP ACTION plus the error
> > > conditions to the above three states.
> > >
> > > We can simply map error to XDP_DROP like:
> > >
> > >case XDP_TX:
> > >   stats->xdp_tx++;
> > >xdpf = xdp_convert_buff_to_frame(xdp);
> > >if (unlikely(!xdpf))
> > >return XDP_DROP;
> > >
> > > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > > the function.
> >
> >
> > So, I guess you mean this:
> >
> > switch (act) {
> > case XDP_PASS:
> > /* handle pass */
> > return skb;
> >
> > case XDP_TX:
> > *xdp_xmit |= VIRTIO_XDP_TX;
> > goto xmit;
> >
> > case XDP_REDIRECT:
> > *xdp_xmit |= VIRTIO_XDP_REDIR;
> > goto xmit;
> >
> > case XDP_DROP:
> > default:
> > goto err_xdp;
> > }
> >
> > I have to say there is no problem from the perspective of code 
> > implementation.
>
> Note that this is the current logic where it is determined in
> receive_small() and receive_mergeable().

Yes, but the purpose of this patch set is to simplify the callers.

>
> >
> > But if the a new ACTION liking XDP_TX,XDP_REDIRECT is added in the future, 
> > then
> > we must modify all the callers.
>
> This is fine since we only use a single type for XDP action.

a single type?

>
> > This is the benefit of using CUNSUMED.
>
> It's very hard to say, e.g if we want to support cloning in the future.

Cloning? You mean cloning a new buffer?

It is true that no matter what the implementation is, the logic must be modified.

>
> >
> > I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> > which makes the caller not care too much about these details.
>
> This part I don't understand, having xdp_xmit means the caller need to
> know whether it is xmited or redirected. The point of the enum is to
> hide the XDP actions, but it's conflict with what xdp_xmit who want to
> expose (part of) the XDP actions.

I mean, no matter what virtnet_xdp_handler() returns, an XDP action or one of the values I
defined, I want to hide the modification of xdp_xmit inside virtnet_xdp_handler().

Even if virtnet_xdp_handler() returns XDP_TX, we can still complete the
modification of xdp_xmit within virtnet_xdp_handler().
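
For illustration, a rough sketch of the caller shape this would give (not code from
the patch; the rq parameter, the rq->xdp_xmit field and the labels are assumptions,
following the idea of moving xdp_xmit into rq mentioned below):

	switch (virtnet_xdp_handler(xdp_prog, &xdp, dev, rq, stats)) {
	case VIRTNET_XDP_RES_PASS:
		break;			/* continue and build the skb */
	case VIRTNET_XDP_RES_DROP:
		goto err_xdp;		/* caller still owns and frees the page */
	case VIRTNET_XDP_RES_CONSUMED:
		goto xdp_xmit;		/* handler already set the bits in rq->xdp_xmit */
	}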


>
> > If you take into
> > account the problem of increasing the number of parameters, I advise to put 
> > it
> > in rq.
>
> I don't have strong opinion to introduce the enum,

OK, I will drop these new enums.

> what I want to say
> is, use a separated patch to do that.

Does this part refer to putting xdp_xmit in rq?

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >

Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-03 Thread Jason Wang
On Tue, Apr 4, 2023 at 2:22 PM Xuan Zhuo  wrote:
>
> On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> > On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  wrote:
> > > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > At present, we have two similar logic to perform the XDP prog.
> > > > >
> > > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > > conducive to later maintenance.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 142 
> > > > > +--
> > > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > > char padding[12];
> > > > >  };
> > > > >
> > > > > +enum {
> > > > > +   /* xdp pass */
> > > > > +   VIRTNET_XDP_RES_PASS,
> > > > > +   /* drop packet. the caller needs to release the page. */
> > > > > +   VIRTNET_XDP_RES_DROP,
> > > > > +   /* packet is consumed by xdp. the caller needs to do nothing. 
> > > > > */
> > > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > > +};
> > > >
> > > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > > any advantage of introducing this, it's partial mapping of XDP action
> > > > and it needs to be extended when XDP action is extended. (And we've
> > > > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> > >
> > > No, these are the three states of buffer after XDP processing.
> > >
> > > * PASS: goto make skb
> >
> > XDP_PASS goes for this.
> >
> > > * DROP: we should release buffer
> >
> > XDP_DROP and error conditions go with this.
> >
> > > * CUNSUMED: xdp prog used the buffer, we do nothing
> >
> > XDP_TX/XDP_REDIRECTION goes for this.
> >
> > So t virtnet_xdp_handler() just maps XDP ACTION plus the error
> > conditions to the above three states.
> >
> > We can simply map error to XDP_DROP like:
> >
> >case XDP_TX:
> >   stats->xdp_tx++;
> >xdpf = xdp_convert_buff_to_frame(xdp);
> >if (unlikely(!xdpf))
> >return XDP_DROP;
> >
> > A good side effect is to avoid the xdp_xmit pointer to be passed to
> > the function.
>
>
> So, I guess you mean this:
>
> switch (act) {
> case XDP_PASS:
> /* handle pass */
> return skb;
>
> case XDP_TX:
> *xdp_xmit |= VIRTIO_XDP_TX;
> goto xmit;
>
> case XDP_REDIRECT:
> *xdp_xmit |= VIRTIO_XDP_REDIR;
> goto xmit;
>
> case XDP_DROP:
> default:
> goto err_xdp;
> }
>
> I have to say there is no problem from the perspective of code implementation.

Note that this is the current logic where it is determined in
receive_small() and receive_mergeable().

>
> But if the a new ACTION liking XDP_TX,XDP_REDIRECT is added in the future, 
> then
> we must modify all the callers.

This is fine since we only use a single type for XDP action.

> This is the benefit of using CUNSUMED.

It's very hard to say, e.g. if we want to support cloning in the future.

>
> I think it is a good advantage to put xdp_xmit in virtnet_xdp_handler(),
> which makes the caller not care too much about these details.

This part I don't understand: having xdp_xmit means the caller needs to
know whether the packet was transmitted or redirected. The point of the enum is to
hide the XDP actions, but that conflicts with xdp_xmit, which wants to
expose (part of) the XDP actions.

> If you take into
> account the problem of increasing the number of parameters, I advise to put it
> in rq.

I don't have a strong opinion on introducing the enum; what I want to say
is, use a separate patch to do that.

Thanks

>
> Thanks.
>
>
>
> >
> > >
> > > The latter two are not particularly related to XDP ACTION. And it does 
> > > not need
> > > to extend when XDP action is extended. At least I have not thought of this
> > > situation.
> >
> > What's the advantages of such indirection compared to using XDP action 
> > directly?
> >
> > Thanks
> >
> > >
> > >
> > > >
> > > > > +
> > > > >  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void 
> > > > > *buf);
> > > > >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void 
> > > > > *buf);
> > > > >
> > > > > @@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device 
> > > > > *dev,
> > > > > return ret;
> > > > >  }
> > > > >
> > > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct 
> > > > > xdp_buff *xdp,
> > > > > +  struct net_device *dev,
> > > > > +  uns

Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-03 Thread Xuan Zhuo
On Tue, 4 Apr 2023 13:04:02 +0800, Jason Wang  wrote:
> On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  wrote:
> >
> > On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  wrote:
> > > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > At present, we have two similar logic to perform the XDP prog.
> > > >
> > > > Therefore, this PATCH separates the code of executing XDP, which is
> > > > conducive to later maintenance.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio_net.c | 142 +--
> > > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index bb426958cdd4..72b9d6ee4024 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > > char padding[12];
> > > >  };
> > > >
> > > > +enum {
> > > > +   /* xdp pass */
> > > > +   VIRTNET_XDP_RES_PASS,
> > > > +   /* drop packet. the caller needs to release the page. */
> > > > +   VIRTNET_XDP_RES_DROP,
> > > > +   /* packet is consumed by xdp. the caller needs to do nothing. */
> > > > +   VIRTNET_XDP_RES_CONSUMED,
> > > > +};
> > >
> > > I'd prefer this to be done on top unless it is a must. But I don't see
> > > any advantage of introducing this, it's partial mapping of XDP action
> > > and it needs to be extended when XDP action is extended. (And we've
> > > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
> >
> > No, these are the three states of buffer after XDP processing.
> >
> > * PASS: goto make skb
>
> XDP_PASS goes for this.
>
> > * DROP: we should release buffer
>
> XDP_DROP and error conditions go with this.
>
> > * CUNSUMED: xdp prog used the buffer, we do nothing
>
> XDP_TX/XDP_REDIRECTION goes for this.
>
> So t virtnet_xdp_handler() just maps XDP ACTION plus the error
> conditions to the above three states.
>
> We can simply map error to XDP_DROP like:
>
>case XDP_TX:
>   stats->xdp_tx++;
>xdpf = xdp_convert_buff_to_frame(xdp);
>if (unlikely(!xdpf))
>return XDP_DROP;
>
> A good side effect is to avoid the xdp_xmit pointer to be passed to
> the function.


So, I guess you mean this:

switch (act) {
case XDP_PASS:
/* handle pass */
return skb;

case XDP_TX:
*xdp_xmit |= VIRTIO_XDP_TX;
goto xmit;

case XDP_REDIRECT:
*xdp_xmit |= VIRTIO_XDP_REDIR;
goto xmit;

case XDP_DROP:
default:
goto err_xdp;
}

I have to say there is no problem from the perspective of code implementation.

But if a new action like XDP_TX or XDP_REDIRECT is added in the future, then
we must modify all the callers. This is the benefit of using CONSUMED.

I think it is a real advantage to handle xdp_xmit inside virtnet_xdp_handler(),
which means the caller does not have to care about these details. If you are
concerned about the growing number of parameters, I suggest putting it in rq.

Thanks.



>
> >
> > The latter two are not particularly related to XDP ACTION. And it does not 
> > need
> > to extend when XDP action is extended. At least I have not thought of this
> > situation.
>
> What's the advantages of such indirection compared to using XDP action 
> directly?
>
> Thanks
>
> >
> >
> > >
> > > > +
> > > >  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void 
> > > > *buf);
> > > >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void 
> > > > *buf);
> > > >
> > > > @@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > return ret;
> > > >  }
> > > >
> > > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct 
> > > > xdp_buff *xdp,
> > > > +  struct net_device *dev,
> > > > +  unsigned int *xdp_xmit,
> > > > +  struct virtnet_rq_stats *stats)
> > > > +{
> > > > +   struct xdp_frame *xdpf;
> > > > +   int err;
> > > > +   u32 act;
> > > > +
> > > > +   act = bpf_prog_run_xdp(xdp_prog, xdp);
> > > > +   stats->xdp_packets++;
> > > > +
> > > > +   switch (act) {
> > > > +   case XDP_PASS:
> > > > +   return VIRTNET_XDP_RES_PASS;
> > > > +
> > > > +   case XDP_TX:
> > > > +   stats->xdp_tx++;
> > > > +   xdpf = xdp_convert_buff_to_frame(xdp);
> > > > +   if (unlikely(!xdpf))
> > > > +   return VIRTNET_XDP_RES_DROP;
> > > > +
> > > > +   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> > > > +   if (unlikely(!err)) {
> > > > +   xdp_return_frame_rx_napi(xdpf);
> > > > +   } else if (unlikely(err < 0)) {
> > > > +   trace_xd

Re: [PATCH] virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support

2023-04-03 Thread Jason Wang
On Sun, Apr 2, 2023 at 4:10 PM Alvaro Karsz  wrote:
>
> Add VIRTIO_F_NOTIFICATION_DATA support for vDPA transport.
> If this feature is negotiated, the driver passes extra data when kicking
> a virtqueue.
>
> A device that offers this feature needs to implement the
> kick_vq_with_data callback.
>
> kick_vq_with_data receives the vDPA device and data.
> data includes the vqn, next_off and next_wrap for packed virtqueues.
>
> This patch follows a patch [1] by Viktor Prutyanov which adds support
> for the MMIO, channel I/O and modern PCI transports.
>
> This patch needs to be applied on top of Viktor's patch.
>
> [1] https://lore.kernel.org/lkml/20230324195029.2410503-1-vik...@daynix.com/
>
> Signed-off-by: Alvaro Karsz 
> ---
>  drivers/virtio/virtio_vdpa.c | 20 ++--
>  include/linux/vdpa.h |  6 ++
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index d7f5af62dda..bdaf30f7fbf 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -112,6 +112,17 @@ static bool virtio_vdpa_notify(struct virtqueue *vq)
> return true;
>  }
>
> +static bool virtio_vdpa_notify_with_data(struct virtqueue *vq)
> +{
> +   struct vdpa_device *vdpa = vd_get_vdpa(vq->vdev);
> +   const struct vdpa_config_ops *ops = vdpa->config;
> +   u32 data = vring_notification_data(vq);
> +
> +   ops->kick_vq_with_data(vdpa, data);
> +
> +   return true;
> +}
> +
>  static irqreturn_t virtio_vdpa_config_cb(void *private)
>  {
> struct virtio_vdpa_device *vd_dev = private;
> @@ -138,6 +149,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
> int index,
> struct device *dma_dev;
> const struct vdpa_config_ops *ops = vdpa->config;
> struct virtio_vdpa_vq_info *info;
> +   bool (*notify)(struct virtqueue *vq);
> struct vdpa_callback cb;
> struct virtqueue *vq;
> u64 desc_addr, driver_addr, device_addr;
> @@ -154,6 +166,11 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, 
> unsigned int index,
> if (index >= vdpa->nvqs)
> return ERR_PTR(-ENOENT);
>
> +   if (__virtio_test_bit(vdev, VIRTIO_F_NOTIFICATION_DATA))
> +   notify = virtio_vdpa_notify_with_data;
> +   else
> +   notify = virtio_vdpa_notify;
> +
> /* Queue shouldn't already be set up. */
> if (ops->get_vq_ready(vdpa, index))
> return ERR_PTR(-ENOENT);
> @@ -183,8 +200,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned 
> int index,
> dma_dev = vdpa_get_dma_dev(vdpa);
> vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
> true, may_reduce_num, ctx,
> -   virtio_vdpa_notify, callback,
> -   name, dma_dev);
> +   notify, callback, name, dma_dev);
> if (!vq) {
> err = -ENOMEM;
> goto error_new_virtqueue;
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 43f59ef10cc..a83bb0501c5 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -143,6 +143,11 @@ struct vdpa_map_file {
>   * @kick_vq:   Kick the virtqueue
>   * @vdev: vdpa device
>   * @idx: virtqueue index
> + * @kick_vq_with_data: Kick the virtqueue and supply extra data
> + * (only if VIRTIO_F_NOTIFICATION_DATA is 
> negotiated)
> + * @vdev: vdpa device
> + * @data: includes vqn, next_off and next_wrap 
> for
> + * packed virtqueues

This needs some tweaking; VIRTIO_F_NOTIFICATION_DATA works for split
virtqueues as well.

Thanks
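
For reference, a sketch of how the kernel-doc could be reworded to cover both queue
types (not the applied change; the bit layout follows the VIRTIO_F_NOTIFICATION_DATA
definition in the virtio spec):

 * @kick_vq_with_data: Kick the virtqueue and supply extra data
 *                     (only if VIRTIO_F_NOTIFICATION_DATA is negotiated)
 *                     @vdev: vdpa device
 *                     @data: bits 0..15 hold the vqn; for split virtqueues
 *                            bits 16..31 hold the next available index,
 *                            for packed virtqueues they hold next_off and
 *                            next_wrap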

>   * @set_vq_cb: Set the interrupt callback function for
>   * a virtqueue
>   * @vdev: vdpa device
> @@ -300,6 +305,7 @@ struct vdpa_config_ops {
>   u64 device_area);
> void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
> void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
> +   void (*kick_vq_with_data)(struct vdpa_device *vdev, u32 data);
> void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>   struct vdpa_callback *cb);
> void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
> --
> 2.34.1
>


Re: [PATCH net-next 3/8] virtio_net: introduce virtnet_xdp_handler() to seprate the logic of run xdp

2023-04-03 Thread Jason Wang
On Mon, Apr 3, 2023 at 12:17 PM Xuan Zhuo  wrote:
>
> On Mon, 3 Apr 2023 10:43:03 +0800, Jason Wang  wrote:
> > On Tue, Mar 28, 2023 at 8:04 PM Xuan Zhuo  
> > wrote:
> > >
> > > At present, we have two similar logic to perform the XDP prog.
> > >
> > > Therefore, this PATCH separates the code of executing XDP, which is
> > > conducive to later maintenance.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/net/virtio_net.c | 142 +--
> > >  1 file changed, 75 insertions(+), 67 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index bb426958cdd4..72b9d6ee4024 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -301,6 +301,15 @@ struct padded_vnet_hdr {
> > > char padding[12];
> > >  };
> > >
> > > +enum {
> > > +   /* xdp pass */
> > > +   VIRTNET_XDP_RES_PASS,
> > > +   /* drop packet. the caller needs to release the page. */
> > > +   VIRTNET_XDP_RES_DROP,
> > > +   /* packet is consumed by xdp. the caller needs to do nothing. */
> > > +   VIRTNET_XDP_RES_CONSUMED,
> > > +};
> >
> > I'd prefer this to be done on top unless it is a must. But I don't see
> > any advantage of introducing this, it's partial mapping of XDP action
> > and it needs to be extended when XDP action is extended. (And we've
> > already had: VIRTIO_XDP_REDIR and VIRTIO_XDP_TX ...)
>
> No, these are the three states of buffer after XDP processing.
>
> * PASS: goto make skb

XDP_PASS goes for this.

> * DROP: we should release buffer

XDP_DROP and error conditions go with this.

> * CUNSUMED: xdp prog used the buffer, we do nothing

XDP_TX/XDP_REDIRECT go for this.

So virtnet_xdp_handler() just maps the XDP action plus the error
conditions to the above three states.

We can simply map errors to XDP_DROP like:

	case XDP_TX:
		stats->xdp_tx++;
		xdpf = xdp_convert_buff_to_frame(xdp);
		if (unlikely(!xdpf))
			return XDP_DROP;

A good side effect is that we avoid passing the xdp_xmit pointer to
the function.

>
> The latter two are not particularly related to XDP ACTION. And it does not 
> need
> to extend when XDP action is extended. At least I have not thought of this
> situation.

What are the advantages of such indirection compared to using the XDP action directly?

Thanks

>
>
> >
> > > +
> > >  static void virtnet_rq_free_unused_buf(struct virtqueue *vq, void *buf);
> > >  static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
> > >
> > > @@ -789,6 +798,59 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > return ret;
> > >  }
> > >
> > > +static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, struct 
> > > xdp_buff *xdp,
> > > +  struct net_device *dev,
> > > +  unsigned int *xdp_xmit,
> > > +  struct virtnet_rq_stats *stats)
> > > +{
> > > +   struct xdp_frame *xdpf;
> > > +   int err;
> > > +   u32 act;
> > > +
> > > +   act = bpf_prog_run_xdp(xdp_prog, xdp);
> > > +   stats->xdp_packets++;
> > > +
> > > +   switch (act) {
> > > +   case XDP_PASS:
> > > +   return VIRTNET_XDP_RES_PASS;
> > > +
> > > +   case XDP_TX:
> > > +   stats->xdp_tx++;
> > > +   xdpf = xdp_convert_buff_to_frame(xdp);
> > > +   if (unlikely(!xdpf))
> > > +   return VIRTNET_XDP_RES_DROP;
> > > +
> > > +   err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> > > +   if (unlikely(!err)) {
> > > +   xdp_return_frame_rx_napi(xdpf);
> > > +   } else if (unlikely(err < 0)) {
> > > +   trace_xdp_exception(dev, xdp_prog, act);
> > > +   return VIRTNET_XDP_RES_DROP;
> > > +   }
> > > +
> > > +   *xdp_xmit |= VIRTIO_XDP_TX;
> > > +   return VIRTNET_XDP_RES_CONSUMED;
> > > +
> > > +   case XDP_REDIRECT:
> > > +   stats->xdp_redirects++;
> > > +   err = xdp_do_redirect(dev, xdp, xdp_prog);
> > > +   if (err)
> > > +   return VIRTNET_XDP_RES_DROP;
> > > +
> > > +   *xdp_xmit |= VIRTIO_XDP_REDIR;
> > > +   return VIRTNET_XDP_RES_CONSUMED;
> > > +
> > > +   default:
> > > +   bpf_warn_invalid_xdp_action(dev, xdp_prog, act);
> > > +   fallthrough;
> > > +   case XDP_ABORTED:
> > > +   trace_xdp_exception(dev, xdp_prog, act);
> > > +   fallthrough;
> > > +   case XDP_DROP:
> > > +   return VIRTNET_XDP_RES_DROP;
> > > +   }
> > > +}
> > > +
> > >  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
> > >  {
> > > return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
> > > @@ -876,7 +938,6 @@ static struct sk_buff *receive_small(struct 
> > >

Re: [PATCH resend 1/2] vdpa/snet: support getting and setting VQ state

2023-04-03 Thread Jason Wang
On Mon, Apr 3, 2023 at 5:18 PM Alvaro Karsz  wrote:
>
> Hi Jason,
>
> > > +   /* Overwrite the control register with the new buffer size (in 4B 
> > > words) */
> > > +   snet_write_ctrl(regs, buf_words);
> > > +   /* Use a memory barrier, this must be written before the opcode 
> > > register. */
> > > +   wmb();
> >
> >
> > At least you need to use smp_wmb() but if you want to serialize MMIO
> > writes you can simply use spinlocks after the removal of mmiowb
> > work[1].
> >
>
> I'm not sure how a spinlock can help in this case.
> The entire control mechanism is protected by the spinlock snet->ctrl_lock, so 
> multiple CPUs won't use it simultaneously.
>
> > Note that Documentation/memory-barriers.txt said:
> >
> > 1. All readX() and writeX() accesses to the same peripheral are 
> > ordered
> >with respect to each other. This ensures that MMIO register 
> > accesses
> >by the same CPU thread to a particular device will arrive in 
> > program
> >order.
> >
>
> So it will arrive to the pci subsystem in program order, but the pci 
> subsystem may reorder, right?

This is not what I read from the above doc. It said "to a particular
device will arrive in program order".

> I can just issue a read after the writes, something like:
>
> write_control
> read_control
> write op
> read op
>
> What do you think?

This should work.

Thanks
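
For illustration, a minimal sketch of that write-then-read sequence (snet_read_ctrl()
and snet_read_op() are hypothetical read-back helpers, not existing driver functions):

	/* Write the buffer size, then read it back so the posted MMIO write is
	 * observed by the device before the opcode register is written.
	 */
	snet_write_ctrl(regs, buf_words);
	snet_read_ctrl(regs);
	/* Write opcode and VQ idx, then read back to order it against later accesses. */
	snet_write_op(regs, opcode | (vq_idx << 16));
	snet_read_op(regs);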

>
> > > +   /* Clear the control register - clear the error code if previous 
> > > control operation failed */
> > > +   snet_write_ctrl(regs, 0);
> > > +
> > > +   /* Write opcode and VQ idx */
> > > +   val = opcode | (vq_idx << 16);
> > > +   snet_write_op(regs, val);
> >
> > I guess we need to serialize two writes here as well.
>
> Same here.
>
> Thanks
>
>


Re: [PATCH 1/1] vdpa/mlx5: Support interrupt bypassing

2023-04-03 Thread Jason Wang


On 2023/4/4 00:20, Eli Cohen wrote:

Add support for generation of interrupts from the device directly to the
VM to the VCPU thus avoiding the overhead on the host CPU.

When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided it will
use the QP mode where notifications go through the driver.

In addition, we add a shutdown callback to make sure allocated
interrupts are released in case of shutdown to allow clean shutdown.

Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
  2 files changed, 144 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..215a46cf8a98 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -83,6 +83,7 @@ struct mlx5_vq_restore_info {
u64 driver_addr;
u16 avail_index;
u16 used_index;
+   struct msi_map map;
bool ready;
bool restore;
  };
@@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+   struct msi_map map;



Just spotted that this structure is defined in mlx5_vnet.c but it is not net
specific at all. I think it's better to move it, together with the interrupt
bypassing logic, to core/ (we can do it in the future).



  
  	/* keep last in the struct */

struct mlx5_vq_restore_info ri;
@@ -792,6 +794,13 @@ static bool counters_supported(const struct mlx5_vdpa_dev 
*mvdev)
   BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS);
  }
  
+static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)

+{
+   return (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
+   (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
+   pci_msix_can_alloc_dyn(mvdev->mdev->pdev));
+}
+
  static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
@@ -829,9 +838,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
if (vq_is_tx(mvq->index))
MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, 
ndev->res.tisn);
  
-	MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);

+   if (mvq->map.virq) {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index);
+   } else {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
mvq->fwqp.mqp.qpn);
+   }
+
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
-   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
 !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
@@ -1174,6 +1189,32 @@ static void counter_set_dealloc(struct mlx5_vdpa_net 
*ndev, struct mlx5_vdpa_vir
mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", 
mvq->counter_set_id);
  }
  
+static void alloc_vector(struct mlx5_vdpa_net *ndev,

+struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++) {
+   if (!irqp->entries[i].usecount) {
+   irqp->entries[i].usecount++;



If we can't get a usecount greater than 1, it might be better
to use a boolean instead.




+   mvq->map = irqp->entries[i].map;
+   return;
+   }
+   }
+}
+
+static void dealloc_vector(struct mlx5_vdpa_net *ndev,
+  struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++)
+   if (mvq->map.virq == irqp->entries[i].map.virq)
+   irqp->entries[i].usecount--;
+}
+
  static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
  {
u16 idx = mvq->index;
@@ -1203,27 +1244,31 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  
  	err = counter_set_alloc(ndev, mvq);

if (err)
-   goto err_counter;
+   goto err_connect;
  
+	alloc_vector(ndev, mvq);

err = create_virtqueue(ndev, mvq);
if (err)
-   goto err_connect;
+   goto err_vq;
  
  	if (mvq->ready) {

err = modify_virtqueue(ndev, mvq, 
MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(&ndev

Re: [PATCH v3] vdpa/mlx5: Add and remove debugfs in setup/teardown driver

2023-04-03 Thread Jason Wang
On Mon, Apr 3, 2023 at 7:41 PM Eli Cohen  wrote:
>
> The right place to add the debugfs create is in
> setup_driver() and remove it in teardown_driver().
>
> Current code adds the debugfs when creating the device but resetting a
> device will remove the debugfs subtree and subsequent set_driver will
> not be able to create the files since the debugfs pointer is NULL.
>
> Fixes: 294221004322 ("vdpa/mlx5: Add debugfs subtree")
> Signed-off-by: Eli Cohen 

Acked-by: Jason Wang 

Thanks

>
> v3 -> v4:
> Fix error flow in setup_driver()
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9df073b0dd56..9db9e5421485 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2491,10 +2491,11 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
> err = 0;
> goto out;
> }
> +   mlx5_vdpa_add_debugfs(ndev);
> err = setup_virtqueues(mvdev);
> if (err) {
> mlx5_vdpa_warn(mvdev, "setup_virtqueues\n");
> -   goto out;
> +   goto err_setup;
> }
>
> err = create_rqt(ndev);
> @@ -2524,6 +2525,8 @@ static int setup_driver(struct mlx5_vdpa_dev *mvdev)
> destroy_rqt(ndev);
>  err_rqt:
> teardown_virtqueues(ndev);
> +err_setup:
> +   mlx5_vdpa_remove_debugfs(ndev->debugfs);
>  out:
> return err;
>  }
> @@ -2537,6 +2540,8 @@ static void teardown_driver(struct mlx5_vdpa_net *ndev)
> if (!ndev->setup)
> return;
>
> +   mlx5_vdpa_remove_debugfs(ndev->debugfs);
> +   ndev->debugfs = NULL;
> teardown_steering(ndev);
> destroy_tir(ndev);
> destroy_rqt(ndev);
> @@ -3287,7 +3292,6 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
> *v_mdev, const char *name,
> if (err)
> goto err_reg;
>
> -   mlx5_vdpa_add_debugfs(ndev);
> mgtdev->ndev = ndev;
> return 0;
>
> --
> 2.38.1
>


Re: [PATCH v4] vdpa/mlx5: Avoid losing link state updates

2023-04-03 Thread Jason Wang
On Mon, Apr 3, 2023 at 7:41 PM Eli Cohen  wrote:
>
> Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
> negotiated. However, link state updates could be received before feature
> negotiation was completed , therefore causing link state events to be
> lost, possibly leaving the link state down.
>
> Modify the code so link state notifier is registered after DRIVER_OK was
> negotiated and carry the registration only if
> VIRTIO_NET_F_STATUS was negotiated.  Unregister the notifier when the
> device is reset.
>
> Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on 
> feature bits")
> Signed-off-by: Eli Cohen 
>
> v3 -> v4:
> move registartion to mlx5_vdpa_set_status()
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 203 +-
>  1 file changed, 114 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9db9e5421485..3388f5f90036 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -2322,6 +2322,112 @@ static void update_cvq_info(struct mlx5_vdpa_dev 
> *mvdev)
> }
>  }
>
> +static u8 query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 vport)
> +{
> +   u32 out[MLX5_ST_SZ_DW(query_vport_state_out)] = {};
> +   u32 in[MLX5_ST_SZ_DW(query_vport_state_in)] = {};
> +   int err;
> +
> +   MLX5_SET(query_vport_state_in, in, opcode, 
> MLX5_CMD_OP_QUERY_VPORT_STATE);
> +   MLX5_SET(query_vport_state_in, in, op_mod, opmod);
> +   MLX5_SET(query_vport_state_in, in, vport_number, vport);
> +   if (vport)
> +   MLX5_SET(query_vport_state_in, in, other_vport, 1);
> +
> +   err = mlx5_cmd_exec_inout(mdev, query_vport_state, in, out);
> +   if (err)
> +   return 0;
> +
> +   return MLX5_GET(query_vport_state_out, out, state);
> +}
> +
> +static bool get_link_state(struct mlx5_vdpa_dev *mvdev)
> +{
> +   if (query_vport_state(mvdev->mdev, 
> MLX5_VPORT_STATE_OP_MOD_VNIC_VPORT, 0) ==
> +   VPORT_STATE_UP)
> +   return true;
> +
> +   return false;
> +}
> +
> +static void update_carrier(struct work_struct *work)
> +{
> +   struct mlx5_vdpa_wq_ent *wqent;
> +   struct mlx5_vdpa_dev *mvdev;
> +   struct mlx5_vdpa_net *ndev;
> +
> +   wqent = container_of(work, struct mlx5_vdpa_wq_ent, work);
> +   mvdev = wqent->mvdev;
> +   ndev = to_mlx5_vdpa_ndev(mvdev);
> +   if (get_link_state(mvdev))
> +   ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, 
> VIRTIO_NET_S_LINK_UP);
> +   else
> +   ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, 
> ~VIRTIO_NET_S_LINK_UP);
> +
> +   if (ndev->config_cb.callback)
> +   ndev->config_cb.callback(ndev->config_cb.private);
> +
> +   kfree(wqent);
> +}
> +
> +static int queue_link_work(struct mlx5_vdpa_net *ndev)
> +{
> +   struct mlx5_vdpa_wq_ent *wqent;
> +
> +   wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC);
> +   if (!wqent)
> +   return -ENOMEM;
> +
> +   wqent->mvdev = &ndev->mvdev;
> +   INIT_WORK(&wqent->work, update_carrier);
> +   queue_work(ndev->mvdev.wq, &wqent->work);
> +   return 0;
> +}
> +
> +static int event_handler(struct notifier_block *nb, unsigned long event, 
> void *param)
> +{
> +   struct mlx5_vdpa_net *ndev = container_of(nb, struct mlx5_vdpa_net, 
> nb);
> +   struct mlx5_eqe *eqe = param;
> +   int ret = NOTIFY_DONE;
> +
> +   if (event == MLX5_EVENT_TYPE_PORT_CHANGE) {
> +   switch (eqe->sub_type) {
> +   case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
> +   case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE:
> +   if (queue_link_work(ndev))
> +   return NOTIFY_DONE;
> +
> +   ret = NOTIFY_OK;
> +   break;
> +   default:
> +   return NOTIFY_DONE;
> +   }
> +   return ret;
> +   }
> +   return ret;
> +}
> +
> +static void register_link_notifier(struct mlx5_vdpa_net *ndev)
> +{
> +   if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_STATUS)))
> +   return;
> +
> +   ndev->nb.notifier_call = event_handler;
> +   mlx5_notifier_register(ndev->mvdev.mdev, &ndev->nb);
> +   ndev->nb_registered = true;
> +   queue_link_work(ndev);
> +}
> +
> +static void unregister_link_notifier(struct mlx5_vdpa_net *ndev)
> +{
> +   if (!ndev->nb_registered)
> +   return;
> +
> +   ndev->nb_registered = false;
> +   mlx5_notifier_unregister(ndev->mvdev.mdev, &ndev->nb);
> +   flush_workqueue(ndev->mvdev.wq);
> +}
> +
>  static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 
> features)
>  {
> struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> @@ -2335,6 +2441,7 @@ static int mlx5_vdpa_set_driver_features(struct 
> vdpa_device *vdev, u64 

Re: [PATCH 1/1] vdpa/mlx5: Support interrupt bypassing

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 07:20:39PM +0300, Eli Cohen wrote:
> Add support for generation of interrupts from the device directly to the
> VM to the VCPU thus avoiding the overhead on the host CPU.
> 
> When supported, the driver will attempt to allocate vectors for each
> data virtqueue. If a vector for a virtqueue cannot be provided it will
> use the QP mode where notifications go through the driver.
> 
> In addition, we add a shutdown callback to make sure allocated
> interrupts are released in case of shutdown to allow clean shutdown.
> 
> Signed-off-by: Eli Cohen 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
>  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
>  2 files changed, 144 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 520646ae7fa0..215a46cf8a98 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -83,6 +83,7 @@ struct mlx5_vq_restore_info {
>   u64 driver_addr;
>   u16 avail_index;
>   u16 used_index;
> + struct msi_map map;
>   bool ready;
>   bool restore;
>  };
> @@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue {
>   u16 avail_idx;
>   u16 used_idx;
>   int fw_state;
> + struct msi_map map;
>  
>   /* keep last in the struct */
>   struct mlx5_vq_restore_info ri;
> @@ -792,6 +794,13 @@ static bool counters_supported(const struct 
> mlx5_vdpa_dev *mvdev)
>  BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS);
>  }
>  
> +static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)
> +{
> + return (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
> + (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
> + pci_msix_can_alloc_dyn(mvdev->mdev->pdev));

Don't add () around the return value; too many parentheses just obscure the logic.
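
For example, the same check without the outer parentheses (a sketch of the style being
asked for):

	return MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
	       (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
	       pci_msix_can_alloc_dyn(mvdev->mdev->pdev);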


> +}
> +
>  static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
> mlx5_vdpa_virtqueue *mvq)
>  {
>   int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
> @@ -829,9 +838,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtque
>   if (vq_is_tx(mvq->index))
>   MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, 
> ndev->res.tisn);
>  
> - MLX5_SET(virtio_q, vq_ctx, event_mode, 
> MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
> + if (mvq->map.virq) {
> + MLX5_SET(virtio_q, vq_ctx, event_mode, 
> MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE);
> + MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index);
> + } else {
> + MLX5_SET(virtio_q, vq_ctx, event_mode, 
> MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
> + MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
> mvq->fwqp.mqp.qpn);
> + }
> +
>   MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
> - MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
>   MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
>   MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
>!!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
> @@ -1174,6 +1189,32 @@ static void counter_set_dealloc(struct mlx5_vdpa_net 
> *ndev, struct mlx5_vdpa_vir
>   mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", 
> mvq->counter_set_id);
>  }
>  
> +static void alloc_vector(struct mlx5_vdpa_net *ndev,
> +  struct mlx5_vdpa_virtqueue *mvq)
> +{
> + struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
> + int i;
> +
> + for (i = 0; i < irqp->num_ent; i++) {
> + if (!irqp->entries[i].usecount) {
> + irqp->entries[i].usecount++;
> + mvq->map = irqp->entries[i].map;
> + return;
> + }
> + }
> +}
> +
> +static void dealloc_vector(struct mlx5_vdpa_net *ndev,
> +struct mlx5_vdpa_virtqueue *mvq)
> +{
> + struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
> + int i;
> +
> + for (i = 0; i < irqp->num_ent; i++)
> + if (mvq->map.virq == irqp->entries[i].map.virq)
> + irqp->entries[i].usecount--;
> +}
> +
>  static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
> *mvq)
>  {
>   u16 idx = mvq->index;
> @@ -1203,27 +1244,31 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, 
> struct mlx5_vdpa_virtqueue *mvq)
>  
>   err = counter_set_alloc(ndev, mvq);
>   if (err)
> - goto err_counter;
> + goto err_connect;
>  
> + alloc_vector(ndev, mvq);
>   err = create_virtqueue(ndev, mvq);
>   if (err)
> - goto err_connect;
> + goto err_vq;
>  
>   if (mvq->ready) {
>   err = modify_virtqueue(ndev, mvq, 
> MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
>   if (err) {
>   mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready 
> vq idx %d(%d)\n",
> 

Re: [PATCH 0/1] posted interrtups support for mlx5

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 02:09:46PM -0400, Michael S. Tsirkin wrote:
> On Mon, Apr 03, 2023 at 01:00:16PM -0400, Parav Pandit wrote:
> > 
> > 
> > On 4/3/2023 12:20 PM, Eli Cohen wrote:
> > > Hi,
> > > 
> > > the single patch in this series adds support for posted interrupts in
> > > mlx5.
> > > 
> > > It depends on the patch series already accpted can be seen here:
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20230324231341.29808-1-sa...@kernel.org/
> > > 
> > > git pull  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
> > > tags/mlx5-updates-2023-03-20
> > > 
> > > Eli Cohen (1):
> > >vdpa/mlx5: Support interrupt bypassing
> > > 
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
> > >   2 files changed, 144 insertions(+), 9 deletions(-)
> > > 
> > In subject
> > 
> > s/interrtups/interrupts/
> 
> In fact if we are splitting hairs, this is a reduced relative (full form
> would be "support of posted interrupts") so it should not use a plural
> form. Also pls use a prefix in the patch subject. In the end:
>   mlx5: posted interrupt support
> would be better.

Having said that, it's patch 0/1, so typos do not matter; they do not
end up in the commit log.

-- 
MST



Re: [PATCH 0/1] posted interrtups support for mlx5

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 01:00:16PM -0400, Parav Pandit wrote:
> 
> 
> On 4/3/2023 12:20 PM, Eli Cohen wrote:
> > Hi,
> > 
> > the single patch in this series adds support for posted interrupts in
> > mlx5.
> > 
> > It depends on the patch series already accpted can be seen here:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20230324231341.29808-1-sa...@kernel.org/
> > 
> > git pull  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
> > tags/mlx5-updates-2023-03-20
> > 
> > Eli Cohen (1):
> >vdpa/mlx5: Support interrupt bypassing
> > 
> >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
> >   drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
> >   2 files changed, 144 insertions(+), 9 deletions(-)
> > 
> In subject
> 
> s/interrtups/interrupts/

In fact, if we are splitting hairs, this is a reduced relative (the full form
would be "support of posted interrupts"), so it should not use a plural
form. Also, please use a prefix in the patch subject. In the end:
	mlx5: posted interrupt support
would be better.

-- 
MST



Re: [virtio-dev] [PATCH v2 1/2] virtio-blk: migrate to the latest patchset version

2023-04-03 Thread Michael S. Tsirkin
On Thu, Mar 30, 2023 at 05:49:52PM -0400, Dmitry Fomichev wrote:
> The merged patch series to support zoned block devices in virtio-blk
> is not the most up to date version. The merged patch can be found at
> 
> https://lore.kernel.org/linux-block/20221016034127.330942-3-dmitry.fomic...@wdc.com/
> 
> , but the latest and reviewed version is
> 
> https://lore.kernel.org/linux-block/20221110053952.3378990-3-dmitry.fomic...@wdc.com/

What happened here is that it was not sent to the correct people or lists.

v2 happened to draw my attention by chance; I missed the interface change,
did not see the later versions, and merged v2.

To: Jens Axboe ,
linux-bl...@vger.kernel.org,
Damien Le Moal ,
Stefan Hajnoczi ,
Hannes Reinecke , Sam Li 
Cc: virtio-...@lists.oasis-open.org,
Dmitry Fomichev 

while:

$ ./scripts/get_maintainer.pl -f drivers/block/virtio_blk.c
"Michael S. Tsirkin"  (maintainer:VIRTIO CORE AND NET DRIVERS)
Jason Wang  (maintainer:VIRTIO CORE AND NET DRIVERS)
Paolo Bonzini  (reviewer:VIRTIO BLOCK AND SCSI DRIVERS)
Stefan Hajnoczi  (reviewer:VIRTIO BLOCK AND SCSI DRIVERS)
Jens Axboe  (maintainer:BLOCK LAYER)
virtualization@lists.linux-foundation.org (open list:VIRTIO CORE AND NET 
DRIVERS)
linux-bl...@vger.kernel.org (open list:BLOCK LAYER)
linux-ker...@vger.kernel.org (open list)




> The differences between the two are mostly cleanups, but there is one
> change that is very important in terms of compatibility with the
> approved virtio-zbd specification.
> 
> Before it was approved, the OASIS virtio spec had a change in
> VIRTIO_BLK_T_ZONE_APPEND request layout that is not reflected in the
> current virtio-blk driver code. In the running code, the status is
> the first byte of the in-header that is followed by some pad bytes
> and the u64 that carries the sector at which the data has been written
> to the zone back to the driver, aka the append sector.
> 
> This layout turned out to be problematic for implementing in QEMU and
> the request status byte has been eventually made the last byte of the
> in-header. The current code doesn't expect that and this causes the
> append sector value always come as zero to the block layer. This needs
> to be fixed ASAP.
> 
> Fixes: 95bfec41bd3d ("virtio-blk: add support for zoned block devices")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Dmitry Fomichev 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> ---
>  drivers/block/virtio_blk.c  | 238 +---
>  include/uapi/linux/virtio_blk.h |  18 +--
>  2 files changed, 166 insertions(+), 90 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 2723eede6f21..4f0dbbb3d4a5 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -96,16 +96,14 @@ struct virtblk_req {
>  
>   /*
>* The zone append command has an extended in header.
> -  * The status field in zone_append_in_hdr must have
> -  * the same offset in virtblk_req as the non-zoned
> -  * status field above.
> +  * The status field in zone_append_in_hdr must always
> +  * be the last byte.
>*/
>   struct {
> + __virtio64 sector;
>   u8 status;
> - u8 reserved[7];
> - __le64 append_sector;
> - } zone_append_in_hdr;
> - };
> + } zone_append;
> + } in_hdr;
>  
>   size_t in_hdr_len;
>  
> @@ -154,7 +152,7 @@ static int virtblk_add_req(struct virtqueue *vq, struct 
> virtblk_req *vbr)
>   sgs[num_out + num_in++] = vbr->sg_table.sgl;
>   }
>  
> - sg_init_one(&in_hdr, &vbr->status, vbr->in_hdr_len);
> + sg_init_one(&in_hdr, &vbr->in_hdr.status, vbr->in_hdr_len);
>   sgs[num_out + num_in++] = &in_hdr;
>  
>   return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
> @@ -242,11 +240,14 @@ static blk_status_t virtblk_setup_cmd(struct 
> virtio_device *vdev,
> struct request *req,
> struct virtblk_req *vbr)
>  {
> - size_t in_hdr_len = sizeof(vbr->status);
> + size_t in_hdr_len = sizeof(vbr->in_hdr.status);
>   bool unmap = false;
>   u32 type;
>   u64 sector = 0;
>  
> + if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED) && op_is_zone_mgmt(req_op(req)))
> + return BLK_STS_NOTSUPP;
> +
>   /* Set fields for all request types */
>   vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
>  
> @@ -287,7 +288,7 @@ static blk_status_t virtblk_setup_cmd(struct 
> virtio_device *vdev,
>   case REQ_OP_ZONE_APPEND:
>   type = VIRTIO_BLK_T_ZONE_APPEND;
>   sector = blk_rq_pos(req);
> - in_hdr_len = sizeof(vbr->zone_append_in_hdr);
> + in_hdr_len = sizeof(vbr->in_hdr.zone_append);
>  

Re: [PATCH v2 1/2] virtio-blk: migrate to the latest patchset version

2023-04-03 Thread Michael S. Tsirkin
On Mon, Apr 03, 2023 at 08:15:26AM -0700, Christoph Hellwig wrote:
> What is "migrate to the latest patchset version" even supposed to mean?
> 
> I don't think that belongs into a kernel commit description.

I think this should be something like "update to latest interface version".

-- 
MST



Re: [PATCH 0/1] posted interrtups support for mlx5

2023-04-03 Thread Parav Pandit via Virtualization




On 4/3/2023 12:20 PM, Eli Cohen wrote:

Hi,

the single patch in this series adds support for posted interrupts in
mlx5.

It depends on the patch series already accpted can be seen here:
https://patchwork.kernel.org/project/netdevbpf/patch/20230324231341.29808-1-sa...@kernel.org/

git pull  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
tags/mlx5-updates-2023-03-20

Eli Cohen (1):
   vdpa/mlx5: Support interrupt bypassing

  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
  2 files changed, 144 insertions(+), 9 deletions(-)


In subject

s/interrtups/interrupts/


Re: [PATCH 1/1] vdpa/mlx5: Support interrupt bypassing

2023-04-03 Thread Parav Pandit via Virtualization




On 4/3/2023 12:20 PM, Eli Cohen wrote:

Add support for generation of interrupts from the device directly to the
VM to the VCPU thus avoiding the overhead on the host CPU.

When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided it will
use the QP mode where notifications go through the driver.

In addition, we add a shutdown callback to make sure allocated
interrupts are released in case of shutdown to allow clean shutdown.

Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
  2 files changed, 144 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..215a46cf8a98 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -83,6 +83,7 @@ struct mlx5_vq_restore_info {
u64 driver_addr;
u16 avail_index;
u16 used_index;
+   struct msi_map map;
bool ready;
bool restore;
  };
@@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+   struct msi_map map;
  
  	/* keep last in the struct */

struct mlx5_vq_restore_info ri;
@@ -792,6 +794,13 @@ static bool counters_supported(const struct mlx5_vdpa_dev 
*mvdev)
   BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS);
  }
  
+static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)

+{
Better to take a const struct pointer. It makes it clear that this is a read-only
API and that there is no accidental modification of the HCA caps or other fields.



+   return (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
+   (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
+   pci_msix_can_alloc_dyn(mvdev->mdev->pdev));
+}
+
  static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
@@ -829,9 +838,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
if (vq_is_tx(mvq->index))
MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, 
ndev->res.tisn);
  
-	MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);

+   if (mvq->map.virq) {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index);
+   } else {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
mvq->fwqp.mqp.qpn);
+   }
+
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
-   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
 !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
@@ -1174,6 +1189,32 @@ static void counter_set_dealloc(struct mlx5_vdpa_net 
*ndev, struct mlx5_vdpa_vir
mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", 
mvq->counter_set_id);
  }
  
+static void alloc_vector(struct mlx5_vdpa_net *ndev,

+struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++) {
+   if (!irqp->entries[i].usecount) {

The usage appears to be boolean, so it is better to rename it from usecount
to used (used = true/false).

In the future, if you plan to reuse the same vector, usecount may make more
sense at that point.



+   irqp->entries[i].usecount++;
+   mvq->map = irqp->entries[i].map;
+   return;
+   }
+   }
+}
+
+static void dealloc_vector(struct mlx5_vdpa_net *ndev,
+  struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++)
+   if (mvq->map.virq == irqp->entries[i].map.virq)
+   irqp->entries[i].usecount--;
You should add a return here too, like in alloc, so we don't go over the rest
of the entries once a specific vq has been taken care of.
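
Taken together, a sketch of how the two helpers could look with both suggestions
applied (the boolean 'used' field replacing 'usecount' is an assumption, not the
posted code):

	static void alloc_vector(struct mlx5_vdpa_net *ndev,
				 struct mlx5_vdpa_virtqueue *mvq)
	{
		struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
		int i;

		for (i = 0; i < irqp->num_ent; i++) {
			if (!irqp->entries[i].used) {
				irqp->entries[i].used = true;	/* mark vector as taken */
				mvq->map = irqp->entries[i].map;
				return;
			}
		}
	}

	static void dealloc_vector(struct mlx5_vdpa_net *ndev,
				   struct mlx5_vdpa_virtqueue *mvq)
	{
		struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
		int i;

		for (i = 0; i < irqp->num_ent; i++) {
			if (mvq->map.virq == irqp->entries[i].map.virq) {
				irqp->entries[i].used = false;	/* release the vector */
				return;				/* stop once this vq is handled */
			}
		}
	}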



+}
+
  static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
  {
u16 idx = mvq->index;
@@ -1203,27 +1244,31 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  
  	err = counter_set_alloc(ndev, mvq);

if (err)
-   goto err_counter;
+   goto err_connect;
  
+	alloc_vector(ndev, mvq);

err = create_virtqueue(ndev, mvq);
if (err)
-   goto err_connect;
+   goto err_vq;
  
  	if (mvq->ready)

Re: [PATCH v2 1/2] virtio-blk: migrate to the latest patchset version

2023-04-03 Thread Christoph Hellwig
What is "migrate to the latest patchset version" even supposed to mean?

I don't think that belongs into a kernel commit description.


Re: [PATCH v4 2/2] drm/virtio: Support sync objects

2023-04-03 Thread Emil Velikov
On Mon, 3 Apr 2023 at 14:00, Emil Velikov  wrote:

> > > I think we should zero num_(in|out)_syncobjs when the respective parse
> > > fails. Otherwise we get one "cleanup" within the parse function itself
> > > and a second during the cleanup_submit. Haven't looked at it too closely
> > > but I suspect that will trigger an UAF or two.
> >
> > There are checks for NULL pointers in the code that will prevent the
> > UAF.  I'll add zeroing of the nums for more consistency.
> >
>
> Rght the drm_syncobj is attached to the encapsulating struct
> virtio_gpu_submit _only_ on success.
> By clearing the num variables,  the NULL checks will no longer be
> needed ... in case you'd want to drop that.
>
> Either way - even as-is the code is safe.
>

Err, or not. The NULL check itself will cause a NULL pointer deref.

In more detail: the in/out syncobjs are memset() to NULL in
virtio_gpu_init_submit(). The virtio_gpu_parse_(|post_)deps() will
fail and leave them unchanged. Then virtio_gpu_free_syncobjs() and
virtio_gpu_reset_syncobjs() will trigger a NULL ptr deref, because
they are accessing the elements of the (NULL) array.

Apart from the num_(in|out)_syncobjs = 0, I would drop the NULL checks;
they give a false sense of security IMHO.
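
A minimal sketch of the suggested error path, shown here at the call site (the field
and helper names follow the patch under review and are assumptions; the zeroing could
equally live inside virtio_gpu_parse_deps() on its failure path):

	ret = virtio_gpu_parse_deps(&submit);
	if (ret) {
		/* nothing was attached, so the later reset/free loops become no-ops */
		submit.num_in_syncobjs = 0;
		goto cleanup;
	}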

-Emil


Re: [PATCH v4 2/2] drm/virtio: Support sync objects

2023-04-03 Thread Emil Velikov
On Sun, 2 Apr 2023 at 18:45, Dmitry Osipenko
 wrote:
>
> On 3/30/23 20:24, Emil Velikov wrote:
> > Hi Dmitry,
> >
> > Have you considered creating a few DRM helpers for this functionality?
> >
> > AFAICT this is the third driver which supports syncobj timelines and
> > looking at one of the implementations ... it is not great. Note that
> > this suggestion is _not_ a blocker.
>
> Would like to see a third driver starting to use the exactly same
> drm_execbuffer_syncobj struct because UABI part isn't generic, though
> it's a replica of the MSM driver for now.
>
> The virtio-gpu is only at the beginning of starting to use sync objects,
> compared to MSM driver. Will be better to defer the generalization until
> virtio-gpu will become more mature, like maybe after a year since the
> time virtio userspace will start using sync objects, IMO.
>

I wasn't talking about generic UAPI, but about having DRM helpers instead.
The former (as you pointed out) would need time to crystallize, while the
latter can be done even today.

> ...
> >> +static void virtio_gpu_reset_syncobjs(struct drm_syncobj **syncobjs,
> >> +  uint32_t nr_syncobjs)
> >> +{
> >> +uint32_t i;
> >> +
> >> +for (i = 0; i < nr_syncobjs; i++) {
> >> +if (syncobjs[i])
> >> +drm_syncobj_replace_fence(syncobjs[i], NULL);
> >
> > Side note: the drm_syncobj_put() called immediately after also calls
> > replace/reset fence internally. Although reading from the docs, I'm not
> > sure if relying on that is a wise move.
> >
> > Just thought I'd point it out.
>
> The drm_syncobj_put() doesn't call replace/reset fence until syncobj is
> freed. We drop the old fence for active/alive in-syncobj here after
> handling the fence-wait; this makes the syncobj reusable, otherwise
> userspace would have to re-create syncobjs after each submission.
>

I see, thanks.

> >>
> >> +ret = virtio_gpu_parse_deps(&submit);
> >> +if (ret)
> >> +goto cleanup;
> >> +
> >> +ret = virtio_gpu_parse_post_deps(&submit);
> >> +if (ret)
> >> +goto cleanup;
> >> +
> >
> > I think we should zero num_(in|out)_syncobjs when the respective parse
> > fails. Otherwise we get one "cleanup" within the parse function itself
> > and a second during the cleanup_submit. Haven't looked at it too closely
> > but I suspect that will trigger an UAF or two.
>
> There are checks for NULL pointers in the code that will prevent the
> UAF.  I'll add zeroing of the nums for more consistency.
>

Right, the drm_syncobj is attached to the encapsulating struct
virtio_gpu_submit _only_ on success.
By clearing the num variables, the NULL checks will no longer be
needed ... in case you'd want to drop that.

Either way - even as-is the code is safe.

> >>  ret = virtio_gpu_install_out_fence_fd(&submit);
> >>  if (ret)
> >>  goto cleanup;
> >> @@ -294,6 +512,7 @@ int virtio_gpu_execbuffer_ioctl(struct drm_device 
> >> *dev, void *data,
> >>  goto cleanup;
> >>
> >>  virtio_gpu_submit(&submit);
> >> +virtio_gpu_process_post_deps(&submit);
> >
> > Any particular reason why the virtio_gpu_reset_syncobjs is deferred to
> > virtio_gpu_cleanup_submit(). Having it just above the process_post_deps
> > (similar to msm) allows the reader to get closure about the in syncobjs.
> >
> > This is just personal preference, so don't read too much into it.
>
> The job submission path should be as short as possible in general.
> Technically, virtio_gpu_process_post_deps() should be fast, but since
> I'm not 100% sure about all the corner cases, it's better to hold until
> job is sent out.
>

Ack, thanks again

-Emil


Re: [RFC PATCH v4 2/3] vsock: return errors other than -ENOMEM to socket

2023-04-03 Thread Stefano Garzarella

On Sun, Apr 02, 2023 at 09:16:46PM +0300, Arseniy Krasnov wrote:

This removes the behaviour where the error code returned from any transport
was always switched to ENOMEM. This works in the same way as commit
c43170b7e157 ("vsock: return errors other than -ENOMEM to socket"),
but for receive calls.

Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/af_vsock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)


LGTM!

Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 5f2dda35c980..413407bb646c 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -2043,7 +2043,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct msghdr *msg,

read = transport->stream_dequeue(vsk, msg, len - copied, flags);
if (read < 0) {
-   err = -ENOMEM;
+   err = read;
break;
}

@@ -2094,7 +2094,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg,
msg_len = transport->seqpacket_dequeue(vsk, msg, flags);

if (msg_len < 0) {
-   err = -ENOMEM;
+   err = msg_len;
goto out;
}

--
2.25.1





Re: [RFC PATCH v4 1/3] vsock/vmci: convert VMCI error code to -ENOMEM on receive

2023-04-03 Thread Stefano Garzarella

On Sun, Apr 02, 2023 at 09:15:49PM +0300, Arseniy Krasnov wrote:

This adds conversion of the VMCI-specific error code to a general -ENOMEM. It
is preparation for the next patch, which changes af_vsock.c behaviour
on receive to pass the value returned from the transport to the user.

Signed-off-by: Arseniy Krasnov 
Reviewed-by: Vishnu Dasa 
---
net/vmw_vsock/vmci_transport.c | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)


LGTM!

Reviewed-by: Stefano Garzarella 



diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 36eb16a40745..a5375c97f5b0 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -1831,10 +1831,17 @@ static ssize_t vmci_transport_stream_dequeue(
size_t len,
int flags)
{
+   ssize_t err;
+
if (flags & MSG_PEEK)
-   return vmci_qpair_peekv(vmci_trans(vsk)->qpair, msg, len, 0);
+   err = vmci_qpair_peekv(vmci_trans(vsk)->qpair, msg, len, 0);
else
-   return vmci_qpair_dequev(vmci_trans(vsk)->qpair, msg, len, 0);
+   err = vmci_qpair_dequev(vmci_trans(vsk)->qpair, msg, len, 0);
+
+   if (err < 0)
+   err = -ENOMEM;
+
+   return err;
}

static ssize_t vmci_transport_stream_enqueue(
--
2.25.1





Re: [PATCH net-next v4 0/3] Add support for sockmap to vsock.

2023-04-03 Thread Stefano Garzarella

On Fri, Mar 31, 2023 at 06:06:10PM -0700, John Fastabend wrote:

Bobby Eshleman wrote:

We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.

Compared to copying packets via userspace, this improves throughput by
121% in basic testing.

Tested as follows.

Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
   server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
  "socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
  redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs

*Note: these numbers are from RFC v1

Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.

This series requires the skb patch.
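As an illustration of the redirector described above, a hypothetical minimal
sockmap verdict program could look like the following; this is not the actual
test program from the series, and the map name, slot layout and section name
are assumptions:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 2);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(int));
} sock_map SEC(".maps");

SEC("sk_skb/verdict")
int unix_to_vsock(struct __sk_buff *skb)
{
	/* data arriving on the unix socket (slot 0) is redirected to the
	 * egress path of the connected vsock socket stored in slot 1 */
	return bpf_sk_redirect_map(skb, &sock_map, 1, 0);
}

char _license[] SEC("license") = "GPL";

Userspace would attach this as the sockmap verdict program and insert both
sockets into sock_map, so the payload never has to be copied through userspace.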


Appears reasonable to me, although I didn't review the internals of all
the af_vsock stuff. I see it got merged, great.


Thanks for checking!



One nit, I have a series coming shortly to pull the tests out of
the sockmap_listen and into a sockmap_vsock because I don't think they
belong in _listen but that is just a refactor.



LGTM!

Thanks,
Stefano



Re: [PATCH 0/5] vDPA/ifcvf: implement immediate initialization mechanism

2023-04-03 Thread Zhu, Lingshan



On 4/3/2023 1:28 PM, Jason Wang wrote:

On Fri, Mar 31, 2023 at 8:49 PM Zhu Lingshan  wrote:

Formerly, the ifcvf driver implemented a lazy-initialization mechanism
for the virtqueues and other config space contents:
it would store all configurations passed down from userspace,
then load them into the device config space upon DRIVER_OK.

This cannot serve live migration, so this series implements an
immediate initialization mechanism: rather than the
former store-load process, the virtio operations like the vq ops
take immediate effect by accessing the virtio registers.

Is there any chance that ifcvf can use virtio_pci_modern_dev library?

Then we don't need to duplicate the codes.

Note that pds_vdpa will be the second user for virtio_pci_modern_dev
library (and the first vDPA parent to use that library).

Yes, I agree this library can help a lot for a standard virtio pci device.
But this change would be huge; it is like requiring a change to every line of
the driver. For example, the current driver functions work on the adapter and
ifcvf_hw; if we want to reuse the lib, we need the driver to work on
struct virtio_pci_modern_device.

We would almost need to re-write the driver.
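As a rough illustration only: the ifcvf_hw layout and the helper below are
hypothetical, while vp_modern_probe() and the vp_modern_*() accessors are the
existing library API.

#include <linux/pci.h>
#include <linux/virtio_pci_modern.h>

struct ifcvf_hw {
	struct virtio_pci_modern_device mdev;
	/* ...vendor-specific state... */
};

static int ifcvf_init_hw_modern(struct ifcvf_hw *hw, struct pci_dev *pdev)
{
	int ret;

	hw->mdev.pci_dev = pdev;
	/* maps the common/ISR/notify/device config capabilities */
	ret = vp_modern_probe(&hw->mdev);
	if (ret)
		return ret;

	/* feature negotiation and vq setup could then go through the
	 * library, e.g. vp_modern_get_features()/vp_modern_set_features(),
	 * instead of the open-coded register accessors. */
	return 0;
}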

Can we plan this huge change in a following series?

Thanks,
Zhu Lingshan


Thanks


This series also implements irq synchronization in the reset
routine.

Zhu Lingshan (5):
   virt queue ops take immediate actions
   get_driver_features from virtio registers
   retire ifcvf_start_datapath and ifcvf_add_status
   synchronize irqs in the reset routine
   a vendor driver should not set _CONFIG_S_FAILED

  drivers/vdpa/ifcvf/ifcvf_base.c | 162 +++-
  drivers/vdpa/ifcvf/ifcvf_base.h |  16 ++--
  drivers/vdpa/ifcvf/ifcvf_main.c |  97 ---
  3 files changed, 122 insertions(+), 153 deletions(-)

--
2.39.1




Re: [PATCH resend 1/2] vdpa/snet: support getting and setting VQ state

2023-04-03 Thread Alvaro Karsz
> I'm not sure how a spinlock can help in this case.
> The entire control mechanism is protected by the spinlock snet->ctrl_lock, so 
> multiple CPUs won't use it simultaneously.

Sorry, snet->ctrl_lock is a mutex, not a spinlock.


Re: [PATCH resend 1/2] vdpa/snet: support getting and setting VQ state

2023-04-03 Thread Alvaro Karsz
Hi Jason,

> > +   /* Overwrite the control register with the new buffer size (in 4B 
> > words) */
> > +   snet_write_ctrl(regs, buf_words);
> > +   /* Use a memory barrier, this must be written before the opcode 
> > register. */
> > +   wmb();
> 
> 
> At least you need to use smp_wmb() but if you want to serialize MMIO
> writes you can simply use spinlocks after the removal of mmiowb
> work[1].
> 

I'm not sure how a spinlock can help in this case.
The entire control mechanism is protected by the spinlock snet->ctrl_lock, so 
multiple CPUs won't use it simultaneously.

> Note that Documentation/memory-barriers.txt said:
> 
> 1. All readX() and writeX() accesses to the same peripheral are 
> ordered
>with respect to each other. This ensures that MMIO register 
> accesses
>by the same CPU thread to a particular device will arrive in 
> program
>order.
> 

So it will arrive at the PCI subsystem in program order, but the PCI subsystem
may reorder it, right?
I can just issue a read after the writes, something like:

write_control
read_control
write op
read op

What do you think?
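Sketched in code, the proposed sequence would be roughly the following;
snet_read_ctrl()/snet_read_op() are assumed read-back helpers mirroring the
existing write helpers, they do not exist today:

static void snet_ctrl_write_cmd(void __iomem *regs, u32 buf_words, u32 op_val)
{
	snet_write_ctrl(regs, buf_words);
	/* a non-posted read from the device forces the posted write above
	 * to complete before the opcode write below is issued */
	(void)snet_read_ctrl(regs);

	snet_write_op(regs, op_val);
	(void)snet_read_op(regs);
}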

> > +   /* Clear the control register - clear the error code if previous 
> > control operation failed */
> > +   snet_write_ctrl(regs, 0);
> > +
> > +   /* Write opcode and VQ idx */
> > +   val = opcode | (vq_idx << 16);
> > +   snet_write_op(regs, val);
> 
> I guess we need to serialize two writes here as well.

Same here.

Thanks


Re: [Patch v3] vdpa/mlx5: Avoid losing link state updates

2023-04-03 Thread Jason Wang
On Mon, Apr 3, 2023 at 2:47 PM Eli Cohen  wrote:
>
>
> On 03/04/2023 8:01, Jason Wang wrote:
> > On Sun, Apr 2, 2023 at 10:15 PM Eli Cohen  wrote:
> >> Current code ignores link state updates if VIRTIO_NET_F_STATUS was not
> >> negotiated. However, link state updates could be received before feature
> >> negotiation was completed, therefore causing link state events to be
> >> lost, possibly leaving the link state down.
> >>
> >> Modify the code so link state notifier is registered only when
> >> VIRTIO_NET_F_STATUS flips from 0 to 1 and unregister it on driver reset
> >> or suspend.
> >>
> >> Fixes: 033779a708f0 ("vdpa/mlx5: make MTU/STATUS presence conditional on 
> >> feature bits")
> >> Signed-off-by: Eli Cohen 
> >> ---
> >> v2 -> v3
> >> Only register the link event notifier when VIRTIO_NET_F_STATUS is
> >> negotiated.
> >>
> >>   drivers/vdpa/mlx5/net/mlx5_vnet.c | 200 +-
> >>   1 file changed, 112 insertions(+), 88 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> >> b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> index 317cef9b7813..9b1432e22540 100644
> >> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> >> @@ -2322,10 +2322,115 @@ static void update_cvq_info(struct mlx5_vdpa_dev 
> >> *mvdev)
> >>  }
> >>   }
> >>
> >> +static u8 query_vport_state(struct mlx5_core_dev *mdev, u8 opmod, u16 
> >> vport)
> >> +{
> >> +   u32 out[MLX5_ST_SZ_DW(query_vport_state_out)] = {};
> >> +   u32 in[MLX5_ST_SZ_DW(query_vport_state_in)] = {};
> >> +   int err;
> >> +
> >> +   MLX5_SET(query_vport_state_in, in, opcode, 
> >> MLX5_CMD_OP_QUERY_VPORT_STATE);
> >> +   MLX5_SET(query_vport_state_in, in, op_mod, opmod);
> >> +   MLX5_SET(query_vport_state_in, in, vport_number, vport);
> >> +   if (vport)
> >> +   MLX5_SET(query_vport_state_in, in, other_vport, 1);
> >> +
> >> +   err = mlx5_cmd_exec_inout(mdev, query_vport_state, in, out);
> >> +   if (err)
> >> +   return 0;
> >> +
> >> +   return MLX5_GET(query_vport_state_out, out, state);
> >> +}
> >> +
> >> +static bool get_link_state(struct mlx5_vdpa_dev *mvdev)
> >> +{
> >> +   if (query_vport_state(mvdev->mdev, 
> >> MLX5_VPORT_STATE_OP_MOD_VNIC_VPORT, 0) ==
> >> +   VPORT_STATE_UP)
> >> +   return true;
> >> +
> >> +   return false;
> >> +}
> >> +
> >> +static void update_carrier(struct work_struct *work)
> >> +{
> >> +   struct mlx5_vdpa_wq_ent *wqent;
> >> +   struct mlx5_vdpa_dev *mvdev;
> >> +   struct mlx5_vdpa_net *ndev;
> >> +
> >> +   wqent = container_of(work, struct mlx5_vdpa_wq_ent, work);
> >> +   mvdev = wqent->mvdev;
> >> +   ndev = to_mlx5_vdpa_ndev(mvdev);
> >> +   if (get_link_state(mvdev))
> >> +   ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, 
> >> VIRTIO_NET_S_LINK_UP);
> >> +   else
> >> +   ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, 
> >> ~VIRTIO_NET_S_LINK_UP);
> >> +
> >> +   if (ndev->nb_registered && ndev->config_cb.callback)
> > It looks to me nb_registered is accessed without synchronization. Or
> > we don't even need to check that if we do:
> >
> > unregister();
> > flush_workqueue();
> >
> > which has been done in unregister_link_notifier().
> >
> >> +   ndev->config_cb.callback(ndev->config_cb.private);
> >> +
> >> +   kfree(wqent);
> >> +}
> >> +
> >> +static int queue_link_work(struct mlx5_vdpa_net *ndev)
> >> +{
> >> +   struct mlx5_vdpa_wq_ent *wqent;
> >> +
> >> +   wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC);
> >> +   if (!wqent)
> >> +   return -ENOMEM;
> >> +
> >> +   wqent->mvdev = &ndev->mvdev;
> >> +   INIT_WORK(&wqent->work, update_carrier);
> >> +   queue_work(ndev->mvdev.wq, &wqent->work);
> >> +   return 0;
> >> +}
> >> +
> >> +static int event_handler(struct notifier_block *nb, unsigned long event, 
> >> void *param)
> >> +{
> >> +   struct mlx5_vdpa_net *ndev = container_of(nb, struct 
> >> mlx5_vdpa_net, nb);
> >> +   struct mlx5_eqe *eqe = param;
> >> +   int ret = NOTIFY_DONE;
> >> +
> >> +   if (event == MLX5_EVENT_TYPE_PORT_CHANGE) {
> >> +   switch (eqe->sub_type) {
> >> +   case MLX5_PORT_CHANGE_SUBTYPE_DOWN:
> >> +   case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE:
> >> +   if (queue_link_work(ndev))
> >> +   return NOTIFY_DONE;
> >> +
> >> +   ret = NOTIFY_OK;
> >> +   break;
> >> +   default:
> >> +   return NOTIFY_DONE;
> >> +   }
> >> +   return ret;
> >> +   }
> >> +   return ret;
> >> +}
> >> +
> >> +static void register_link_notifier(struct mlx5_vdpa_net *ndev)
> >> +{
> >> +   ndev->nb.notifier_call = event_handler;
> >> +   mlx5_notifier_register(ndev->mvdev.mdev, &ndev->nb);
> >> +   ndev->nb