Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value

2022-02-03 Thread Michael S. Tsirkin
On Thu, Feb 03, 2022 at 10:51:19AM +, Cristian Marussi wrote:
> On Tue, Feb 01, 2022 at 01:27:38PM -0500, Michael S. Tsirkin wrote:
> > Looks correct, thanks. Some minor comments below:
> > 
> 
> Hi Michael,
> 
> thanks for the feedback.
> 
> > On Tue, Feb 01, 2022 at 05:15:55PM +, Cristian Marussi wrote:
> > > Exported API virtqueue_poll() can be used to support polling mode operation
> > > on top of the virtio layer if needed; currently the parameter last_used_idx is
> > > the opaque value that needs to be passed to the virtqueue_poll() function
> > > to check if there are new pending used buffers in the queue: such opaque
> > > value would have been previously obtained by a call to the API function
> > > virtqueue_enable_cb_prepare().
> > > 
> > > Since such an opaque value really just contains a snapshot in time of
> > > the internal
> > 
> > to add: 16 bit
> > 
> > > last_used_index (roughly), it is possible that,
> > 
> > to add here: 
> > 
> > if another thread calls virtqueue_add_*()
> > at the same time (which existing drivers don't do,
> > but does not seem to be documented as prohibited anywhere), and
> > 
> > > if exactly
> > > 2**16 buffers are marked as used between two successive calls to
> > > virtqueue_poll(), the caller is fooled into thinking that nothing is
> > > pending (ABA problem).
> > > Keep a full fledged internal wraps counter
> > 
> > s/full fledged/a 16 bit/
> > 
> > since I don't see why a 16 bit counter is full but not e.g. a 32 bit one
> > 
> .. :D I wanted to stress the fact that this, being a 16-bit counter, has a
> higher rollover than the 1-bit wrap_counter already in use... but indeed
> they are all just counters in the end, it's just the wraparound that changes...
> 
> I'll fix.
> 
> > > per virtqueue and embed it into
> > > the upper 16bits of the returned opaque value, so that the above scenario
> > > can be detected transparently by virtqueue_poll(): this way each single
> > > possible last_used_idx value really belongs to a different wrap.
> > 
> > Just to add here: the ABA problem can in theory still happen but
> > now that's after 2^32 requests, which seems sufficient in practice.
> > 
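An illustrative, self-contained userspace sketch of the scheme under
discussion; build_opaque() is a hypothetical stand-in for the patch's
VRING_BUILD_OPAQUE() macro, showing how exactly 2^16 completions fool a
bare 16-bit snapshot while the combined 32-bit value still changes:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* wraps in the upper 16 bits, last_used_idx in the lower 16 bits */
static uint32_t build_opaque(uint16_t idx, uint16_t wraps)
{
	return ((uint32_t)wraps << 16) | idx;
}

int main(void)
{
	uint16_t idx = 42, wraps = 0;
	uint32_t snapshot = build_opaque(idx, wraps);
	uint32_t i;

	/* exactly 2^16 buffers marked as used between two polls */
	for (i = 0; i < 65536; i++) {
		if (++idx == 0)		/* 16-bit index wrapped around */
			wraps++;
	}

	assert(idx == 42);	/* bare index alone: ABA, looks untouched */
	assert(build_opaque(idx, wraps) != snapshot);	/* wrap detected */
	printf("idx unchanged, but wraps=%u disambiguates\n", wraps);
	return 0;
}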
> 
> Sure, I'll fix the commit message as above advised.
> 
> > > Cc: "Michael S. Tsirkin" 
> > > Cc: Igor Skalkin 
> > > Cc: Peter Hilber 
> > > Cc: virtualization@lists.linux-foundation.org
> > > Signed-off-by: Cristian Marussi 
> > > ---
> > > Still no perf data on this, I was wondering what exactly to measure in
> > > terms of perf metrics to evaluate the impact of the rolling vq->wraps
> > > counter.
> > > ---
> > >  drivers/virtio/virtio_ring.c | 51 +---
> > >  1 file changed, 47 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 00f64f2f8b72..613ec0503509 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -12,6 +12,8 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include <linux/bitfield.h>
> > > +#include <linux/bits.h>
> > >  #include 
> > >  
> > >  static bool force_used_validation = false;
> > > @@ -69,6 +71,17 @@ module_param(force_used_validation, bool, 0444);
> > >  #define LAST_ADD_TIME_INVALID(vq)
> > >  #endif
> > >  
> > > +#define VRING_IDX_MASK   GENMASK(15, 0)
> > > +#define VRING_GET_IDX(opaque)\
> > > + ((u16)FIELD_GET(VRING_IDX_MASK, (opaque)))
> > > +
> > > +#define VRING_WRAPS_MASK GENMASK(31, 16)
> > > +#define VRING_GET_WRAPS(opaque)  \
> > > + ((u16)FIELD_GET(VRING_WRAPS_MASK, (opaque)))
> > > +
> > > +#define VRING_BUILD_OPAQUE(idx, wraps)   \
> > > + (FIELD_PREP(VRING_WRAPS_MASK, (wraps)) | ((idx) & VRING_IDX_MASK))
> > > +
> > 
> > Maybe prefix with VRING_POLL_  since that is the only user.
> > 
> 
> I'll do.
> 
> > 
> > >  struct vring_desc_state_split {
> > >   void *data; /* Data for callback. */
> > >   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > @@ -117,6 +130,8 @@ struct vring_virtqueue {
> > >   /* Last used index we've seen. */
> > >   u16 last_used_idx;
> > >  
> > > + u16 wraps;
> > > +
> > >   /* Hint for event idx: already triggered no need to disable. */
> > >   bool event_triggered;
> > >  
> > > @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> > >   ret = vq->split.desc_state[i].data;
> > >   detach_buf_split(vq, i, ctx);
> > >   vq->last_used_idx++;
> > > + if (unlikely(!vq->last_used_idx))
> > > + vq->wraps++;
> > >   /* If we expect an interrupt for the next entry, tell host
> > >* by writing event index and flush out the write before
> > >* the read in the next get_buf call. */
> > 
> > So most drivers don't call virtqueue_poll.
> > Concerned about the overhead here: another option is
> > with a flag that will have to be set whenever a driver
> > wants to use virtqueue_poll.
> > Could you pls do a quick perf test e.g. using tools/virtio/
> > to see what's faster?
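A hedged sketch of the opt-in alternative being floated above — a
userspace model rather than kernel code, with invented names (vq_model,
poll_wraps_enabled, consume_used_buffer):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: only drivers that intend to call virtqueue_poll()
 * set the flag, so all other drivers skip the wrap accounting. */
struct vq_model {
	uint16_t last_used_idx;
	uint16_t wraps;
	bool poll_wraps_enabled;	/* set by the driver before polling */
};

static void consume_used_buffer(struct vq_model *vq)
{
	vq->last_used_idx++;
	if (vq->poll_wraps_enabled && vq->last_used_idx == 0)
		vq->wraps++;
}

Whether testing a flag is actually cheaper than the unconditional wrap
check is exactly what the requested perf comparison would have to show.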

Re: [PATCH v2 3/9] [RFC] virtio_ring: Embed a wrap counter in opaque poll index value

2022-02-01 Thread Michael S. Tsirkin
Looks correct, thanks. Some minor comments below:

On Tue, Feb 01, 2022 at 05:15:55PM +, Cristian Marussi wrote:
> Exported API virtqueue_poll() can be used to support polling mode operation
> on top of the virtio layer if needed; currently the parameter last_used_idx is
> the opaque value that needs to be passed to the virtqueue_poll() function
> to check if there are new pending used buffers in the queue: such opaque
> value would have been previously obtained by a call to the API function
> virtqueue_enable_cb_prepare().
> 
> Since such an opaque value really just contains a snapshot in time of
> the internal

to add: 16 bit

> last_used_index (roughly), it is possible that,

to add here: 

if another thread calls virtqueue_add_*()
at the same time (which existing drivers don't do,
but does not seem to be documented as prohibited anywhere), and

> if exactly
> 2**16 buffers are marked as used between two successive calls to
> virtqueue_poll(), the caller is fooled into thinking that nothing is
> pending (ABA problem).
> Keep a full fledged internal wraps counter

s/full fledged/a 16 bit/

since I don't see why a 16 bit counter is full but not e.g. a 32 bit one

> per virtqueue and embed it into
> the upper 16bits of the returned opaque value, so that the above scenario
> can be detected transparently by virtqueue_poll(): this way each single
> possible last_used_idx value really belongs to a different wrap.

Just to add here: the ABA problem can in theory still happen but
now that's after 2^32 requests, which seems sufficient in practice.

> Cc: "Michael S. Tsirkin" 
> Cc: Igor Skalkin 
> Cc: Peter Hilber 
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Cristian Marussi 
> ---
> Still no perf data on this, I was wondering what exactly to measure in
> terms of perf metrics to evaluate the impact of the rolling vq->wraps
> counter.
> ---
>  drivers/virtio/virtio_ring.c | 51 +---
>  1 file changed, 47 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 00f64f2f8b72..613ec0503509 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/bitfield.h>
> +#include <linux/bits.h>
>  #include 
>  
>  static bool force_used_validation = false;
> @@ -69,6 +71,17 @@ module_param(force_used_validation, bool, 0444);
>  #define LAST_ADD_TIME_INVALID(vq)
>  #endif
>  
> +#define VRING_IDX_MASK   GENMASK(15, 0)
> +#define VRING_GET_IDX(opaque)\
> + ((u16)FIELD_GET(VRING_IDX_MASK, (opaque)))
> +
> +#define VRING_WRAPS_MASK GENMASK(31, 16)
> +#define VRING_GET_WRAPS(opaque)  \
> + ((u16)FIELD_GET(VRING_WRAPS_MASK, (opaque)))
> +
> +#define VRING_BUILD_OPAQUE(idx, wraps)   \
> + (FIELD_PREP(VRING_WRAPS_MASK, (wraps)) | ((idx) & VRING_IDX_MASK))
> +

Maybe prefix with VRING_POLL_  since that is the only user.


>  struct vring_desc_state_split {
>   void *data; /* Data for callback. */
>   struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> @@ -117,6 +130,8 @@ struct vring_virtqueue {
>   /* Last used index we've seen. */
>   u16 last_used_idx;
>  
> + u16 wraps;
> +
>   /* Hint for event idx: already triggered no need to disable. */
>   bool event_triggered;
>  
> @@ -806,6 +821,8 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>   ret = vq->split.desc_state[i].data;
>   detach_buf_split(vq, i, ctx);
>   vq->last_used_idx++;
> + if (unlikely(!vq->last_used_idx))
> + vq->wraps++;
>   /* If we expect an interrupt for the next entry, tell host
>* by writing event index and flush out the write before
>* the read in the next get_buf call. */

So most drivers don't call virtqueue_poll.
Concerned about the overhead here: another option is
with a flag that will have to be set whenever a driver
wants to use virtqueue_poll.
Could you pls do a quick perf test e.g. using tools/virtio/
to see what's faster?
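For a first-order feel of that overhead outside the real ring code, a
rough, self-contained micro-benchmark sketch (illustrative only —
tools/virtio exercises the real code paths):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Time N used-index bumps with and without the wrap-accounting branch. */
#define N 100000000UL

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

int main(void)
{
	volatile uint16_t idx = 0;	/* volatile: keep the loops honest */
	volatile uint16_t wraps = 0;
	uint64_t t0, t1, t2;
	unsigned long i;

	t0 = now_ns();
	for (i = 0; i < N; i++)
		idx++;			/* baseline: bare increment */
	t1 = now_ns();
	for (i = 0; i < N; i++) {
		idx++;
		if (idx == 0)		/* patched path: wrap check */
			wraps++;
	}
	t2 = now_ns();
	printf("plain: %lu ns, with wrap check: %lu ns (wraps=%u)\n",
	       (unsigned long)(t1 - t0), (unsigned long)(t2 - t1), wraps);
	return 0;
}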



> @@ -1508,6 +1525,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>   if (unlikely(vq->last_used_idx >= vq->packed.vring.num)) {
>   vq->last_used_idx -= vq->packed.vring.num;
>   vq->packed.used_wrap_counter ^= 1;
> + vq->wraps++;
>   }
>  
>   /*
> @@ -1744,6 +1762,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   vq->weak_barriers = weak_barriers;
>   vq->broken = false;
>   vq->last_used_idx = 0;
> + vq->wraps = 0;
>   vq->event_triggered = false;
>   vq->num_added = 0;
>   vq->packed_ring = true;
> @@ -2092,13 +2111,17 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>   */
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  {
>