Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Dixit, Ashutosh
On Mon, 30 Mar 2020 09:38:23 -0700, Chris Wilson wrote:
>
> Quoting Dixit, Ashutosh (2020-03-30 16:55:32)
> > On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
> > >
> > > Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > > > Reading or writing those fields should only happen under
> > > > stream->oa_buffer.ptr_lock.
> > >
> > > Writing, yes. Reading as a pair, sure. There are other ways you can
> > > ensure that the tail/head are read as one, but fair enough.
> >
> > Sorry but I am trying to understand exactly what the purpose of
> > stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer
> > consumer situation where producer updates tail and consumer updates
> > head. Since both are u32's can't those operations be done without requiring
> > a lock?
>
> > > > spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> > > >
> > > > -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > > > -   stream->oa_buffer.head - gtt_offset) >= 
> > > > report_size;
> > > > +   return pollin;
> >
> > In what way is the original code incorrect? As I mentioned head is u32 and
> > can be read atomically without requiring a lock? We had deliberately moved
> > this code outside the lock so as to pick up the the latest value of head if
> > it had been updated in the consumer (read).
>
> It's the pair of reads here. What's the synchronisation between the read
> of tail/head with the update? There's no sync between the reads so
> order is not determined here.
>
> So we may see the head updated for an old tail, and so think we have
> plenty to report, when in fact there's none (or someother convolution).
>
> Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then
> consume the data between head/tail (with the write doing the smp_wmb()
> after updating the data and before moving the tail). [So the normal
> usage of barriers is around access to one of tail/head (the other is
> under your control) and the shared contents.]

Ok, thanks for explanantion Chris, I think I understand how barriers are
used with ring buffers but I am still not sure if the previous code was
incorrect. It is almost consumer side code running in the producer
thread. Anyway, let's just go with this patch for now:

Acked-by: Ashutosh Dixit 
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Chris Wilson
Quoting Dixit, Ashutosh (2020-03-30 16:55:32)
> On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
> >
> > Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > > Reading or writing those fields should only happen under
> > > stream->oa_buffer.ptr_lock.
> >
> > Writing, yes. Reading as a pair, sure. There are other ways you can
> > ensure that the tail/head are read as one, but fair enough.
> 
> Sorry but I am trying to understand exactly what the purpose of
> stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer
> consumer situation where producer updates tail and consumer updates
> head. Since both are u32's can't those operations be done without requiring
> a lock?

> > > spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> > >
> > > -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > > -   stream->oa_buffer.head - gtt_offset) >= 
> > > report_size;
> > > +   return pollin;
> 
> In what way is the original code incorrect? As I mentioned head is u32 and
> can be read atomically without requiring a lock? We had deliberately moved
> this code outside the lock so as to pick up the the latest value of head if
> it had been updated in the consumer (read).

It's the pair of reads here. What's the synchronisation between the read
of tail/head with the update? There's no sync between the reads so
order is not determined here.

So we may see the head updated for an old tail, and so think we have
plenty to report, when in fact there's none (or someother convolution).

Normal ringbuffer is to sample the head/tail pointers, smp_rmb(), then
consume the data between head/tail (with the write doing the smp_wmb()
after updating the data and before moving the tail). [So the normal
usage of barriers is around access to one of tail/head (the other is
under your control) and the shared contents.]
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Dixit, Ashutosh
On Mon, 30 Mar 2020 03:09:20 -0700, Chris Wilson wrote:
>
> Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> > Reading or writing those fields should only happen under
> > stream->oa_buffer.ptr_lock.
>
> Writing, yes. Reading as a pair, sure. There are other ways you can
> ensure that the tail/head are read as one, but fair enough.

Sorry but I am trying to understand exactly what the purpose of
stream->oa_buffer.ptr_lock is? This is a classic ring buffer producer
consumer situation where producer updates tail and consumer updates
head. Since both are u32's can't those operations be done without requiring
a lock?

>
> > Signed-off-by: Lionel Landwerlin 
> > Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
> > ---
> >  drivers/gpu/drm/i915/i915_perf.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> > b/drivers/gpu/drm/i915/i915_perf.c
> > index c74ebac50015..ec9421f02ebd 100644
> > --- a/drivers/gpu/drm/i915/i915_perf.c
> > +++ b/drivers/gpu/drm/i915/i915_perf.c
> > @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> > u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> > int report_size = stream->oa_buffer.format_size;
> > unsigned long flags;
> > +   bool pollin;
> > u32 hw_tail;
> > u64 now;
> >
> > @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct 
> > i915_perf_stream *stream)
> > stream->oa_buffer.aging_timestamp = now;
> > }
> >
> > +   pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > + stream->oa_buffer.head - gtt_offset) >= 
> > report_size;
> > +
> > +
>
> Bonus \n
>
> Reviewed-by: Chris Wilson 
>
> > spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
> >
> > -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> > -   stream->oa_buffer.head - gtt_offset) >= report_size;
> > +   return pollin;

In what way is the original code incorrect? As I mentioned head is u32 and
can be read atomically without requiring a lock? We had deliberately moved
this code outside the lock so as to pick up the the latest value of head if
it had been updated in the consumer (read).
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915/perf: don't read head/tail pointers outside critical section

2020-03-30 Thread Chris Wilson
Quoting Lionel Landwerlin (2020-03-30 10:14:11)
> Reading or writing those fields should only happen under
> stream->oa_buffer.ptr_lock.

Writing, yes. Reading as a pair, sure. There are other ways you can
ensure that the tail/head are read as one, but fair enough.

> Signed-off-by: Lionel Landwerlin 
> Fixes: d1df41eb72ef ("drm/i915/perf: rework aging tail workaround")
> ---
>  drivers/gpu/drm/i915/i915_perf.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
> b/drivers/gpu/drm/i915/i915_perf.c
> index c74ebac50015..ec9421f02ebd 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -463,6 +463,7 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
> u32 gtt_offset = i915_ggtt_offset(stream->oa_buffer.vma);
> int report_size = stream->oa_buffer.format_size;
> unsigned long flags;
> +   bool pollin;
> u32 hw_tail;
> u64 now;
>  
> @@ -532,10 +533,13 @@ static bool oa_buffer_check_unlocked(struct 
> i915_perf_stream *stream)
> stream->oa_buffer.aging_timestamp = now;
> }
>  
> +   pollin = OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> + stream->oa_buffer.head - gtt_offset) >= report_size;
> +
> +

Bonus \n

Reviewed-by: Chris Wilson 

> spin_unlock_irqrestore(>oa_buffer.ptr_lock, flags);
>  
> -   return OA_TAKEN(stream->oa_buffer.tail - gtt_offset,
> -   stream->oa_buffer.head - gtt_offset) >= report_size;
> +   return pollin;

You could always leave the calculation here, and just have the read
inside.
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx