[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-07 Thread Mario Kleiner
On 05/07/2015 01:56 PM, Peter Hurley wrote:
> On 05/06/2015 04:56 AM, Daniel Vetter wrote:
>> On Tue, May 05, 2015 at 11:57:42AM -0400, Peter Hurley wrote:
>>> On 05/05/2015 11:42 AM, Daniel Vetter wrote:
>>>> On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
>>>>> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
>>>>>> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
>>>>>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>>>>>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>>>>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>>>>>>>> Hi Daniel,
>>>>>>>>>>
>>>>>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>>>>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>>>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>>>>>>>   long instead and update comments. Note that atomic_read is just a
>>>>>>>>>>>   normal read of a volatile variable, so no need to audit all the
>>>>>>>>>>>   read-side access specifically.
>>>>>>>>>>>
>>>>>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>>>>>>>   read-side was missing the first barrier between the counter read and
>>>>>>>>>>>   the timestamp read, it only had a barrier between the ts and the
>>>>>>>>>>>   counter read. We need both.
>>>>>>>>>>>
>>>>>>>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>>>>>>>   you have them on boths sides of the transaction it's prudent to
>>>>>>>>>>>   reference where the other side is. To avoid duplicating the
>>>>>>>>>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>>>>>>>>>   In that helper also assert that we do indeed hold
>>>>>>>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>>>>>>>   few functions up in the callchain.
>>>>>>>>>>>
>>>>>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>>>>>>>> the vblank_wait ioctl.
>>>>>>>>>>>
>>>>>>>>>>> Cc: Chris Wilson
>>>>>>>>>>> Cc: Mario Kleiner
>>>>>>>>>>> Cc: Ville Syrjälä
>>>>>>>>>>> Cc: Michel Dänzer
>>>>>>>>>>> Signed-off-by: Daniel Vetter
>>>>>>>>>>> ---
>>>>>>>>>>>   drivers/gpu/drm/drm_irq.c | 92
>>>>>>>>>>> ---
>>>>>>>>>>>   include/drm/drmP.h|  8 +++--
>>>>>>>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>>>>>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>>>>>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>>>>>>>>>
>>>>>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>>>>>>>> +			 unsigned vblank_count_inc,
>>>>>>>>>>> +			 struct timeval *t_vblank)
>>>>>>>>>>> +{
>>>>>>>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>>>>>>>> +	u32 tslot;
>>>>>>>>>>> +
>>>>>>>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
>>>>>>>>>>> +
>>>>>>>>>>> +	if (t_vblank) {
>>>>>>>>>>> +		tslot = vblank->count + vblank_count_inc;
>>>>>>>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>>>>>>>> +	}
>>>>>>>>>>> +
>>>>>>>>>>> +	/*
>>>>>>>>>>> +	 * vblank timestamp updates are protected on the write side with
>>>>>>>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
>>>>>>>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>>>>>>>> +	 * memory barrriers. We need the barrier both before and also after the
>>>>>>>>>>> +	 * counter update to synchronize with the next timestamp write.
>>>>>>>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>>>>>>>> +	 */
>>>>>>>>>>> +	smp_wmb();
>>>>>>>>>>> +	vblank->count += vblank_count_inc;
>>>>>>>>>>> +	smp_wmb();
>>>>>>>>>>
>>>>>>>>>> The comment and the code are each self-contradictory.
>>>>>>>>>>
>>>>>>>>>> If vblank->count writes are always protected by vblank_time_lock (something I
>>>>>>>>>> did not verify but that the comment above asserts), then the trailing write
>>>>>>>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
>>>>>>>>>>
>>>>>>>>>> A spin unlock operation is always a write barrier.
>>>>>>>>>
>>>>>>>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own
>>

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-07 Thread Peter Hurley
On 05/06/2015 04:56 AM, Daniel Vetter wrote:
> On Tue, May 05, 2015 at 11:57:42AM -0400, Peter Hurley wrote:
>> On 05/05/2015 11:42 AM, Daniel Vetter wrote:
>>> On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
>>>> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
>>>>> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
>>>>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>>>>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>>>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>>>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>>>>>>   long instead and update comments. Note that atomic_read is just a
>>>>>>>>>>   normal read of a volatile variable, so no need to audit all the
>>>>>>>>>>   read-side access specifically.
>>>>>>>>>>
>>>>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>>>>>>   read-side was missing the first barrier between the counter read and
>>>>>>>>>>   the timestamp read, it only had a barrier between the ts and the
>>>>>>>>>>   counter read. We need both.
>>>>>>>>>>
>>>>>>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>>>>>>   you have them on boths sides of the transaction it's prudent to
>>>>>>>>>>   reference where the other side is. To avoid duplicating the
>>>>>>>>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>>>>>>>>   In that helper also assert that we do indeed hold
>>>>>>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>>>>>>   few functions up in the callchain.
>>>>>>>>>>
>>>>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>>>>>>> the vblank_wait ioctl.
>>>>>>>>>>
>>>>>>>>>> Cc: Chris Wilson
>>>>>>>>>> Cc: Mario Kleiner
>>>>>>>>>> Cc: Ville Syrjälä
>>>>>>>>>> Cc: Michel Dänzer
>>>>>>>>>> Signed-off-by: Daniel Vetter
>>>>>>>>>> ---
>>>>>>>>>>   drivers/gpu/drm/drm_irq.c | 92
>>>>>>>>>> ---
>>>>>>>>>>   include/drm/drmP.h|  8 +++--
>>>>>>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>>>>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>>>>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>>>>>>>>
>>>>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>>>>>>> +			 unsigned vblank_count_inc,
>>>>>>>>>> +			 struct timeval *t_vblank)
>>>>>>>>>> +{
>>>>>>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>>>>>>> +	u32 tslot;
>>>>>>>>>> +
>>>>>>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
>>>>>>>>>> +
>>>>>>>>>> +	if (t_vblank) {
>>>>>>>>>> +		tslot = vblank->count + vblank_count_inc;
>>>>>>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * vblank timestamp updates are protected on the write side with
>>>>>>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
>>>>>>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>>>>>>> +	 * memory barrriers. We need the barrier both before and also after the
>>>>>>>>>> +	 * counter update to synchronize with the next timestamp write.
>>>>>>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>>>>>>> +	 */
>>>>>>>>>> +	smp_wmb();
>>>>>>>>>> +	vblank->count += vblank_count_inc;
>>>>>>>>>> +	smp_wmb();
>>>>>>>>>
>>>>>>>>> The comment and the code are each self-contradictory.
>>>>>>>>>
>>>>>>>>> If vblank->count writes are always protected by vblank_time_lock (something I
>>>>>>>>> did not verify but that the comment above asserts), then the trailing write
>>>>>>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
>>>>>>>>>
>>>>>>>>> A spin unlock operation is always a write barrier.
>>>>>>>>
>>>>>>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
>>>>>>>> That the spinlock is held I can assure. That no one goes around and does
>>>>>>>> multiple vblank updates (because somehow that code raced with the hw
>>>>>>>> itself) I can't easily assur

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-06 Thread Daniel Vetter
On Tue, May 05, 2015 at 11:57:42AM -0400, Peter Hurley wrote:
> On 05/05/2015 11:42 AM, Daniel Vetter wrote:
> > On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
> >> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
> >>> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> >>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
> >>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> >>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> >>>>>>> Hi Daniel,
> >>>>>>>
> >>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> >>>>>>>> This was a bit too much cargo-culted, so lets make it solid:
> >>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
> >>>>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >>>>>>>>   long instead and update comments. Note that atomic_read is just a
> >>>>>>>>   normal read of a volatile variable, so no need to audit all the
> >>>>>>>>   read-side access specifically.
> >>>>>>>>
> >>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
> >>>>>>>>   read-side was missing the first barrier between the counter read and
> >>>>>>>>   the timestamp read, it only had a barrier between the ts and the
> >>>>>>>>   counter read. We need both.
> >>>>>>>>
> >>>>>>>> - Barriers weren't properly documented. Since barriers only work if
> >>>>>>>>   you have them on boths sides of the transaction it's prudent to
> >>>>>>>>   reference where the other side is. To avoid duplicating the
> >>>>>>>>   write-side comment 3 times extract a little store_vblank() helper.
> >>>>>>>>   In that helper also assert that we do indeed hold
> >>>>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
> >>>>>>>>   few functions up in the callchain.
> >>>>>>>>
> >>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >>>>>>>> the vblank_wait ioctl.
> >>>>>>>>
> >>>>>>>> Cc: Chris Wilson
> >>>>>>>> Cc: Mario Kleiner
> >>>>>>>> Cc: Ville Syrjälä
> >>>>>>>> Cc: Michel Dänzer
> >>>>>>>> Signed-off-by: Daniel Vetter
> >>>>>>>> ---
> >>>>>>>>   drivers/gpu/drm/drm_irq.c | 92
> >>>>>>>> ---
> >>>>>>>>   include/drm/drmP.h|  8 +++--
> >>>>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>>>>> index c8a34476570a..23bfbc61a494 100644
> >>>>>>>> --- a/drivers/gpu/drm/drm_irq.c
> >>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
> >>>>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
> >>>>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
> >>>>>>>>
> >>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
> >>>>>>>> +			 unsigned vblank_count_inc,
> >>>>>>>> +			 struct timeval *t_vblank)
> >>>>>>>> +{
> >>>>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >>>>>>>> +	u32 tslot;
> >>>>>>>> +
> >>>>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
> >>>>>>>> +
> >>>>>>>> +	if (t_vblank) {
> >>>>>>>> +		tslot = vblank->count + vblank_count_inc;
> >>>>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	/*
> >>>>>>>> +	 * vblank timestamp updates are protected on the write side with
> >>>>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
> >>>>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
> >>>>>>>> +	 * memory barrriers. We need the barrier both before and also after the
> >>>>>>>> +	 * counter update to synchronize with the next timestamp write.
> >>>>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
> >>>>>>>> +	 */
> >>>>>>>> +	smp_wmb();
> >>>>>>>> +	vblank->count += vblank_count_inc;
> >>>>>>>> +	smp_wmb();
> >>>>>>>
> >>>>>>> The comment and the code are each self-contradictory.
> >>>>>>>
> >>>>>>> If vblank->count writes are always protected by vblank_time_lock (something I
> >>>>>>> did not verify but that the comment above asserts), then the trailing write
> >>>>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
> >>>>>>>
> >>>>>>> A spin unlock operation is always a write barrier.
> >>>>>>
> >>>>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
> >>>>>> That the spinlock is held I can assure. That no one goes around and does
> >>>>>> multiple vblank updates (because somehow that code raced with the hw
> >>>>>> itself) I can't easily assure with a simple assert or something si

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-05 Thread Daniel Vetter
On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
> > On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> >> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
> >>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> >>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> >>>>> Hi Daniel,
> >>>>>
> >>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> >>>>>> This was a bit too much cargo-culted, so lets make it solid:
> >>>>>> - vblank->count doesn't need to be an atomic, writes are always done
> >>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >>>>>>   long instead and update comments. Note that atomic_read is just a
> >>>>>>   normal read of a volatile variable, so no need to audit all the
> >>>>>>   read-side access specifically.
> >>>>>>
> >>>>>> - The barriers for the vblank counter seqlock weren't complete: The
> >>>>>>   read-side was missing the first barrier between the counter read and
> >>>>>>   the timestamp read, it only had a barrier between the ts and the
> >>>>>>   counter read. We need both.
> >>>>>>
> >>>>>> - Barriers weren't properly documented. Since barriers only work if
> >>>>>>   you have them on boths sides of the transaction it's prudent to
> >>>>>>   reference where the other side is. To avoid duplicating the
> >>>>>>   write-side comment 3 times extract a little store_vblank() helper.
> >>>>>>   In that helper also assert that we do indeed hold
> >>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
> >>>>>>   few functions up in the callchain.
> >>>>>>
> >>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >>>>>> the vblank_wait ioctl.
> >>>>>>
> >>>>>> Cc: Chris Wilson
> >>>>>> Cc: Mario Kleiner
> >>>>>> Cc: Ville Syrjälä
> >>>>>> Cc: Michel Dänzer
> >>>>>> Signed-off-by: Daniel Vetter
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/drm_irq.c | 92
> >>>>>> ---
> >>>>>>   include/drm/drmP.h|  8 +++--
> >>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>>>>> index c8a34476570a..23bfbc61a494 100644
> >>>>>> --- a/drivers/gpu/drm/drm_irq.c
> >>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
> >>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
> >>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
> >>>>>>
> >>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
> >>>>>> +			 unsigned vblank_count_inc,
> >>>>>> +			 struct timeval *t_vblank)
> >>>>>> +{
> >>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >>>>>> +	u32 tslot;
> >>>>>> +
> >>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
> >>>>>> +
> >>>>>> +	if (t_vblank) {
> >>>>>> +		tslot = vblank->count + vblank_count_inc;
> >>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >>>>>> +	}
> >>>>>> +
> >>>>>> +	/*
> >>>>>> +	 * vblank timestamp updates are protected on the write side with
> >>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
> >>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
> >>>>>> +	 * memory barrriers. We need the barrier both before and also after the
> >>>>>> +	 * counter update to synchronize with the next timestamp write.
> >>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
> >>>>>> +	 */
> >>>>>> +	smp_wmb();
> >>>>>> +	vblank->count += vblank_count_inc;
> >>>>>> +	smp_wmb();
> >>>>>
> >>>>> The comment and the code are each self-contradictory.
> >>>>>
> >>>>> If vblank->count writes are always protected by vblank_time_lock (something I
> >>>>> did not verify but that the comment above asserts), then the trailing write
> >>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
> >>>>>
> >>>>> A spin unlock operation is always a write barrier.
> >>>>
> >>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
> >>>> That the spinlock is held I can assure. That no one goes around and does
> >>>> multiple vblank updates (because somehow that code raced with the hw
> >>>> itself) I can't easily assure with a simple assert or something similar.
> >>>> It's not the case right now, but that can changes.
> >>>
> >>> The algorithm would be broken if multiple updates for the same vblank
> >>> count were allowed; that's why it checks to see if the vblank count has
> >>> not advanced before storing a new timestamp.
> >>>
> >>> Otherwise, the read side would not be able to determine that the
> >>> timestamp is valid 

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-05 Thread Peter Hurley
On 05/05/2015 11:57 AM, Peter Hurley wrote:
> On 05/05/2015 11:42 AM, Daniel Vetter wrote:
>> I'm also somewhat confused about how you draw a line across both cpus for
>> barriers because barriers only have cpu-local effects (which is why we
>> always need a barrier on both ends of a transaction).

I'm sorry if my barrier notation confuses you; I find that it clearly
identifies matching pairs.

Also, there is a distinction between "can be visible" and "must be visible";
the load and stores themselves are not cpu-local.

Regards,
Peter Hurley
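
For readers following the pairing argument, the ordering under discussion
can be sketched in userspace C11. This is only an illustration, not the
kernel code: sketch_wmb()/sketch_rmb() are hypothetical stand-ins for
smp_wmb()/smp_rmb(), struct vblank_sketch and SLOTS are names invented
here, and the real writer additionally holds dev->vblank_time_lock. The
trailing write barrier is kept as in the posted patch; whether the spin
unlock already provides it is exactly what the thread is debating.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel's smp_wmb()/smp_rmb(). */
#define sketch_wmb() atomic_thread_fence(memory_order_release)
#define sketch_rmb() atomic_thread_fence(memory_order_acquire)

#define SLOTS 2 /* assumed timestamp ring size, for illustration only */

struct vblank_sketch {
	_Atomic uint32_t count;        /* plays the role of vblank->count */
	_Atomic int64_t stamp[SLOTS];  /* plays the role of the timestamp slots */
};

/* Writer: assumed to run with the (not modelled) writer lock held. */
static void sketch_store(struct vblank_sketch *v, uint32_t inc, int64_t t)
{
	uint32_t next = atomic_load_explicit(&v->count, memory_order_relaxed) + inc;

	/* Fill the slot for the upcoming count first. */
	atomic_store_explicit(&v->stamp[next % SLOTS], t, memory_order_relaxed);

	sketch_wmb(); /* pairs with the first sketch_rmb() in sketch_read() */
	atomic_store_explicit(&v->count, next, memory_order_relaxed);
	sketch_wmb(); /* orders this count store before the next slot write */
}

/* Reader: lockless, retries if the counter moved while reading the slot. */
static uint32_t sketch_read(struct vblank_sketch *v, int64_t *t)
{
	uint32_t cur;

	do {
		cur = atomic_load_explicit(&v->count, memory_order_relaxed);
		sketch_rmb(); /* counter read before timestamp read */
		*t = atomic_load_explicit(&v->stamp[cur % SLOTS], memory_order_relaxed);
		sketch_rmb(); /* timestamp read before counter re-read */
	} while (cur != atomic_load_explicit(&v->count, memory_order_relaxed));

	return cur;
}
```

A caller would publish with sketch_store(&v, 1, t) and sample with
sketch_read(&v, &t); each write barrier only helps because a read barrier
sits at the matching point on the other side.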




[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-05 Thread Peter Hurley
On 05/05/2015 11:42 AM, Daniel Vetter wrote:
> On Tue, May 05, 2015 at 10:36:24AM -0400, Peter Hurley wrote:
>> On 05/04/2015 12:52 AM, Mario Kleiner wrote:
>>> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
>>>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>>>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>>>>   long instead and update comments. Note that atomic_read is just a
>>>>>>>>   normal read of a volatile variable, so no need to audit all the
>>>>>>>>   read-side access specifically.
>>>>>>>>
>>>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>>>>   read-side was missing the first barrier between the counter read and
>>>>>>>>   the timestamp read, it only had a barrier between the ts and the
>>>>>>>>   counter read. We need both.
>>>>>>>>
>>>>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>>>>   you have them on boths sides of the transaction it's prudent to
>>>>>>>>   reference where the other side is. To avoid duplicating the
>>>>>>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>>>>>>   In that helper also assert that we do indeed hold
>>>>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>>>>   few functions up in the callchain.
>>>>>>>>
>>>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>>>>> the vblank_wait ioctl.
>>>>>>>>
>>>>>>>> Cc: Chris Wilson
>>>>>>>> Cc: Mario Kleiner
>>>>>>>> Cc: Ville Syrjälä
>>>>>>>> Cc: Michel Dänzer
>>>>>>>> Signed-off-by: Daniel Vetter
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/drm_irq.c | 92
>>>>>>>> ---
>>>>>>>>   include/drm/drmP.h|  8 +++--
>>>>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>>>>>>
>>>>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>>>>> +			 unsigned vblank_count_inc,
>>>>>>>> +			 struct timeval *t_vblank)
>>>>>>>> +{
>>>>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>>>>> +	u32 tslot;
>>>>>>>> +
>>>>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
>>>>>>>> +
>>>>>>>> +	if (t_vblank) {
>>>>>>>> +		tslot = vblank->count + vblank_count_inc;
>>>>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	/*
>>>>>>>> +	 * vblank timestamp updates are protected on the write side with
>>>>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
>>>>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>>>>> +	 * memory barrriers. We need the barrier both before and also after the
>>>>>>>> +	 * counter update to synchronize with the next timestamp write.
>>>>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>>>>> +	 */
>>>>>>>> +	smp_wmb();
>>>>>>>> +	vblank->count += vblank_count_inc;
>>>>>>>> +	smp_wmb();
>>>>>>>
>>>>>>> The comment and the code are each self-contradictory.
>>>>>>>
>>>>>>> If vblank->count writes are always protected by vblank_time_lock (something I
>>>>>>> did not verify but that the comment above asserts), then the trailing write
>>>>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
>>>>>>>
>>>>>>> A spin unlock operation is always a write barrier.
>>>>>>
>>>>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
>>>>>> That the spinlock is held I can assure. That no one goes around and does
>>>>>> multiple vblank updates (because somehow that code raced with the hw
>>>>>> itself) I can't easily assure with a simple assert or something similar.
>>>>>> It's not the case right now, but that can changes.
>>>>>
>>>>> The algorithm would be broken if multiple updates for the same vblank
>>>>> count were allowed; that's why it checks to see if the vblank count has
>>>>> not advanced before storing a new timestamp.
>>>>>
>>>>> Otherwise, the read side would not be able

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-05 Thread Peter Hurley
On 05/04/2015 12:52 AM, Mario Kleiner wrote:
> On 04/16/2015 03:03 PM, Daniel Vetter wrote:
>> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>>   long instead and update comments. Note that atomic_read is just a
>>>>>>   normal read of a volatile variable, so no need to audit all the
>>>>>>   read-side access specifically.
>>>>>>
>>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>>   read-side was missing the first barrier between the counter read and
>>>>>>   the timestamp read, it only had a barrier between the ts and the
>>>>>>   counter read. We need both.
>>>>>>
>>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>>   you have them on boths sides of the transaction it's prudent to
>>>>>>   reference where the other side is. To avoid duplicating the
>>>>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>>>>   In that helper also assert that we do indeed hold
>>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>>   few functions up in the callchain.
>>>>>>
>>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>>> the vblank_wait ioctl.
>>>>>>
>>>>>> Cc: Chris Wilson
>>>>>> Cc: Mario Kleiner
>>>>>> Cc: Ville Syrjälä
>>>>>> Cc: Michel Dänzer
>>>>>> Signed-off-by: Daniel Vetter
>>>>>> ---
>>>>>>   drivers/gpu/drm/drm_irq.c | 92
>>>>>> ---
>>>>>>   include/drm/drmP.h|  8 +++--
>>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>>>>
>>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>>> +			 unsigned vblank_count_inc,
>>>>>> +			 struct timeval *t_vblank)
>>>>>> +{
>>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>>> +	u32 tslot;
>>>>>> +
>>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
>>>>>> +
>>>>>> +	if (t_vblank) {
>>>>>> +		tslot = vblank->count + vblank_count_inc;
>>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>>> +	}
>>>>>> +
>>>>>> +	/*
>>>>>> +	 * vblank timestamp updates are protected on the write side with
>>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
>>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>>> +	 * memory barrriers. We need the barrier both before and also after the
>>>>>> +	 * counter update to synchronize with the next timestamp write.
>>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>>> +	 */
>>>>>> +	smp_wmb();
>>>>>> +	vblank->count += vblank_count_inc;
>>>>>> +	smp_wmb();
>>>>>
>>>>> The comment and the code are each self-contradictory.
>>>>>
>>>>> If vblank->count writes are always protected by vblank_time_lock (something I
>>>>> did not verify but that the comment above asserts), then the trailing write
>>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
>>>>>
>>>>> A spin unlock operation is always a write barrier.
>>>>
>>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
>>>> That the spinlock is held I can assure. That no one goes around and does
>>>> multiple vblank updates (because somehow that code raced with the hw
>>>> itself) I can't easily assure with a simple assert or something similar.
>>>> It's not the case right now, but that can changes.
>>>
>>> The algorithm would be broken if multiple updates for the same vblank
>>> count were allowed; that's why it checks to see if the vblank count has
>>> not advanced before storing a new timestamp.
>>>
>>> Otherwise, the read side would not be able to determine that the
>>> timestamp is valid by double-checking that the vblank count has not
>>> changed.
>>>
>>> And besides, even if the code looped without dropping the spinlock,
>>> the correct write order would still be observed because it would still
>>> be executing on the same cpu.
>>>
>>> My objection to the write memory 

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-05-04 Thread Mario Kleiner
On 04/16/2015 03:03 PM, Daniel Vetter wrote:
> On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
>> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
>>> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>>>> Hi Daniel,
>>>>
>>>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>>>> This was a bit too much cargo-culted, so lets make it solid:
>>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>>>   long instead and update comments. Note that atomic_read is just a
>>>>>   normal read of a volatile variable, so no need to audit all the
>>>>>   read-side access specifically.
>>>>>
>>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>>>   read-side was missing the first barrier between the counter read and
>>>>>   the timestamp read, it only had a barrier between the ts and the
>>>>>   counter read. We need both.
>>>>>
>>>>> - Barriers weren't properly documented. Since barriers only work if
>>>>>   you have them on boths sides of the transaction it's prudent to
>>>>>   reference where the other side is. To avoid duplicating the
>>>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>>>   In that helper also assert that we do indeed hold
>>>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>>>   few functions up in the callchain.
>>>>>
>>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>>> the vblank_wait ioctl.
>>>>>
>>>>> Cc: Chris Wilson
>>>>> Cc: Mario Kleiner
>>>>> Cc: Ville Syrjälä
>>>>> Cc: Michel Dänzer
>>>>> Signed-off-by: Daniel Vetter
>>>>> ---
>>>>>   drivers/gpu/drm/drm_irq.c | 92
>>>>> ---
>>>>>   include/drm/drmP.h|  8 +++--
>>>>>   2 files changed, 54 insertions(+), 46 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>> index c8a34476570a..23bfbc61a494 100644
>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>>>>>
>>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>>> +			 unsigned vblank_count_inc,
>>>>> +			 struct timeval *t_vblank)
>>>>> +{
>>>>> +	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>>> +	u32 tslot;
>>>>> +
>>>>> +	assert_spin_locked(&dev->vblank_time_lock);
>>>>> +
>>>>> +	if (t_vblank) {
>>>>> +		tslot = vblank->count + vblank_count_inc;
>>>>> +		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>>> +	}
>>>>> +
>>>>> +	/*
>>>>> +	 * vblank timestamp updates are protected on the write side with
>>>>> +	 * vblank_time_lock, but on the read side done locklessly using a
>>>>> +	 * sequence-lock on the vblank counter. Ensure correct ordering using
>>>>> +	 * memory barrriers. We need the barrier both before and also after the
>>>>> +	 * counter update to synchronize with the next timestamp write.
>>>>> +	 * The read-side barriers for this are in drm_vblank_count_and_time.
>>>>> +	 */
>>>>> +	smp_wmb();
>>>>> +	vblank->count += vblank_count_inc;
>>>>> +	smp_wmb();
>>>>
>>>> The comment and the code are each self-contradictory.
>>>>
>>>> If vblank->count writes are always protected by vblank_time_lock (something I
>>>> did not verify but that the comment above asserts), then the trailing write
>>>> barrier is not required (and the assertion that it is in the comment is incorrect).
>>>>
>>>> A spin unlock operation is always a write barrier.
>>>
>>> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
>>> That the spinlock is held I can assure. That no one goes around and does
>>> multiple vblank updates (because somehow that code raced with the hw
>>> itself) I can't easily assure with a simple assert or something similar.
>>> It's not the case right now, but that can changes.
>>
>> The algorithm would be broken if multiple updates for the same vblank
>> count were allowed; that's why it checks to see if the vblank count has
>> not advanced before storing a new timestamp.
>>
>> Otherwise, the read side would not be able to determine that the
>> timestamp is valid by double-checking that the vblank count has not
>> changed.
>>
>> And besides, even if the code looped without dropping the spinlock,
>> the correct write order would still be observed because it would still
>> be executing on the same cpu.
>>
>> My objection to the write memory barrier is not about optimization;
>> it's about correct code.
>
> Well diff=0 is not allowed, I guess I could enforce this with some
> WARN_ON. And I still think my point of non-local correctness is solid.
> Wit

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Daniel Vetter
On Thu, Apr 16, 2015 at 08:30:55AM -0400, Peter Hurley wrote:
> On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> > On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> >> Hi Daniel,
> >>
> >> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> >>> This was a bit too much cargo-culted, so lets make it solid:
> >>> - vblank->count doesn't need to be an atomic, writes are always done
> >>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >>>   long instead and update comments. Note that atomic_read is just a
> >>>   normal read of a volatile variable, so no need to audit all the
> >>>   read-side access specifically.
> >>>
> >>> - The barriers for the vblank counter seqlock weren't complete: The
> >>>   read-side was missing the first barrier between the counter read and
> >>>   the timestamp read, it only had a barrier between the ts and the
> >>>   counter read. We need both.
> >>>
> >>> - Barriers weren't properly documented. Since barriers only work if
> >>>   you have them on both sides of the transaction it's prudent to
> >>>   reference where the other side is. To avoid duplicating the
> >>>   write-side comment 3 times extract a little store_vblank() helper.
> >>>   In that helper also assert that we do indeed hold
> >>>   dev->vblank_time_lock, since in some cases the lock is acquired a
> >>>   few functions up in the callchain.
> >>>
> >>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >>> the vblank_wait ioctl.
> >>>
> >>> Cc: Chris Wilson 
> >>> Cc: Mario Kleiner 
> >>> Cc: Ville Syrjälä 
> >>> Cc: Michel Dänzer 
> >>> Signed-off-by: Daniel Vetter 
> >>> ---
> >>>  drivers/gpu/drm/drm_irq.c | 92 
> >>> ---
> >>>  include/drm/drmP.h|  8 +++--
> >>>  2 files changed, 54 insertions(+), 46 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >>> index c8a34476570a..23bfbc61a494 100644
> >>> --- a/drivers/gpu/drm/drm_irq.c
> >>> +++ b/drivers/gpu/drm/drm_irq.c
> >>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, 
> >>> drm_vblank_offdelay, int, 0600);
> >>>  module_param_named(timestamp_precision_usec, drm_timestamp_precision, 
> >>> int, 0600);
> >>>  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
> >>> 0600);
> >>>  
> >>> +static void store_vblank(struct drm_device *dev, int crtc,
> >>> +  unsigned vblank_count_inc,
> >>> +  struct timeval *t_vblank)
> >>> +{
> >>> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >>> + u32 tslot;
> >>> +
> >>> + assert_spin_locked(&dev->vblank_time_lock);
> >>> +
> >>> + if (t_vblank) {
> >>> + tslot = vblank->count + vblank_count_inc;
> >>> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >>> + }
> >>> +
> >>> + /*
> >>> +  * vblank timestamp updates are protected on the write side with
> >>> +  * vblank_time_lock, but on the read side done locklessly using a
> >>> +  * sequence-lock on the vblank counter. Ensure correct ordering using
> >>> +  * memory barriers. We need the barrier both before and also after the
> >>> +  * counter update to synchronize with the next timestamp write.
> >>> +  * The read-side barriers for this are in drm_vblank_count_and_time.
> >>> +  */
> >>> + smp_wmb();
> >>> + vblank->count += vblank_count_inc;
> >>> + smp_wmb();
> >>
> >> The comment and the code are each self-contradictory.
> >>
> >> If vblank->count writes are always protected by vblank_time_lock 
> >> (something I
> >> did not verify but that the comment above asserts), then the trailing write
> >> barrier is not required (and the assertion that it is in the comment is 
> >> incorrect).
> >>
> >> A spin unlock operation is always a write barrier.
> > 
> > Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
> > That the spinlock is held I can assure. That no one goes around and does
> > multiple vblank updates (because somehow that code raced with the hw
> > itself) I can't easily assure with a simple assert or something similar.
> > It's not the case right now, but that can change.
> 
> The algorithm would be broken if multiple updates for the same vblank
> count were allowed; that's why it checks to see if the vblank count has
> not advanced before storing a new timestamp.
> 
> Otherwise, the read side would not be able to determine that the
> timestamp is valid by double-checking that the vblank count has not
> changed.
> 
> And besides, even if the code looped without dropping the spinlock,
> the correct write order would still be observed because it would still
> be executing on the same cpu.
> 
> My objection to the write memory barrier is not about optimization;
> it's about correct code.

Well diff=0 is not allowed, I guess I could enforce this with some
WARN_ON. And I still think my point of non-local correctness is solid.
With the smp_wmb() removed the following still works correctly:

spin_lock(vblank_ti

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Daniel Vetter
On Thu, Apr 16, 2015 at 05:00:02AM -0400, Peter Hurley wrote:
> On 04/16/2015 02:39 AM, Mario Kleiner wrote:
> > I think i'm still not getting something about why the compiler would
> > be allowed to reorder in this way in absence of the additional
> > smp_rmb? Or is that barrier required for other archs which are less
> > strongly ordered?
> 
> Apologies for the confusion; I missed that it was data-dependent load.

Oh right, I missed that too. Well, Alpha can do anything, so we'd need at least
a read_barrier_depends() here. Tbh I'm not sure that's worth it, since with
the plain smp_rmb() we match the seqlock code exactly.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Daniel Vetter
This was a bit too much cargo-culted, so let's make it solid:
- vblank->count doesn't need to be an atomic, writes are always done
  under the protection of dev->vblank_time_lock. Switch to an unsigned
  long instead and update comments. Note that atomic_read is just a
  normal read of a volatile variable, so no need to audit all the
  read-side access specifically.

- The barriers for the vblank counter seqlock weren't complete: The
  read-side was missing the first barrier between the counter read and
  the timestamp read, it only had a barrier between the ts and the
  counter read. We need both.

- Barriers weren't properly documented. Since barriers only work if
  you have them on both sides of the transaction it's prudent to
  reference where the other side is. To avoid duplicating the
  write-side comment 3 times extract a little store_vblank() helper.
  In that helper also assert that we do indeed hold
  dev->vblank_time_lock, since in some cases the lock is acquired a
  few functions up in the callchain.
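
For readers following along outside the kernel tree, the write/read pairing
described in the points above can be sketched in user-space C11. This is only
an illustration under stated assumptions, not the patch itself: the names
sketch_vblank, sketch_store_vblank(), sketch_count_and_time() and the ring
size of 2 are hypothetical, C11 fences stand in for smp_wmb()/smp_rmb(), and
a plain bool stands in for dev->vblank_time_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RBSIZE 2u /* hypothetical ring size (power of two) */

struct sketch_vblank {
	atomic_uint count;                       /* role of vblank->count */
	_Atomic int64_t stamp_us[SKETCH_RBSIZE]; /* role of the timestamp slots */
	bool lock_held;                          /* stand-in for vblank_time_lock */
};

/* Writer side: the caller is assumed to hold the (simulated) lock. */
static void sketch_store_vblank(struct sketch_vblank *v, unsigned inc,
				const int64_t *t_us)
{
	unsigned cur;

	assert(v->lock_held); /* analogue of assert_spin_locked() */

	cur = atomic_load_explicit(&v->count, memory_order_relaxed);
	if (t_us)
		/* Fill the slot of the count that is about to be published. */
		atomic_store_explicit(&v->stamp_us[(cur + inc) % SKETCH_RBSIZE],
				      *t_us, memory_order_relaxed);

	/* Slot write must be visible before the new count. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&v->count, cur + inc, memory_order_relaxed);
	/* New count must be visible before the next slot write. */
	atomic_thread_fence(memory_order_release);
}

/* Lockless reader: retry until the count is the same before and after. */
static unsigned sketch_count_and_time(struct sketch_vblank *v, int64_t *t_us)
{
	unsigned cur;

	do {
		cur = atomic_load_explicit(&v->count, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		*t_us = atomic_load_explicit(&v->stamp_us[cur % SKETCH_RBSIZE],
					     memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (cur != atomic_load_explicit(&v->count, memory_order_relaxed));

	return cur;
}
```

The slots are relaxed atomics here only so that a concurrent reader is not a
data race under the C11 model; the kernel code uses plain stores and loads
with explicit barriers instead. The second release fence corresponds to the
trailing smp_wmb() whose necessity is debated in this thread.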

Spotted while reviewing a patch from Chris Wilson to add a fastpath to
the vblank_wait ioctl.

v2: Add comment to better explain how store_vblank works, suggested by
Chris.

v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
implicit barrier in the spin_unlock. But that can only be proven by
auditing all callers and my point in extracting this little helper was
to localize all the locking into just one place. Hence I think that
additional optimization is too risky.

v4: Use u32 for consistency, suggested by Mario.

Cc: Chris Wilson 
Cc: Mario Kleiner 
Cc: Ville Syrjälä 
Cc: Michel Dänzer 
Cc: Peter Hurley 
Reviewed-by: Chris Wilson  (v3)
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_irq.c | 95 +--
 include/drm/drmP.h|  8 +++-
 2 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a34476570a..d567f031892d 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);

+static void store_vblank(struct drm_device *dev, int crtc,
+			 u32 vblank_count_inc,
+			 struct timeval *t_vblank)
+{
+	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
+	u32 tslot;
+
+	assert_spin_locked(&dev->vblank_time_lock);
+
+	if (t_vblank) {
+		/* All writers hold the spinlock, but readers are serialized by
+		 * the latching of vblank->count below.
+		 */
+		tslot = vblank->count + vblank_count_inc;
+		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
+	}
+
+	/*
+	 * vblank timestamp updates are protected on the write side with
+	 * vblank_time_lock, but on the read side done locklessly using a
+	 * sequence-lock on the vblank counter. Ensure correct ordering using
+	 * memory barriers. We need the barrier both before and also after the
+	 * counter update to synchronize with the next timestamp write.
+	 * The read-side barriers for this are in drm_vblank_count_and_time.
+	 */
+	smp_wmb();
+	vblank->count += vblank_count_inc;
+	smp_wmb();
+}
+
 /**
  * drm_update_vblank_count - update the master vblank counter
  * @dev: DRM device
@@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
 static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 {
 	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
-	u32 cur_vblank, diff, tslot;
+	u32 cur_vblank, diff;
 	bool rc;
 	struct timeval t_vblank;

@@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 	if (diff == 0)
 		return;

-	/* Reinitialize corresponding vblank timestamp if high-precision query
-	 * available. Skip this step if query unsupported or failed. Will
-	 * reinitialize delayed at next vblank interrupt in that case.
+	/*
+	 * Only reinitialize corresponding vblank timestamp if high-precision query
+	 * available and didn't fail. Will reinitialize delayed at next vblank
+	 * interrupt in that case.
 	 */
-	if (rc) {
-		tslot = atomic_read(&vblank->count) + diff;
-		vblanktimestamp(dev, crtc, tslot) = t_vblank;
-	}
-
-	smp_mb__before_atomic();
-	atomic_add(diff, &vblank->count);
-	smp_mb__after_atomic();
+	store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
 }

 /*
@@ -218,7 +242,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
/* Compute time difference to stored timestamp of la

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Daniel Vetter
On Wed, Apr 15, 2015 at 11:26:37PM +0200, Mario Kleiner wrote:
> A couple of questions to educate me and one review comment.
> 
> On 04/15/2015 07:34 PM, Daniel Vetter wrote:
> >This was a bit too much cargo-culted, so let's make it solid:
> >- vblank->count doesn't need to be an atomic, writes are always done
> >   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >   long instead and update comments. Note that atomic_read is just a
> >   normal read of a volatile variable, so no need to audit all the
> >   read-side access specifically.
> >
> >- The barriers for the vblank counter seqlock weren't complete: The
> >   read-side was missing the first barrier between the counter read and
> >   the timestamp read, it only had a barrier between the ts and the
> >   counter read. We need both.
> >
> >- Barriers weren't properly documented. Since barriers only work if
> >   you have them on both sides of the transaction it's prudent to
> >   reference where the other side is. To avoid duplicating the
> >   write-side comment 3 times extract a little store_vblank() helper.
> >   In that helper also assert that we do indeed hold
> >   dev->vblank_time_lock, since in some cases the lock is acquired a
> >   few functions up in the callchain.
> >
> >Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> >the vblank_wait ioctl.
> >
> >v2: Add comment to better explain how store_vblank works, suggested by
> >Chris.
> >
> >v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
> >implicit barrier in the spin_unlock. But that can only be proven by
> >auditing all callers and my point in extracting this little helper was
> >to localize all the locking into just one place. Hence I think that
> >additional optimization is too risky.
> >
> >Cc: Chris Wilson 
> >Cc: Mario Kleiner 
> >Cc: Ville Syrjälä 
> >Cc: Michel Dänzer 
> >Cc: Peter Hurley 
> >Signed-off-by: Daniel Vetter 
> >---
> >  drivers/gpu/drm/drm_irq.c | 95 
> > +--
> >  include/drm/drmP.h|  8 +++-
> >  2 files changed, 57 insertions(+), 46 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> >index c8a34476570a..8694b77d0002 100644
> >--- a/drivers/gpu/drm/drm_irq.c
> >+++ b/drivers/gpu/drm/drm_irq.c
> >@@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
> >int, 0600);
> >  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
> > 0600);
> >  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
> > 0600);
> >
> >+static void store_vblank(struct drm_device *dev, int crtc,
> >+ unsigned vblank_count_inc,
> >+ struct timeval *t_vblank)
> >+{
> >+struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >+u32 tslot;
> >+
> >+assert_spin_locked(&dev->vblank_time_lock);
> >+
> >+if (t_vblank) {
> >+/* All writers hold the spinlock, but readers are serialized by
> >+ * the latching of vblank->count below.
> >+ */
> >+tslot = vblank->count + vblank_count_inc;
> >+vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> >+}
> >+
> >+/*
> >+ * vblank timestamp updates are protected on the write side with
> >+ * vblank_time_lock, but on the read side done locklessly using a
> >+ * sequence-lock on the vblank counter. Ensure correct ordering using
> >+ * memory barriers. We need the barrier both before and also after the
> >+ * counter update to synchronize with the next timestamp write.
> >+ * The read-side barriers for this are in drm_vblank_count_and_time.
> >+ */
> >+smp_wmb();
> >+vblank->count += vblank_count_inc;
> >+smp_wmb();
> >+}
> >+
> >  /**
> >   * drm_update_vblank_count - update the master vblank counter
> >   * @dev: DRM device
> >@@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, 
> >drm_timestamp_monotonic, int, 0600);
> >  static void drm_update_vblank_count(struct drm_device *dev, int crtc)
> >  {
> > struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> >-u32 cur_vblank, diff, tslot;
> >+u32 cur_vblank, diff;
> > bool rc;
> > struct timeval t_vblank;
> >
> >@@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device 
> >*dev, int crtc)
> > if (diff == 0)
> > return;
> >
> >-/* Reinitialize corresponding vblank timestamp if high-precision query
> >- * available. Skip this step if query unsupported or failed. Will
> >- * reinitialize delayed at next vblank interrupt in that case.
> >+/*
> >+ * Only reinitialize corresponding vblank timestamp if high-precision 
> >query
> >+ * available and didn't fail. Will reinitialize delayed at next vblank
> >+ * interrupt in that case.
> >  */
> >-if (rc) {
> >-tslot = atomic_read(&vblank->count) + diff;
> >-vblanktimestamp(dev, crtc, tslot) = t_

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Peter Hurley
On 04/16/2015 02:39 AM, Mario Kleiner wrote:
> On 04/16/2015 03:29 AM, Peter Hurley wrote:
>> On 04/15/2015 05:26 PM, Mario Kleiner wrote:

>> Because the time scales for these events don't require that level of
>> resolution; consider how much code has to get executed between a
>> hardware vblank irq triggering and the vblank counter being updated.
>>
>> Realistically, the only relevant requirement is that the timestamp
>> match the counter.
>>
> 
> Yes that is the really important part. A msec delay would possibly matter for 
> some timing sensitive apps like mine - some more exotic displays run at 200 
> Hz, and some apps need to synchronize to the vblank not strictly for 
> graphics. But i assume potential delays here are more on the order of a few 
> microseconds if some pending loads from the cache would get reordered for 
> overall efficiency?

I'd be surprised if the delay were as much as 1 us.

The latency to return to userspace significantly dwarfs any observable
effects of having missed the vblank count update by 1 instruction.

Regards,
Peter Hurley


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Mario Kleiner
On 04/16/2015 03:29 AM, Peter Hurley wrote:
> On 04/15/2015 05:26 PM, Mario Kleiner wrote:
>> A couple of questions to educate me and one review comment.
>>
>> On 04/15/2015 07:34 PM, Daniel Vetter wrote:
>>> This was a bit too much cargo-culted, so let's make it solid:
>>> - vblank->count doesn't need to be an atomic, writes are always done
>>> under the protection of dev->vblank_time_lock. Switch to an unsigned
>>> long instead and update comments. Note that atomic_read is just a
>>> normal read of a volatile variable, so no need to audit all the
>>> read-side access specifically.
>>>
>>> - The barriers for the vblank counter seqlock weren't complete: The
>>> read-side was missing the first barrier between the counter read and
>>> the timestamp read, it only had a barrier between the ts and the
>>> counter read. We need both.
>>>
>>> - Barriers weren't properly documented. Since barriers only work if
>>> you have them on both sides of the transaction it's prudent to
>>> reference where the other side is. To avoid duplicating the
>>> write-side comment 3 times extract a little store_vblank() helper.
>>> In that helper also assert that we do indeed hold
>>> dev->vblank_time_lock, since in some cases the lock is acquired a
>>> few functions up in the callchain.
>>>
>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>> the vblank_wait ioctl.
>>>
>>> v2: Add comment to better explain how store_vblank works, suggested by
>>> Chris.
>>>
>>> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
>>> implicit barrier in the spin_unlock. But that can only be proven by
>>> auditing all callers and my point in extracting this little helper was
>>> to localize all the locking into just one place. Hence I think that
>>> additional optimization is too risky.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Mario Kleiner 
>>> Cc: Ville Syrjälä 
>>> Cc: Michel Dänzer 
>>> Cc: Peter Hurley 
>>> Signed-off-by: Daniel Vetter 
>>> ---
>>>drivers/gpu/drm/drm_irq.c | 95 
>>> +--
>>>include/drm/drmP.h|  8 +++-
>>>2 files changed, 57 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index c8a34476570a..8694b77d0002 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
>>> int, 0600);
>>>module_param_named(timestamp_precision_usec, drm_timestamp_precision, 
>>> int, 0600);
>>>module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
>>> 0600);
>>>
>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>> + unsigned vblank_count_inc,
>>> + struct timeval *t_vblank)
>>> +{
>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>> +u32 tslot;
>>> +
>>> +assert_spin_locked(&dev->vblank_time_lock);
>>> +
>>> +if (t_vblank) {
>>> +/* All writers hold the spinlock, but readers are serialized by
>>> + * the latching of vblank->count below.
>>> + */
>>> +tslot = vblank->count + vblank_count_inc;
>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>> +}
>>> +
>>> +/*
>>> + * vblank timestamp updates are protected on the write side with
>>> + * vblank_time_lock, but on the read side done locklessly using a
>>> + * sequence-lock on the vblank counter. Ensure correct ordering using
>>> + * memory barriers. We need the barrier both before and also after the
>>> + * counter update to synchronize with the next timestamp write.
>>> + * The read-side barriers for this are in drm_vblank_count_and_time.
>>> + */
>>> +smp_wmb();
>>> +vblank->count += vblank_count_inc;
>>> +smp_wmb();
>>> +}
>>> +
>>>/**
>>> * drm_update_vblank_count - update the master vblank counter
>>> * @dev: DRM device
>>> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, 
>>> drm_timestamp_monotonic, int, 0600);
>>>static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>>>{
>>>struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>> -u32 cur_vblank, diff, tslot;
>>> +u32 cur_vblank, diff;
>>>bool rc;
>>>struct timeval t_vblank;
>>>
>>> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device 
>>> *dev, int crtc)
>>>if (diff == 0)
>>>return;
>>>
>>> -/* Reinitialize corresponding vblank timestamp if high-precision query
>>> - * available. Skip this step if query unsupported or failed. Will
>>> - * reinitialize delayed at next vblank interrupt in that case.
>>> +/*
>>> + * Only reinitialize corresponding vblank timestamp if high-precision 
>>> query
>>> + * available and didn't fail. Will reinitialize delayed at next vblank
>>> + * interrupt in that case.
>>> */
>>> -   

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Peter Hurley
On 04/15/2015 01:31 PM, Daniel Vetter wrote:
> On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
>> Hi Daniel,
>>
>> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
>>> This was a bit too much cargo-culted, so let's make it solid:
>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>   long instead and update comments. Note that atomic_read is just a
>>>   normal read of a volatile variable, so no need to audit all the
>>>   read-side access specifically.
>>>
>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>   read-side was missing the first barrier between the counter read and
>>>   the timestamp read, it only had a barrier between the ts and the
>>>   counter read. We need both.
>>>
>>> - Barriers weren't properly documented. Since barriers only work if
>>>   you have them on both sides of the transaction it's prudent to
>>>   reference where the other side is. To avoid duplicating the
>>>   write-side comment 3 times extract a little store_vblank() helper.
>>>   In that helper also assert that we do indeed hold
>>>   dev->vblank_time_lock, since in some cases the lock is acquired a
>>>   few functions up in the callchain.
>>>
>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>> the vblank_wait ioctl.
>>>
>>> Cc: Chris Wilson 
>>> Cc: Mario Kleiner 
>>> Cc: Ville Syrjälä 
>>> Cc: Michel Dänzer 
>>> Signed-off-by: Daniel Vetter 
>>> ---
>>>  drivers/gpu/drm/drm_irq.c | 92 
>>> ---
>>>  include/drm/drmP.h|  8 +++--
>>>  2 files changed, 54 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>> index c8a34476570a..23bfbc61a494 100644
>>> --- a/drivers/gpu/drm/drm_irq.c
>>> +++ b/drivers/gpu/drm/drm_irq.c
>>> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
>>> int, 0600);
>>>  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
>>> 0600);
>>>  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
>>> 0600);
>>>  
>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>> +unsigned vblank_count_inc,
>>> +struct timeval *t_vblank)
>>> +{
>>> +   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>> +   u32 tslot;
>>> +
>>> +   assert_spin_locked(&dev->vblank_time_lock);
>>> +
>>> +   if (t_vblank) {
>>> +   tslot = vblank->count + vblank_count_inc;
>>> +   vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>> +   }
>>> +
>>> +   /*
>>> +* vblank timestamp updates are protected on the write side with
>>> +* vblank_time_lock, but on the read side done locklessly using a
>>> +* sequence-lock on the vblank counter. Ensure correct ordering using
>>> +* memory barriers. We need the barrier both before and also after the
>>> +* counter update to synchronize with the next timestamp write.
>>> +* The read-side barriers for this are in drm_vblank_count_and_time.
>>> +*/
>>> +   smp_wmb();
>>> +   vblank->count += vblank_count_inc;
>>> +   smp_wmb();
>>
>> The comment and the code are each self-contradictory.
>>
>> If vblank->count writes are always protected by vblank_time_lock (something I
>> did not verify but that the comment above asserts), then the trailing write
>> barrier is not required (and the assertion that it is in the comment is 
>> incorrect).
>>
>> A spin unlock operation is always a write barrier.
> 
> Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
> That the spinlock is held I can assure. That no one goes around and does
> multiple vblank updates (because somehow that code raced with the hw
> itself) I can't easily assure with a simple assert or something similar.
> It's not the case right now, but that can change.

The algorithm would be broken if multiple updates for the same vblank
count were allowed; that's why it checks to see if the vblank count has
not advanced before storing a new timestamp.

Otherwise, the read side would not be able to determine that the
timestamp is valid by double-checking that the vblank count has not
changed.

And besides, even if the code looped without dropping the spinlock,
the correct write order would still be observed because it would still
be executing on the same cpu.

My objection to the write memory barrier is not about optimization;
it's about correct code.
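
To make the "no second update for the same count" point concrete, here is a
minimal single-threaded sketch of the guard. It is an illustration only: the
names sketch_counter and sketch_update() are hypothetical, the hardware
counter is assumed to be a full 32 bits wide, and the barriers are left out
because only the diff == 0 check is being shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_counter {
	uint32_t last_hw;    /* last hardware frame counter seen (hypothetical) */
	uint32_t count;      /* software vblank count */
	int64_t stamp_us[2]; /* timestamp slots, indexed by count % 2 */
};

/*
 * Returns true if a new count/timestamp pair was stored, false if the
 * hardware counter has not advanced and nothing was touched.
 */
static bool sketch_update(struct sketch_counter *c, uint32_t cur_hw,
			  int64_t t_us)
{
	/* Unsigned subtraction gives the right delta across u32 wrap-around. */
	uint32_t diff = cur_hw - c->last_hw;

	if (diff == 0)
		return false; /* never rewrite the slot of a published count */

	c->last_hw = cur_hw;
	c->stamp_us[(c->count + diff) % 2] = t_us;
	c->count += diff;
	return true;
}
```

Because a slot is only written together with a count change, a reader that
sees the same count before and after reading the slot knows the timestamp
belongs to that count.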

Regards,
Peter Hurley


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-16 Thread Peter Hurley
On 04/16/2015 02:39 AM, Mario Kleiner wrote:
> On 04/16/2015 03:29 AM, Peter Hurley wrote:
>> On 04/15/2015 05:26 PM, Mario Kleiner wrote:
>>> A couple of questions to educate me and one review comment.
>>>
>>> On 04/15/2015 07:34 PM, Daniel Vetter wrote:
>>>> This was a bit too much cargo-culted, so let's make it solid:
>>>> - vblank->count doesn't need to be an atomic, writes are always done
>>>> under the protection of dev->vblank_time_lock. Switch to an unsigned
>>>> long instead and update comments. Note that atomic_read is just a
>>>> normal read of a volatile variable, so no need to audit all the
>>>> read-side access specifically.
>>>>
>>>> - The barriers for the vblank counter seqlock weren't complete: The
>>>> read-side was missing the first barrier between the counter read and
>>>> the timestamp read, it only had a barrier between the ts and the
>>>> counter read. We need both.
>>>>
>>>> - Barriers weren't properly documented. Since barriers only work if
>>>> you have them on both sides of the transaction it's prudent to
>>>> reference where the other side is. To avoid duplicating the
>>>> write-side comment 3 times extract a little store_vblank() helper.
>>>> In that helper also assert that we do indeed hold
>>>> dev->vblank_time_lock, since in some cases the lock is acquired a
>>>> few functions up in the callchain.
>>>>
>>>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>>>> the vblank_wait ioctl.
>>>>
>>>> v2: Add comment to better explain how store_vblank works, suggested by
>>>> Chris.
>>>>
>>>> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
>>>> implicit barrier in the spin_unlock. But that can only be proven by
>>>> auditing all callers and my point in extracting this little helper was
>>>> to localize all the locking into just one place. Hence I think that
>>>> additional optimization is too risky.
>>>>
>>>> Cc: Chris Wilson
>>>> Cc: Mario Kleiner
>>>> Cc: Ville Syrjälä
>>>> Cc: Michel Dänzer
>>>> Cc: Peter Hurley
>>>> Signed-off-by: Daniel Vetter
>>>> ---
>>>>   drivers/gpu/drm/drm_irq.c | 95
>>>> +--
>>>>   include/drm/drmP.h|  8 +++-
>>>>   2 files changed, 57 insertions(+), 46 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>> index c8a34476570a..8694b77d0002 100644
>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay,
>>>> int, 0600);
>>>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision,
>>>> int, 0600);
>>>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int,
>>>> 0600);
>>>>
>>>> +static void store_vblank(struct drm_device *dev, int crtc,
>>>> + unsigned vblank_count_inc,
>>>> + struct timeval *t_vblank)
>>>> +{
>>>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>> +u32 tslot;
>>>> +
>>>> +assert_spin_locked(&dev->vblank_time_lock);
>>>> +
>>>> +if (t_vblank) {
>>>> +/* All writers hold the spinlock, but readers are serialized by
>>>> + * the latching of vblank->count below.
>>>> + */
>>>> +tslot = vblank->count + vblank_count_inc;
>>>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>>>> +}
>>>> +
>>>> +/*
>>>> + * vblank timestamp updates are protected on the write side with
>>>> + * vblank_time_lock, but on the read side done locklessly using a
>>>> + * sequence-lock on the vblank counter. Ensure correct ordering using
>>>> + * memory barriers. We need the barrier both before and also after
>>>> the
>>>> + * counter update to synchronize with the next timestamp write.
>>>> + * The read-side barriers for this are in drm_vblank_count_and_time.
>>>> + */
>>>> +smp_wmb();
>>>> +vblank->count += vblank_count_inc;
>>>> +smp_wmb();
>>>> +}
>>>> +
>>>>   /**
>>>> * drm_update_vblank_count - update the master vblank counter
>>>> * @dev: DRM device
>>>> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic,
>>>> drm_timestamp_monotonic, int, 0600);
>>>>   static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>>>>   {
>>>>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>>>> -u32 cur_vblank, diff, tslot;
>>>> +u32 cur_vblank, diff;
>>>>   bool rc;
>>>>   struct timeval t_vblank;
>>>>
>>>> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct
>>>> drm_device *dev, int crtc)
>>>>   if (diff == 0)
>>>>   return;
>>>>
>>>> -/* Reinitialize corresponding vblank timestamp if high-precision query
>>>> - * available. Skip this step if query unsupported or failed. Will
>>>> - * reinitialize delayed at next vblank interrupt in that case.
>>>> +/*
>>>> + * Only reinitialize corresponding vblank timestamp if

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Mario Kleiner
A couple of questions to educate me and one review comment.

On 04/15/2015 07:34 PM, Daniel Vetter wrote:
> This was a bit too much cargo-culted, so let's make it solid:
> - vblank->count doesn't need to be an atomic, writes are always done
>under the protection of dev->vblank_time_lock. Switch to an unsigned
>long instead and update comments. Note that atomic_read is just a
>normal read of a volatile variable, so no need to audit all the
>read-side access specifically.
>
> - The barriers for the vblank counter seqlock weren't complete: The
>read-side was missing the first barrier between the counter read and
>the timestamp read, it only had a barrier between the ts and the
>counter read. We need both.
>
> - Barriers weren't properly documented. Since barriers only work if
>you have them on both sides of the transaction it's prudent to
>reference where the other side is. To avoid duplicating the
>write-side comment 3 times extract a little store_vblank() helper.
>In that helper also assert that we do indeed hold
>dev->vblank_time_lock, since in some cases the lock is acquired a
>few functions up in the callchain.
>
> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> the vblank_wait ioctl.
>
> v2: Add comment to better explain how store_vblank works, suggested by
> Chris.
>
> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
> implicit barrier in the spin_unlock. But that can only be proven by
> auditing all callers and my point in extracting this little helper was
> to localize all the locking into just one place. Hence I think that
> additional optimization is too risky.
>
> Cc: Chris Wilson 
> Cc: Mario Kleiner 
> Cc: Ville Syrjälä 
> Cc: Michel Dänzer 
> Cc: Peter Hurley 
> Signed-off-by: Daniel Vetter 
> ---
>   drivers/gpu/drm/drm_irq.c | 95 
> +--
>   include/drm/drmP.h|  8 +++-
>   2 files changed, 57 insertions(+), 46 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..8694b77d0002 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
> int, 0600);
>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
> 0600);
>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>
> +static void store_vblank(struct drm_device *dev, int crtc,
> +  unsigned vblank_count_inc,
> +  struct timeval *t_vblank)
> +{
> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> + u32 tslot;
> +
> + assert_spin_locked(&dev->vblank_time_lock);
> +
> + if (t_vblank) {
> + /* All writers hold the spinlock, but readers are serialized by
> +  * the latching of vblank->count below.
> +  */
> + tslot = vblank->count + vblank_count_inc;
> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> + }
> +
> + /*
> +  * vblank timestamp updates are protected on the write side with
> +  * vblank_time_lock, but on the read side done locklessly using a
> +  * sequence-lock on the vblank counter. Ensure correct ordering using
> +  * memory barrriers. We need the barrier both before and also after the
> +  * counter update to synchronize with the next timestamp write.
> +  * The read-side barriers for this are in drm_vblank_count_and_time.
> +  */
> + smp_wmb();
> + vblank->count += vblank_count_inc;
> + smp_wmb();
> +}
> +
>   /**
>* drm_update_vblank_count - update the master vblank counter
>* @dev: DRM device
> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, 
> drm_timestamp_monotonic, int, 0600);
>   static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   {
>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> - u32 cur_vblank, diff, tslot;
> + u32 cur_vblank, diff;
>   bool rc;
>   struct timeval t_vblank;
>
> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, int crtc)
>   if (diff == 0)
>   return;
>
> - /* Reinitialize corresponding vblank timestamp if high-precision query
> -  * available. Skip this step if query unsupported or failed. Will
> -  * reinitialize delayed at next vblank interrupt in that case.
> + /*
> +  * Only reinitialize corresponding vblank timestamp if high-precision 
> query
> +  * available and didn't fail. Will reinitialize delayed at next vblank
> +  * interrupt in that case.
>*/
> - if (rc) {
> - tslot = atomic_read(&vblank->count) + diff;
> - vblanktimestamp(dev, crtc, tslot) = t_vblank;
> - }
> -
> - smp_mb__before_atomic();
> - atomic_add(diff, &vblank->count);
> - smp_mb__after_atomic();
> + store_vblank(dev, crtc

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Peter Hurley
On 04/15/2015 05:26 PM, Mario Kleiner wrote:
> A couple of questions to educate me and one review comment.
> 
> On 04/15/2015 07:34 PM, Daniel Vetter wrote:
>> This was a bit too much cargo-culted, so lets make it solid:
>> - vblank->count doesn't need to be an atomic, writes are always done
>>under the protection of dev->vblank_time_lock. Switch to an unsigned
>>long instead and update comments. Note that atomic_read is just a
>>normal read of a volatile variable, so no need to audit all the
>>read-side access specifically.
>>
>> - The barriers for the vblank counter seqlock weren't complete: The
>>read-side was missing the first barrier between the counter read and
>>the timestamp read, it only had a barrier between the ts and the
>>counter read. We need both.
>>
>> - Barriers weren't properly documented. Since barriers only work if
>>you have them on boths sides of the transaction it's prudent to
>>reference where the other side is. To avoid duplicating the
>>write-side comment 3 times extract a little store_vblank() helper.
>>In that helper also assert that we do indeed hold
>>dev->vblank_time_lock, since in some cases the lock is acquired a
>>few functions up in the callchain.
>>
>> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
>> the vblank_wait ioctl.
>>
>> v2: Add comment to better explain how store_vblank works, suggested by
>> Chris.
>>
>> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
>> implicit barrier in the spin_unlock. But that can only be proven by
>> auditing all callers and my point in extracting this little helper was
>> to localize all the locking into just one place. Hence I think that
>> additional optimization is too risky.
>>
>> Cc: Chris Wilson 
>> Cc: Mario Kleiner 
>> Cc: Ville Syrjälä 
>> Cc: Michel Dänzer 
>> Cc: Peter Hurley 
>> Signed-off-by: Daniel Vetter 
>> ---
>>   drivers/gpu/drm/drm_irq.c | 95 
>> +--
>>   include/drm/drmP.h|  8 +++-
>>   2 files changed, 57 insertions(+), 46 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index c8a34476570a..8694b77d0002 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
>> int, 0600);
>>   module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
>> 0600);
>>   module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
>> 0600);
>>
>> +static void store_vblank(struct drm_device *dev, int crtc,
>> + unsigned vblank_count_inc,
>> + struct timeval *t_vblank)
>> +{
>> +struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>> +u32 tslot;
>> +
>> +assert_spin_locked(&dev->vblank_time_lock);
>> +
>> +if (t_vblank) {
>> +/* All writers hold the spinlock, but readers are serialized by
>> + * the latching of vblank->count below.
>> + */
>> +tslot = vblank->count + vblank_count_inc;
>> +vblanktimestamp(dev, crtc, tslot) = *t_vblank;
>> +}
>> +
>> +/*
>> + * vblank timestamp updates are protected on the write side with
>> + * vblank_time_lock, but on the read side done locklessly using a
>> + * sequence-lock on the vblank counter. Ensure correct ordering using
>> + * memory barrriers. We need the barrier both before and also after the
>> + * counter update to synchronize with the next timestamp write.
>> + * The read-side barriers for this are in drm_vblank_count_and_time.
>> + */
>> +smp_wmb();
>> +vblank->count += vblank_count_inc;
>> +smp_wmb();
>> +}
>> +
>>   /**
>>* drm_update_vblank_count - update the master vblank counter
>>* @dev: DRM device
>> @@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, 
>> drm_timestamp_monotonic, int, 0600);
>>   static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>>   {
>>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
>> -u32 cur_vblank, diff, tslot;
>> +u32 cur_vblank, diff;
>>   bool rc;
>>   struct timeval t_vblank;
>>
>> @@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device 
>> *dev, int crtc)
>>   if (diff == 0)
>>   return;
>>
>> -/* Reinitialize corresponding vblank timestamp if high-precision query
>> - * available. Skip this step if query unsupported or failed. Will
>> - * reinitialize delayed at next vblank interrupt in that case.
>> +/*
>> + * Only reinitialize corresponding vblank timestamp if high-precision 
>> query
>> + * available and didn't fail. Will reinitialize delayed at next vblank
>> + * interrupt in that case.
>>*/
>> -if (rc) {
>> -tslot = atomic_read(&vblank->count) + diff;
>> -vblanktimestamp(dev, crtc, tslot) = t_vblank;
>> -}
>> -
>> -smp_mb__before_atomic();
>> -atomic_ad

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Daniel Vetter
This was a bit too much cargo-culted, so let's make it solid:
- vblank->count doesn't need to be an atomic, writes are always done
  under the protection of dev->vblank_time_lock. Switch to an unsigned
  long instead and update comments. Note that atomic_read is just a
  normal read of a volatile variable, so no need to audit all the
  read-side access specifically.

- The barriers for the vblank counter seqlock weren't complete: The
  read-side was missing the first barrier between the counter read and
  the timestamp read, it only had a barrier between the ts and the
  counter read. We need both.

- Barriers weren't properly documented. Since barriers only work if
  you have them on both sides of the transaction it's prudent to
  reference where the other side is. To avoid duplicating the
  write-side comment 3 times extract a little store_vblank() helper.
  In that helper also assert that we do indeed hold
  dev->vblank_time_lock, since in some cases the lock is acquired a
  few functions up in the callchain.

Spotted while reviewing a patch from Chris Wilson to add a fastpath to
the vblank_wait ioctl.

v2: Add comment to better explain how store_vblank works, suggested by
Chris.

v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
implicit barrier in the spin_unlock. But that can only be proven by
auditing all callers and my point in extracting this little helper was
to localize all the locking into just one place. Hence I think that
additional optimization is too risky.

Cc: Chris Wilson 
Cc: Mario Kleiner 
Cc: Ville Syrjälä 
Cc: Michel Dänzer 
Cc: Peter Hurley 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_irq.c | 95 +--
 include/drm/drmP.h        |  8 +++-
 2 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a34476570a..8694b77d0002 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);

+static void store_vblank(struct drm_device *dev, int crtc,
+			 unsigned vblank_count_inc,
+			 struct timeval *t_vblank)
+{
+	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
+	u32 tslot;
+
+	assert_spin_locked(&dev->vblank_time_lock);
+
+	if (t_vblank) {
+		/* All writers hold the spinlock, but readers are serialized by
+		 * the latching of vblank->count below.
+		 */
+		tslot = vblank->count + vblank_count_inc;
+		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
+	}
+
+	/*
+	 * vblank timestamp updates are protected on the write side with
+	 * vblank_time_lock, but on the read side done locklessly using a
+	 * sequence-lock on the vblank counter. Ensure correct ordering using
+	 * memory barriers. We need the barrier both before and also after the
+	 * counter update to synchronize with the next timestamp write.
+	 * The read-side barriers for this are in drm_vblank_count_and_time.
+	 */
+	smp_wmb();
+	vblank->count += vblank_count_inc;
+	smp_wmb();
+}
+
 /**
  * drm_update_vblank_count - update the master vblank counter
  * @dev: DRM device
@@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
 static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 {
 	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
-	u32 cur_vblank, diff, tslot;
+	u32 cur_vblank, diff;
 	bool rc;
 	struct timeval t_vblank;

@@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 	if (diff == 0)
 		return;

-	/* Reinitialize corresponding vblank timestamp if high-precision query
-	 * available. Skip this step if query unsupported or failed. Will
-	 * reinitialize delayed at next vblank interrupt in that case.
+	/*
+	 * Only reinitialize corresponding vblank timestamp if high-precision query
+	 * available and didn't fail. Will reinitialize delayed at next vblank
+	 * interrupt in that case.
 	 */
-	if (rc) {
-		tslot = atomic_read(&vblank->count) + diff;
-		vblanktimestamp(dev, crtc, tslot) = t_vblank;
-	}
-
-	smp_mb__before_atomic();
-	atomic_add(diff, &vblank->count);
-	smp_mb__after_atomic();
+	store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
 }

 /*
@@ -218,7 +242,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	/* Compute time difference to stored timestamp of last vblank
 	 * as updated by last invocation of drm_handle_vblank() in 

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Daniel Vetter
On Wed, Apr 15, 2015 at 09:00:04AM -0400, Peter Hurley wrote:
> Hi Daniel,
> 
> On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> > This was a bit too much cargo-culted, so lets make it solid:
> > - vblank->count doesn't need to be an atomic, writes are always done
> >   under the protection of dev->vblank_time_lock. Switch to an unsigned
> >   long instead and update comments. Note that atomic_read is just a
> >   normal read of a volatile variable, so no need to audit all the
> >   read-side access specifically.
> > 
> > - The barriers for the vblank counter seqlock weren't complete: The
> >   read-side was missing the first barrier between the counter read and
> >   the timestamp read, it only had a barrier between the ts and the
> >   counter read. We need both.
> > 
> > - Barriers weren't properly documented. Since barriers only work if
> >   you have them on boths sides of the transaction it's prudent to
> >   reference where the other side is. To avoid duplicating the
> >   write-side comment 3 times extract a little store_vblank() helper.
> >   In that helper also assert that we do indeed hold
> >   dev->vblank_time_lock, since in some cases the lock is acquired a
> >   few functions up in the callchain.
> > 
> > Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> > the vblank_wait ioctl.
> > 
> > Cc: Chris Wilson 
> > Cc: Mario Kleiner 
> > Cc: Ville Syrjälä 
> > Cc: Michel Dänzer 
> > Signed-off-by: Daniel Vetter 
> > ---
> >  drivers/gpu/drm/drm_irq.c | 92 
> > ---
> >  include/drm/drmP.h|  8 +++--
> >  2 files changed, 54 insertions(+), 46 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index c8a34476570a..23bfbc61a494 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
> > int, 0600);
> >  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
> > 0600);
> >  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
> > 0600);
> >  
> > +static void store_vblank(struct drm_device *dev, int crtc,
> > +unsigned vblank_count_inc,
> > +struct timeval *t_vblank)
> > +{
> > +   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> > +   u32 tslot;
> > +
> > +   assert_spin_locked(&dev->vblank_time_lock);
> > +
> > +   if (t_vblank) {
> > +   tslot = vblank->count + vblank_count_inc;
> > +   vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> > +   }
> > +
> > +   /*
> > +* vblank timestamp updates are protected on the write side with
> > +* vblank_time_lock, but on the read side done locklessly using a
> > +* sequence-lock on the vblank counter. Ensure correct ordering using
> > +* memory barrriers. We need the barrier both before and also after the
> > +* counter update to synchronize with the next timestamp write.
> > +* The read-side barriers for this are in drm_vblank_count_and_time.
> > +*/
> > +   smp_wmb();
> > +   vblank->count += vblank_count_inc;
> > +   smp_wmb();
> 
> The comment and the code are each self-contradictory.
> 
> If vblank->count writes are always protected by vblank_time_lock (something I
> did not verify but that the comment above asserts), then the trailing write
> barrier is not required (and the assertion that it is in the comment is 
> incorrect).
> 
> A spin unlock operation is always a write barrier.

Hm yeah. Otoh to me that's bordering on "code too clever for my own good".
That the spinlock is held I can assure. That no one goes around and does
multiple vblank updates (because somehow that code raced with the hw
itself) I can't easily assure with a simple assert or something similar.
It's not the case right now, but that can change.

Also it's not contradictory here, since you'd need to audit all the
callers to be able to make the claim that the 2nd smp_wmb() is redundant.
I'll just add a comment about this.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
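
Editorial sketch (not part of the thread): the write-side ordering
discussed above can be modelled in userspace C11. Everything here is an
assumption made for illustration: store_vblank_sketch, vblank_sketch,
tv_sketch and RING_SIZE are hypothetical names, release fences stand in
for smp_wmb(), and a plain bool stands in for dev->vblank_time_lock. It
is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 2u /* assumed ring size, for illustration only */

struct tv_sketch {			/* plain timestamp passed in by the caller */
	int64_t sec;
	int64_t usec;
};

struct ts_sketch {			/* timestamp slot shared with lockless readers */
	_Atomic int64_t sec;
	_Atomic int64_t usec;
};

struct vblank_sketch {
	atomic_uint count;		/* plays the role of vblank->count */
	struct ts_sketch time[RING_SIZE];
	bool lock_held;			/* stand-in for dev->vblank_time_lock */
};

/* Hypothetical userspace analogue of store_vblank(). */
static void store_vblank_sketch(struct vblank_sketch *vb, unsigned inc,
				const struct tv_sketch *t)
{
	unsigned cur = atomic_load_explicit(&vb->count, memory_order_relaxed);

	assert(vb->lock_held);	/* mirrors assert_spin_locked() */

	if (t) {
		/* Fill the slot that belongs to the count we are about to publish. */
		unsigned slot = (cur + inc) % RING_SIZE;

		atomic_store_explicit(&vb->time[slot].sec, t->sec,
				      memory_order_relaxed);
		atomic_store_explicit(&vb->time[slot].usec, t->usec,
				      memory_order_relaxed);
	}

	/* Role of the first smp_wmb(): timestamp is visible before the count. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&vb->count, cur + inc, memory_order_relaxed);
	/* Role of the second smp_wmb(): count is visible before any later
	 * timestamp write made by the next call. */
	atomic_thread_fence(memory_order_release);
}
```

In this model the trailing fence only matters when a second update
follows without an intervening unlock, which is the case Daniel says he
cannot exclude with a simple assert.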


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Chris Wilson
On Wed, Apr 15, 2015 at 07:34:43PM +0200, Daniel Vetter wrote:
> This was a bit too much cargo-culted, so lets make it solid:
> - vblank->count doesn't need to be an atomic, writes are always done
>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>   long instead and update comments. Note that atomic_read is just a
>   normal read of a volatile variable, so no need to audit all the
>   read-side access specifically.
> 
> - The barriers for the vblank counter seqlock weren't complete: The
>   read-side was missing the first barrier between the counter read and
>   the timestamp read, it only had a barrier between the ts and the
>   counter read. We need both.
> 
> - Barriers weren't properly documented. Since barriers only work if
>   you have them on boths sides of the transaction it's prudent to
>   reference where the other side is. To avoid duplicating the
>   write-side comment 3 times extract a little store_vblank() helper.
>   In that helper also assert that we do indeed hold
>   dev->vblank_time_lock, since in some cases the lock is acquired a
>   few functions up in the callchain.
> 
> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> the vblank_wait ioctl.
> 
> v2: Add comment to better explain how store_vblank works, suggested by
> Chris.
> 
> v3: Peter noticed that as-is the 2nd smp_wmb is redundant with the
> implicit barrier in the spin_unlock. But that can only be proven by
> auditing all callers and my point in extracting this little helper was
> to localize all the locking into just one place. Hence I think that
> additional optimization is too risky.
> 
> Cc: Chris Wilson 
> Cc: Mario Kleiner 
> Cc: Ville Syrjälä 
> Cc: Michel Dänzer 
> Cc: Peter Hurley 
> Signed-off-by: Daniel Vetter 

Fwiw, there was no discernible difference in the time to query the
vblank counter (on an ivb i7-3720QM).

Reviewed-by: Chris Wilson 
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Daniel Vetter
This was a bit too much cargo-culted, so let's make it solid:
- vblank->count doesn't need to be an atomic, writes are always done
  under the protection of dev->vblank_time_lock. Switch to an unsigned
  long instead and update comments. Note that atomic_read is just a
  normal read of a volatile variable, so no need to audit all the
  read-side access specifically.

- The barriers for the vblank counter seqlock weren't complete: The
  read-side was missing the first barrier between the counter read and
  the timestamp read, it only had a barrier between the ts and the
  counter read. We need both.

- Barriers weren't properly documented. Since barriers only work if
  you have them on both sides of the transaction it's prudent to
  reference where the other side is. To avoid duplicating the
  write-side comment 3 times extract a little store_vblank() helper.
  In that helper also assert that we do indeed hold
  dev->vblank_time_lock, since in some cases the lock is acquired a
  few functions up in the callchain.

Spotted while reviewing a patch from Chris Wilson to add a fastpath to
the vblank_wait ioctl.

v2: Add comment to better explain how store_vblank works, suggested by
Chris.

Cc: Chris Wilson 
Cc: Mario Kleiner 
Cc: Ville Syrjälä 
Cc: Michel Dänzer 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_irq.c | 95 +--
 include/drm/drmP.h        |  8 +++-
 2 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a34476570a..8694b77d0002 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -74,6 +74,36 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);

+static void store_vblank(struct drm_device *dev, int crtc,
+			 unsigned vblank_count_inc,
+			 struct timeval *t_vblank)
+{
+	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
+	u32 tslot;
+
+	assert_spin_locked(&dev->vblank_time_lock);
+
+	if (t_vblank) {
+		/* All writers hold the spinlock, but readers are serialized by
+		 * the latching of vblank->count below.
+		 */
+		tslot = vblank->count + vblank_count_inc;
+		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
+	}
+
+	/*
+	 * vblank timestamp updates are protected on the write side with
+	 * vblank_time_lock, but on the read side done locklessly using a
+	 * sequence-lock on the vblank counter. Ensure correct ordering using
+	 * memory barriers. We need the barrier both before and also after the
+	 * counter update to synchronize with the next timestamp write.
+	 * The read-side barriers for this are in drm_vblank_count_and_time.
+	 */
+	smp_wmb();
+	vblank->count += vblank_count_inc;
+	smp_wmb();
+}
+
 /**
  * drm_update_vblank_count - update the master vblank counter
  * @dev: DRM device
@@ -93,7 +123,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
 static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 {
 	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
-	u32 cur_vblank, diff, tslot;
+	u32 cur_vblank, diff;
 	bool rc;
 	struct timeval t_vblank;

@@ -129,18 +159,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 	if (diff == 0)
 		return;

-	/* Reinitialize corresponding vblank timestamp if high-precision query
-	 * available. Skip this step if query unsupported or failed. Will
-	 * reinitialize delayed at next vblank interrupt in that case.
+	/*
+	 * Only reinitialize corresponding vblank timestamp if high-precision query
+	 * available and didn't fail. Will reinitialize delayed at next vblank
+	 * interrupt in that case.
 	 */
-	if (rc) {
-		tslot = atomic_read(&vblank->count) + diff;
-		vblanktimestamp(dev, crtc, tslot) = t_vblank;
-	}
-
-	smp_mb__before_atomic();
-	atomic_add(diff, &vblank->count);
-	smp_mb__after_atomic();
+	store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
 }

 /*
@@ -218,7 +242,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	/* Compute time difference to stored timestamp of last vblank
 	 * as updated by last invocation of drm_handle_vblank() in vblank irq.
 	 */
-	vblcount = atomic_read(&vblank->count);
+	vblcount = vblank->count;
 	diff_ns = timeval_to_ns(&tvblank) -
 		  timeval_to_ns(&vblanktimestamp(dev, crtc, vblcount));

@@ -234,17 +258,8 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	 * avail

[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Daniel Vetter
On Wed, Apr 15, 2015 at 09:17:03AM +0100, Chris Wilson wrote:
> On Wed, Apr 15, 2015 at 09:17:02AM +0200, Daniel Vetter wrote:
> > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > index c8a34476570a..23bfbc61a494 100644
> > --- a/drivers/gpu/drm/drm_irq.c
> > +++ b/drivers/gpu/drm/drm_irq.c
> > @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
> > int, 0600);
> >  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
> > 0600);
> >  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
> > 0600);
> >  
> > +static void store_vblank(struct drm_device *dev, int crtc,
> > +unsigned vblank_count_inc,
> > +struct timeval *t_vblank)
> > +{
> > +   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> > +   u32 tslot;
> > +
> > +   assert_spin_locked(&dev->vblank_time_lock);
> > +
> > +   if (t_vblank) {
> > +   tslot = vblank->count + vblank_count_inc;
> > +   vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> > +   }
> 
> It is not obvious this updates the right tslot in all circumstances.
> Care to explain?

Writers are synchronized with vblank_time_lock, so there shouldn't be any
races. Mario also has a patch to clear the ts slot if we don't have
anything to set it to (that one will conflict ofc).

Or what exactly do you mean?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[Intel-gfx] [PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Chris Wilson
On Wed, Apr 15, 2015 at 11:25:00AM +0200, Daniel Vetter wrote:
> On Wed, Apr 15, 2015 at 09:17:03AM +0100, Chris Wilson wrote:
> > On Wed, Apr 15, 2015 at 09:17:02AM +0200, Daniel Vetter wrote:
> > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > > index c8a34476570a..23bfbc61a494 100644
> > > --- a/drivers/gpu/drm/drm_irq.c
> > > +++ b/drivers/gpu/drm/drm_irq.c
> > > @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, 
> > > drm_vblank_offdelay, int, 0600);
> > >  module_param_named(timestamp_precision_usec, drm_timestamp_precision, 
> > > int, 0600);
> > >  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 
> > > 0600);
> > >  
> > > +static void store_vblank(struct drm_device *dev, int crtc,
> > > +  unsigned vblank_count_inc,
> > > +  struct timeval *t_vblank)
> > > +{
> > > + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> > > + u32 tslot;
> > > +
> > > + assert_spin_locked(&dev->vblank_time_lock);
> > > +
> > > + if (t_vblank) {
> > > + tslot = vblank->count + vblank_count_inc;
> > > + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> > > + }
> > 
> > It is not obvious this updates the right tslot in all circumstances.
> > Care to explain?
> 
> Writers are synchronized with vblank_time_lock, so there shouldn't be any
> races. Mario also has a patch to clear the ts slot if we don't have
> anything to set it to (that one will conflict ofc).
> 
> Or what exactly do you mean?

I was staring at vblank->count and reading backwards from the smp_wmb().

Just something like:
if (t_vblank) {
/* All writers hold the spinlock, but readers are serialized by
 * the latching of vblank->count below.
 */
 u32 tslot = vblank->count + vblank_count_inc;
 ...

would help me understand the relationship better.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
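
Editorial sketch (not part of the thread): the read side described in
the commit message, with one barrier between the counter read and the
timestamp read and another between the timestamp read and the counter
re-read, can be modelled in userspace C11. The names
read_count_and_time_sketch, vblank_reader_sketch and RING_SIZE are
hypothetical, acquire fences stand in for smp_rmb(), and this is not
the code of drm_vblank_count_and_time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 2u /* assumed ring size, for illustration only */

struct tv_sketch {			/* snapshot handed back to the caller */
	int64_t sec;
	int64_t usec;
};

struct ts_sketch {			/* slot that a writer may update concurrently */
	_Atomic int64_t sec;
	_Atomic int64_t usec;
};

struct vblank_reader_sketch {
	atomic_uint count;		/* latch: plays the role of vblank->count */
	struct ts_sketch time[RING_SIZE];
};

/* Lockless read: retry until the counter is unchanged across the read. */
static unsigned read_count_and_time_sketch(struct vblank_reader_sketch *vb,
					   struct tv_sketch *out)
{
	unsigned cur, again;

	assert(out != NULL);

	do {
		cur = atomic_load_explicit(&vb->count, memory_order_relaxed);
		/* First barrier: counter read before timestamp read. */
		atomic_thread_fence(memory_order_acquire);

		out->sec = atomic_load_explicit(&vb->time[cur % RING_SIZE].sec,
						memory_order_relaxed);
		out->usec = atomic_load_explicit(&vb->time[cur % RING_SIZE].usec,
						 memory_order_relaxed);

		/* Second barrier: timestamp read before counter re-read. */
		atomic_thread_fence(memory_order_acquire);
		again = atomic_load_explicit(&vb->count, memory_order_relaxed);
	} while (cur != again);

	return cur;
}
```

The returned count and the timestamp in *out belong together only
because the loop discards any snapshot taken while the counter moved.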


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Chris Wilson
On Wed, Apr 15, 2015 at 09:17:02AM +0200, Daniel Vetter wrote:
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..23bfbc61a494 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, 
> int, 0600);
>  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 
> 0600);
>  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>  
> +static void store_vblank(struct drm_device *dev, int crtc,
> +  unsigned vblank_count_inc,
> +  struct timeval *t_vblank)
> +{
> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> + u32 tslot;
> +
> + assert_spin_locked(&dev->vblank_time_lock);
> +
> + if (t_vblank) {
> + tslot = vblank->count + vblank_count_inc;
> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> + }

It is not obvious this updates the right tslot in all circumstances.
Care to explain?

Otherwise the rest looks consistent with seqlock, using the
vblank->count as the latch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
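
Editorial sketch (not part of the thread): on the question of which
slot gets written, the following assumes that the timestamp ring has
two entries and that vblanktimestamp() indexes it by the count modulo
the ring size. Neither detail is stated in this thread, and ts_slot,
write_hits_live_slot and RING_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 2u /* assumed ring size, for illustration only */

static_assert(RING_SIZE > 0u, "ring must not be empty");

/* Slot that holds the timestamp for a given vblank count. */
static unsigned ts_slot(unsigned count)
{
	return count % RING_SIZE;
}

/*
 * True if writing the timestamp for (count + inc) lands in the same slot
 * that readers of the still-published count are looking at.
 */
static bool write_hits_live_slot(unsigned count, unsigned inc)
{
	return ts_slot(count + inc) == ts_slot(count);
}
```

Under these assumptions an increment of 1 always targets the other
slot, while an increment of 2 reuses the slot that current readers may
be copying, so a reader has to rely on re-checking the counter rather
than on the slots being disjoint.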


[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Daniel Vetter
This was a bit too much cargo-culted, so let's make it solid:
- vblank->count doesn't need to be an atomic, writes are always done
  under the protection of dev->vblank_time_lock. Switch to an unsigned
  long instead and update comments. Note that atomic_read is just a
  normal read of a volatile variable, so no need to audit all the
  read-side access specifically.

- The barriers for the vblank counter seqlock weren't complete: The
  read-side was missing the first barrier between the counter read and
  the timestamp read, it only had a barrier between the ts and the
  counter read. We need both.

- Barriers weren't properly documented. Since barriers only work if
  you have them on both sides of the transaction it's prudent to
  reference where the other side is. To avoid duplicating the
  write-side comment 3 times extract a little store_vblank() helper.
  In that helper also assert that we do indeed hold
  dev->vblank_time_lock, since in some cases the lock is acquired a
  few functions up in the callchain.

Spotted while reviewing a patch from Chris Wilson to add a fastpath to
the vblank_wait ioctl.

Cc: Chris Wilson 
Cc: Mario Kleiner 
Cc: Ville Syrjälä 
Cc: Michel Dänzer 
Signed-off-by: Daniel Vetter 
---
 drivers/gpu/drm/drm_irq.c | 92 ---
 include/drm/drmP.h        |  8 +++--
 2 files changed, 54 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a34476570a..23bfbc61a494 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
 module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
 module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);

+static void store_vblank(struct drm_device *dev, int crtc,
+			 unsigned vblank_count_inc,
+			 struct timeval *t_vblank)
+{
+	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
+	u32 tslot;
+
+	assert_spin_locked(&dev->vblank_time_lock);
+
+	if (t_vblank) {
+		tslot = vblank->count + vblank_count_inc;
+		vblanktimestamp(dev, crtc, tslot) = *t_vblank;
+	}
+
+	/*
+	 * vblank timestamp updates are protected on the write side with
+	 * vblank_time_lock, but on the read side done locklessly using a
+	 * sequence-lock on the vblank counter. Ensure correct ordering using
+	 * memory barriers. We need the barrier both before and also after the
+	 * counter update to synchronize with the next timestamp write.
+	 * The read-side barriers for this are in drm_vblank_count_and_time.
+	 */
+	smp_wmb();
+	vblank->count += vblank_count_inc;
+	smp_wmb();
+}
+
 /**
  * drm_update_vblank_count - update the master vblank counter
  * @dev: DRM device
@@ -93,7 +120,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
 static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 {
 	struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
-	u32 cur_vblank, diff, tslot;
+	u32 cur_vblank, diff;
 	bool rc;
 	struct timeval t_vblank;

@@ -129,18 +156,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
 	if (diff == 0)
 		return;

-	/* Reinitialize corresponding vblank timestamp if high-precision query
-	 * available. Skip this step if query unsupported or failed. Will
-	 * reinitialize delayed at next vblank interrupt in that case.
+	/*
+	 * Only reinitialize corresponding vblank timestamp if high-precision query
+	 * available and didn't fail. Will reinitialize delayed at next vblank
+	 * interrupt in that case.
 	 */
-	if (rc) {
-		tslot = atomic_read(&vblank->count) + diff;
-		vblanktimestamp(dev, crtc, tslot) = t_vblank;
-	}
-
-	smp_mb__before_atomic();
-	atomic_add(diff, &vblank->count);
-	smp_mb__after_atomic();
+	store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
 }

 /*
@@ -218,7 +239,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	/* Compute time difference to stored timestamp of last vblank
 	 * as updated by last invocation of drm_handle_vblank() in vblank irq.
 	 */
-	vblcount = atomic_read(&vblank->count);
+	vblcount = vblank->count;
 	diff_ns = timeval_to_ns(&tvblank) -
 		  timeval_to_ns(&vblanktimestamp(dev, crtc, vblcount));

@@ -234,17 +255,8 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
 	 * available. In that case we can't account for this and just
 	 * hope for the best.
 	 */
-	if (vblrc && (abs64(diff_ns) > 100)) {
-		/* Store new timestamp in ringbuffer. */
-		vblanktimestam

[PATCH] drm/vblank: Fixup and document timestamp update/read barriers

2015-04-15 Thread Peter Hurley
Hi Daniel,

On 04/15/2015 03:17 AM, Daniel Vetter wrote:
> This was a bit too much cargo-culted, so lets make it solid:
> - vblank->count doesn't need to be an atomic, writes are always done
>   under the protection of dev->vblank_time_lock. Switch to an unsigned
>   long instead and update comments. Note that atomic_read is just a
>   normal read of a volatile variable, so no need to audit all the
>   read-side access specifically.
> 
> - The barriers for the vblank counter seqlock weren't complete: The
>   read-side was missing the first barrier between the counter read and
>   the timestamp read, it only had a barrier between the ts and the
>   counter read. We need both.
> 
> - Barriers weren't properly documented. Since barriers only work if
>   you have them on both sides of the transaction it's prudent to
>   reference where the other side is. To avoid duplicating the
>   write-side comment 3 times extract a little store_vblank() helper.
>   In that helper also assert that we do indeed hold
>   dev->vblank_time_lock, since in some cases the lock is acquired a
>   few functions up in the callchain.
> 
> Spotted while reviewing a patch from Chris Wilson to add a fastpath to
> the vblank_wait ioctl.
> 
> Cc: Chris Wilson 
> Cc: Mario Kleiner 
> Cc: Ville Syrjälä 
> Cc: Michel Dänzer 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_irq.c | 92 ---
>  include/drm/drmP.h        |  8 +++--
>  2 files changed, 54 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..23bfbc61a494 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -74,6 +74,33 @@ module_param_named(vblankoffdelay, drm_vblank_offdelay, int, 0600);
>  module_param_named(timestamp_precision_usec, drm_timestamp_precision, int, 0600);
>  module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>  
> +static void store_vblank(struct drm_device *dev, int crtc,
> +  unsigned vblank_count_inc,
> +  struct timeval *t_vblank)
> +{
> + struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> + u32 tslot;
> +
> + assert_spin_locked(&dev->vblank_time_lock);
> +
> + if (t_vblank) {
> + tslot = vblank->count + vblank_count_inc;
> + vblanktimestamp(dev, crtc, tslot) = *t_vblank;
> + }
> +
> + /*
> +  * vblank timestamp updates are protected on the write side with
> +  * vblank_time_lock, but on the read side done locklessly using a
> +  * sequence-lock on the vblank counter. Ensure correct ordering using
> +  * memory barriers. We need the barrier both before and also after the
> +  * counter update to synchronize with the next timestamp write.
> +  * The read-side barriers for this are in drm_vblank_count_and_time.
> +  */
> + smp_wmb();
> + vblank->count += vblank_count_inc;
> + smp_wmb();

The comment and the code are each self-contradictory.

If vblank->count writes are always protected by vblank_time_lock (which the
comment above asserts, though I did not verify it), then the trailing write
barrier is not required, and the comment's claim that it is required is
incorrect.

A spin unlock operation is always a write barrier.
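
As a rough illustration of the ordering under discussion, here is a userspace
C11 sketch. It is not kernel code and not part of the patch: stamp_ring,
store_stamp() and read_stamp() are hypothetical names, the C11 fences only
approximate smp_wmb()/smp_rmb(), and a single writer is assumed in place of
holding vblank_time_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RBSIZE 2 /* assumed ring size, for illustration only */

/* Hypothetical stand-in for the per-CRTC counter and timestamp ring. */
struct stamp_ring {
	atomic_uint count;             /* sequence counter, also the slot index */
	_Atomic long long slot[RBSIZE]; /* timestamps, indexed by count % RBSIZE */
};

/* Write side: assumes exactly one writer at a time (the lock's job). */
static void store_stamp(struct stamp_ring *r, unsigned inc, long long ts)
{
	unsigned cur;

	assert(r != NULL);
	cur = atomic_load_explicit(&r->count, memory_order_relaxed);

	/* Fill the slot the counter will point at after the update. */
	atomic_store_explicit(&r->slot[(cur + inc) % RBSIZE], ts,
			      memory_order_relaxed);

	/* Approximates the first smp_wmb(): slot store before counter store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->count, cur + inc, memory_order_relaxed);

	/*
	 * Approximates the second smp_wmb() in the patch. The reply above
	 * argues the unlock that follows already provides this ordering.
	 */
	atomic_thread_fence(memory_order_release);
}

/* Read side: lockless, retries if the counter moved during the read. */
static unsigned read_stamp(struct stamp_ring *r, long long *ts)
{
	unsigned cur;

	assert(r != NULL && ts != NULL);
	do {
		cur = atomic_load_explicit(&r->count, memory_order_relaxed);
		/* Counter read before timestamp read. */
		atomic_thread_fence(memory_order_acquire);
		*ts = atomic_load_explicit(&r->slot[cur % RBSIZE],
					   memory_order_relaxed);
		/* Timestamp read before the counter re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (cur != atomic_load_explicit(&r->count, memory_order_relaxed));

	return cur;
}
```

The reader needs a fence on both sides of the timestamp load, which is the
read-side gap the patch description mentions. Whether the writer's trailing
fence is needed is exactly the question raised here.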

Regards,
Peter Hurley

> +}
> +
>  /**
>   * drm_update_vblank_count - update the master vblank counter
>   * @dev: DRM device
> @@ -93,7 +120,7 @@ module_param_named(timestamp_monotonic, drm_timestamp_monotonic, int, 0600);
>  static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>  {
>   struct drm_vblank_crtc *vblank = &dev->vblank[crtc];
> - u32 cur_vblank, diff, tslot;
> + u32 cur_vblank, diff;
>   bool rc;
>   struct timeval t_vblank;
>  
> @@ -129,18 +156,12 @@ static void drm_update_vblank_count(struct drm_device *dev, int crtc)
>   if (diff == 0)
>   return;
>  
> - /* Reinitialize corresponding vblank timestamp if high-precision query
> -  * available. Skip this step if query unsupported or failed. Will
> -  * reinitialize delayed at next vblank interrupt in that case.
> + /*
> +  * Only reinitialize corresponding vblank timestamp if high-precision query
> +  * available and didn't fail. Will reinitialize delayed at next vblank
> +  * interrupt in that case.
>    */
> - if (rc) {
> - tslot = atomic_read(&vblank->count) + diff;
> - vblanktimestamp(dev, crtc, tslot) = t_vblank;
> - }
> -
> - smp_mb__before_atomic();
> - atomic_add(diff, &vblank->count);
> - smp_mb__after_atomic();
> + store_vblank(dev, crtc, diff, rc ? &t_vblank : NULL);
>  }
>  
>  /*
> @@ -218,7 +239,7 @@ static void vblank_disable_and_save(struct drm_device *dev, int crtc)
>   /* Compute time difference to stored timestamp of last vblank
>* as updated by last invocation of drm_handle_vblank