Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-14 Thread Daniel Vetter
On Fri, Jun 14, 2013 at 9:12 PM, Stéphane Marchesin
 wrote:
> On Wed, Jun 12, 2013 at 3:06 PM, Jesse Barnes  
> wrote:
>> On Wed, 12 Jun 2013 00:48:25 +0100
>> Chris Wilson  wrote:
>>
>>> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
>>> >
>>> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  
>>> > wrote:
>>> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>>> > >> During suspend all fences are reset, including their pin_count which
>>> > >> is reset to 0. However a framebuffer can be bound across
>>> > >> suspend/resume, which means that when the buffer is unbound after
>>> > >> resume, the pin count for the buffer will be negative. Since the
>>> > >> fence pin count is now negative when available and zero when in use,
>>> > >> the buffer's fence will get recycled when the fence is in use which
>>> > >> is the opposite of what we want. The visible effect is that since the
>>> > >> fence is recycled the tiling mode goes away while the buffer is being
>>> > >> displayed and we get lines/screens of garbage.
>>> > >>
>>> > >> To fix this, we repin the fences for all bound fbs on resume, which
>>> > >> ensures the pin count is right.
>>> > >
>>> > > Yikes. So why do we not just keep the fences alive during suspend (not
>>> > > touching their pin_count), and then just iterate over the list of fences
>>> > > rewriting the register as required upon resume? That would seem less
>>> > > error prone than trying to reconstruct the lost pin_count.
>>> >
>>> > I suspect they'd need to be saved/restored at the hw level as well,
>>> > which AFAICS isn't happening today...
>>>
>>> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
>>> resume. But we can restore the fences from dev_priv->fence_regs...
>>> Actually we have a very similar problem after a GPU reset where we
>>> should restore fences for pinned objects (i.e. the scanout). The patch
>>> to fix both looks fairly straightforward.
>>
>> To be clear, this only affects gen3 right?  For gen4+ we don't need the
>> fences for scanout since we have a bit in the plane control...
>
> Yup I've only ever seen the issue on gen3.
>
> Anyway, what should we do about this? Should I make another patch
> where I save/restore the fence regs instead?

drm-intel-fixes has already the improved patch from Chris,
drm-intel-next-queued has a patch to add a WARN so we'll catch this
much quicker next time around.
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-14 Thread Stéphane Marchesin
On Wed, Jun 12, 2013 at 3:06 PM, Jesse Barnes  wrote:
> On Wed, 12 Jun 2013 00:48:25 +0100
> Chris Wilson  wrote:
>
>> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
>> >
>> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  
>> > wrote:
>> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>> > >> During suspend all fences are reset, including their pin_count which
>> > >> is reset to 0. However a framebuffer can be bound across
>> > >> suspend/resume, which means that when the buffer is unbound after
>> > >> resume, the pin count for the buffer will be negative. Since the
>> > >> fence pin count is now negative when available and zero when in use,
>> > >> the buffer's fence will get recycled when the fence is in use which
>> > >> is the opposite of what we want. The visible effect is that since the
>> > >> fence is recycled the tiling mode goes away while the buffer is being
>> > >> displayed and we get lines/screens of garbage.
>> > >>
>> > >> To fix this, we repin the fences for all bound fbs on resume, which
>> > >> ensures the pin count is right.
>> > >
>> > > Yikes. So why do we not just keep the fences alive during suspend (not
>> > > touching their pin_count), and then just iterate over the list of fences
>> > > rewriting the register as required upon resume? That would seem less
>> > > error prone than trying to reconstruct the lost pin_count.
>> >
>> > I suspect they'd need to be saved/restored at the hw level as well,
>> > which AFAICS isn't happening today...
>>
>> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
>> resume. But we can restore the fences from dev_priv->fence_regs...
>> Actually we have a very similar problem after a GPU reset where we
>> should restore fences for pinned objects (i.e. the scanout). The patch
>> to fix both looks fairly straightforward.
>
> To be clear, this only affects gen3 right?  For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

Yup I've only ever seen the issue on gen3.

Anyway, what should we do about this? Should I make another patch
where I save/restore the fence regs instead?

Stéphane

>
> Or are we failing to fault on a previously mapped scanout too?  If so,
> we'd need to cover more than just scanout here.
>
> --
> Jesse Barnes, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-12 Thread Chris Wilson
On Wed, Jun 12, 2013 at 03:06:51PM -0700, Jesse Barnes wrote:
> On Wed, 12 Jun 2013 00:48:25 +0100
> Chris Wilson  wrote:
> 
> > On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> > > 
> > > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  
> > > wrote:
> > > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> > > >> During suspend all fences are reset, including their pin_count which
> > > >> is reset to 0. However a framebuffer can be bound across
> > > >> suspend/resume, which means that when the buffer is unbound after
> > > >> resume, the pin count for the buffer will be negative. Since the
> > > >> fence pin count is now negative when available and zero when in use,
> > > >> the buffer's fence will get recycled when the fence is in use which
> > > >> is the opposite of what we want. The visible effect is that since the
> > > >> fence is recycled the tiling mode goes away while the buffer is being
> > > >> displayed and we get lines/screens of garbage.
> > > >>
> > > >> To fix this, we repin the fences for all bound fbs on resume, which
> > > >> ensures the pin count is right.
> > > >
> > > > Yikes. So why do we not just keep the fences alive during suspend (not
> > > > touching their pin_count), and then just iterate over the list of fences
> > > > rewriting the register as required upon resume? That would seem less
> > > > error prone than trying to reconstruct the lost pin_count.
> > > 
> > > I suspect they'd need to be saved/restored at the hw level as well,
> > > which AFAICS isn't happening today...
> > 
> > Ugh, I introduced this bug 30 months ago - saved by the VT switch on
> > resume. But we can restore the fences from dev_priv->fence_regs...
> > Actually we have a very similar problem after a GPU reset where we
> > should restore fences for pinned objects (i.e. the scanout). The patch
> > to fix both looks fairly straightforward.
> 
> To be clear, this only affects gen3 right?  For gen4+ we don't need the
> fences for scanout since we have a bit in the plane control...

True, you will only get scanout corruption from lack of a fence on gen2/3.
FBC will also be more broken than before.
 
> Or are we failing to fault on a previously mapped scanout too?  If so,
> we'd need to cover more than just scanout here.

They all get faulted in again, and all will grab a fence again. Only
scanouts pinned at the time of resume will believe that they hold an
additional reference to the fence, and so unpin it once too often. If we
do that enough times, we will starve ourselves of fences. And a gen3
scanout runs the risk of losing its fence at any time.

So the impact is far less severe than it appears at first glance. Unless
I've missed something.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-12 Thread Jesse Barnes
On Wed, 12 Jun 2013 00:48:25 +0100
Chris Wilson  wrote:

> On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> > 
> > On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  
> > wrote:
> > > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> > >> During suspend all fences are reset, including their pin_count which
> > >> is reset to 0. However a framebuffer can be bound across
> > >> suspend/resume, which means that when the buffer is unbound after
> > >> resume, the pin count for the buffer will be negative. Since the
> > >> fence pin count is now negative when available and zero when in use,
> > >> the buffer's fence will get recycled when the fence is in use which
> > >> is the opposite of what we want. The visible effect is that since the
> > >> fence is recycled the tiling mode goes away while the buffer is being
> > >> displayed and we get lines/screens of garbage.
> > >>
> > >> To fix this, we repin the fences for all bound fbs on resume, which
> > >> ensures the pin count is right.
> > >
> > > Yikes. So why do we not just keep the fences alive during suspend (not
> > > touching their pin_count), and then just iterate over the list of fences
> > > rewriting the register as required upon resume? That would seem less
> > > error prone than trying to reconstruct the lost pin_count.
> > 
> > I suspect they'd need to be saved/restored at the hw level as well,
> > which AFAICS isn't happening today...
> 
> Ugh, I introduced this bug 30 months ago - saved by the VT switch on
> resume. But we can restore the fences from dev_priv->fence_regs...
> Actually we have a very similar problem after a GPU reset where we
> should restore fences for pinned objects (i.e. the scanout). The patch
> to fix both looks fairly straightforward.

To be clear, this only affects gen3 right?  For gen4+ we don't need the
fences for scanout since we have a bit in the plane control...

Or are we failing to fault on a previously mapped scanout too?  If so,
we'd need to cover more than just scanout here.

-- 
Jesse Barnes, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-11 Thread Chris Wilson
On Tue, Jun 11, 2013 at 04:01:21PM -0700, Stéphane Marchesin wrote:
> 
> On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  
> wrote:
> > On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> >> During suspend all fences are reset, including their pin_count which
> >> is reset to 0. However a framebuffer can be bound across
> >> suspend/resume, which means that when the buffer is unbound after
> >> resume, the pin count for the buffer will be negative. Since the
> >> fence pin count is now negative when available and zero when in use,
> >> the buffer's fence will get recycled when the fence is in use which
> >> is the opposite of what we want. The visible effect is that since the
> >> fence is recycled the tiling mode goes away while the buffer is being
> >> displayed and we get lines/screens of garbage.
> >>
> >> To fix this, we repin the fences for all bound fbs on resume, which
> >> ensures the pin count is right.
> >
> > Yikes. So why do we not just keep the fences alive during suspend (not
> > touching their pin_count), and then just iterate over the list of fences
> > rewriting the register as required upon resume? That would seem less
> > error prone than trying to reconstruct the lost pin_count.
> 
> I suspect they'd need to be saved/restored at the hw level as well,
> which AFAICS isn't happening today...

Ugh, I introduced this bug 30 months ago - saved by the VT switch on
resume. But we can restore the fences from dev_priv->fence_regs...
Actually we have a very similar problem after a GPU reset where we
should restore fences for pinned objects (i.e. the scanout). The patch
to fix both looks fairly straightforward.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-11 Thread Stéphane Marchesin
On Tue, Jun 11, 2013 at 3:57 PM, Chris Wilson  wrote:
> On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
>> During suspend all fences are reset, including their pin_count which
>> is reset to 0. However a framebuffer can be bound across
>> suspend/resume, which means that when the buffer is unbound after
>> resume, the pin count for the buffer will be negative. Since the
>> fence pin count is now negative when available and zero when in use,
>> the buffer's fence will get recycled when the fence is in use which
>> is the opposite of what we want. The visible effect is that since the
>> fence is recycled the tiling mode goes away while the buffer is being
>> displayed and we get lines/screens of garbage.
>>
>> To fix this, we repin the fences for all bound fbs on resume, which
>> ensures the pin count is right.
>
> Yikes. So why do we not just keep the fences alive during suspend (not
> touching their pin_count), and then just iterate over the list of fences
> rewriting the register as required upon resume? That would seem less
> error prone than trying to reconstruct the lost pin_count.

I suspect they'd need to be saved/restored at the hw level as well,
which AFAICS isn't happening today...

Stéphane
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-11 Thread Chris Wilson
On Tue, Jun 11, 2013 at 03:49:27PM -0700, Stéphane Marchesin wrote:
> During suspend all fences are reset, including their pin_count which
> is reset to 0. However a framebuffer can be bound across
> suspend/resume, which means that when the buffer is unbound after
> resume, the pin count for the buffer will be negative. Since the
> fence pin count is now negative when available and zero when in use,
> the buffer's fence will get recycled when the fence is in use which
> is the opposite of what we want. The visible effect is that since the
> fence is recycled the tiling mode goes away while the buffer is being
> displayed and we get lines/screens of garbage.
> 
> To fix this, we repin the fences for all bound fbs on resume, which
> ensures the pin count is right.

Yikes. So why do we not just keep the fences alive during suspend (not
touching their pin_count), and then just iterate over the list of fences
rewriting the register as required upon resume? That would seem less
error prone than trying to reconstruct the lost pin_count.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] [PATCH 2/2] drm/i915: repin bound framebuffers on resume

2013-06-11 Thread Stéphane Marchesin
During suspend all fences are reset, including their pin_count which
is reset to 0. However a framebuffer can be bound across
suspend/resume, which means that when the buffer is unbound after
resume, the pin count for the buffer will be negative. Since the
fence pin count is now negative when available and zero when in use,
the buffer's fence will get recycled when the fence is in use which
is the opposite of what we want. The visible effect is that since the
fence is recycled the tiling mode goes away while the buffer is being
displayed and we get lines/screens of garbage.

To fix this, we repin the fences for all bound fbs on resume, which
ensures the pin count is right.

Signed-off-by: Stéphane Marchesin 
---
 drivers/gpu/drm/i915/i915_drv.c  |  2 ++
 drivers/gpu/drm/i915/intel_display.c | 32 
 drivers/gpu/drm/i915/intel_drv.h |  1 +
 3 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a2e4953..b8e82ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -643,6 +643,8 @@ static int __i915_drm_thaw(struct drm_device *dev)
/* We need working interrupts for modeset enabling ... */
drm_irq_install(dev);
 
+   /* Repin all live fences before resuming */
+   intel_repin_fences(dev);
intel_modeset_init_hw(dev);
 
drm_modeset_lock_all(dev);
diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 56746dc..4bf3240 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2051,6 +2051,38 @@ void intel_unpin_fb_obj(struct drm_i915_gem_object *obj)
i915_gem_object_unpin(obj);
 }
 
+/*
+ * Repin the fences for all currently bound fbs. During suspend i915_drm_freeze
+ * calls i915_gem_reset, which in turn calls i915_gem_reset_fences, which
+ * resets the fence pin_counts to 0. So on resume we can call this function to
+ * restore the fences on bound framebuffers.
+ */
+void intel_repin_fences(struct drm_device *dev)
+{
+   struct drm_crtc *crtc;
+   struct drm_i915_gem_object *obj;
+   int ret;
+
+   mutex_lock(&mode_config->mutex);
+   list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
+   if (!crtc || !crtc->fb)
+   continue;
+   obj = to_intel_framebuffer(crtc->fb)->obj;
+   if (!obj)
+   continue;
+
+   /* Install a fence for tiled scan-out. */
+   if (obj->tiling_mode != I915_TILING_NONE) {
+   ret = i915_gem_object_get_fence(obj);
+   if (ret)
+   DRM_ERROR("Couldn't get a fence\n");
+   else
+   i915_gem_object_pin_fence(obj);
+   }
+   }
+   mutex_unlock(&mode_config->mutex);
+}
+
 /* Computes the linear offset to the base tile and adjusts x, y. bytes per 
pixel
  * is assumed to be a power-of-two. */
 unsigned long intel_gen4_compute_page_offset(int *x, int *y,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 624a9e6..b5395ed 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -630,6 +630,7 @@ extern int intel_pin_and_fence_fb_obj(struct drm_device 
*dev,
  struct drm_i915_gem_object *obj,
  struct intel_ring_buffer *pipelined);
 extern void intel_unpin_fb_obj(struct drm_i915_gem_object *obj);
+extern void intel_repin_fences(struct drm_device *dev);
 
 extern int intel_framebuffer_init(struct drm_device *dev,
  struct intel_framebuffer *ifb,
-- 
1.8.3

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx