[Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Chris Wilson
With a fence, we only need to insert a memory barrier around the actual
fence alteration for CPU accesses through the GTT. Performing the
barrier in flush-fence was inserting unnecessary and expensive barriers
for never fenced objects.

Note removing the barriers from flush-fence, which was effectively a
barrier before every direct access through the GTT, revealed that we
where missing a barrier before the first access through the GTT. Lack of
that barrier was sufficient to cause GPU hangs.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_gem.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05ff790..c1ef0a8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2771,9 +2771,22 @@ static void i830_write_fence_reg(struct drm_device *dev, 
int reg,
POSTING_READ(FENCE_REG_830_0 + reg * 4);
 }
 
+inline static bool i915_gem_object_needs_mb(struct drm_i915_gem_object *obj)
+{
+   return obj  obj-base.read_domains  I915_GEM_DOMAIN_GTT;
+}
+
 static void i915_gem_write_fence(struct drm_device *dev, int reg,
 struct drm_i915_gem_object *obj)
 {
+   struct drm_i915_private *dev_priv = dev-dev_private;
+
+   /* Ensure that all CPU reads are completed before installing a fence
+* and all writes before removing the fence.
+*/
+   if (i915_gem_object_needs_mb(dev_priv-fence_regs[reg].obj))
+   mb();
+
switch (INTEL_INFO(dev)-gen) {
case 7:
case 6: sandybridge_write_fence_reg(dev, reg, obj); break;
@@ -2783,6 +2796,9 @@ static void i915_gem_write_fence(struct drm_device *dev, 
int reg,
case 2: i830_write_fence_reg(dev, reg, obj); break;
default: break;
}
+
+   if (i915_gem_object_needs_mb(obj))
+   mb();
 }
 
 static inline int fence_number(struct drm_i915_private *dev_priv,
@@ -2812,7 +2828,7 @@ static void i915_gem_object_update_fence(struct 
drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_object_flush_fence(struct drm_i915_gem_object *obj)
+i915_gem_object_wait_fence(struct drm_i915_gem_object *obj)
 {
if (obj-last_fenced_seqno) {
int ret = i915_wait_seqno(obj-ring, obj-last_fenced_seqno);
@@ -2822,12 +2838,6 @@ i915_gem_object_flush_fence(struct drm_i915_gem_object 
*obj)
obj-last_fenced_seqno = 0;
}
 
-   /* Ensure that all CPU reads are completed before installing a fence
-* and all writes before removing the fence.
-*/
-   if (obj-base.read_domains  I915_GEM_DOMAIN_GTT)
-   mb();
-
obj-fenced_gpu_access = false;
return 0;
 }
@@ -2838,7 +2848,7 @@ i915_gem_object_put_fence(struct drm_i915_gem_object *obj)
struct drm_i915_private *dev_priv = obj-base.dev-dev_private;
int ret;
 
-   ret = i915_gem_object_flush_fence(obj);
+   ret = i915_gem_object_wait_fence(obj);
if (ret)
return ret;
 
@@ -2912,7 +2922,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 * will need to serialise the write to the associated fence register?
 */
if (obj-fence_dirty) {
-   ret = i915_gem_object_flush_fence(obj);
+   ret = i915_gem_object_wait_fence(obj);
if (ret)
return ret;
}
@@ -2933,7 +2943,7 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
if (reg-obj) {
struct drm_i915_gem_object *old = reg-obj;
 
-   ret = i915_gem_object_flush_fence(old);
+   ret = i915_gem_object_wait_fence(old);
if (ret)
return ret;
 
@@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct 
drm_i915_gem_object *obj, bool write)
 
i915_gem_object_flush_cpu_write_domain(obj);
 
+   if ((obj-base.read_domains  I915_GEM_DOMAIN_GTT) == 0)
+   mb();
+
old_write_domain = obj-base.write_domain;
old_read_domains = obj-base.read_domains;
 
-- 
1.7.10.4

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Daniel Vetter
On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
 With a fence, we only need to insert a memory barrier around the actual
 fence alteration for CPU accesses through the GTT. Performing the
 barrier in flush-fence was inserting unnecessary and expensive barriers
 for never fenced objects.
 
 Note removing the barriers from flush-fence, which was effectively a
 barrier before every direct access through the GTT, revealed that we
 where missing a barrier before the first access through the GTT. Lack of
 that barrier was sufficient to cause GPU hangs.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Looks good and finally puts some clear explanation and consistency behind
our mb()s. Two minor nitpicks, otherwise.

Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch

 ---
  drivers/gpu/drm/i915/i915_gem.c |   33 +++--
  1 file changed, 23 insertions(+), 10 deletions(-)

[snip]

 @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct 
 drm_i915_gem_object *obj, bool write)
  
   i915_gem_object_flush_cpu_write_domain(obj);
  
 + if ((obj-base.read_domains  I915_GEM_DOMAIN_GTT) == 0)
 + mb();
 +

I think a comment here like we have one for all other gtt related memory
barries would be good. Another thing is the flush_gtt_write_domain uses a
wmb, whereas here we don't bother with micro-optimizing things. So I think
it'd be good to just use a mb() for that, too, if just for consistency.

Also, you know the grumpy maintainer drill: Could we exercise these
barriers with a minimal i-g-t testcase, please? Since you've managed to
kill your machine by removing them, they're no longer just there to keep
us happy, hence I'd like to have them exercised ...

Another thing that just crossed my mind: Could we lack a set of mb()s for
cpu access on llc platforms? For non-coherent platforms the mb() in the
clflush paths will do that, but on llc platforms I couldn't find anything.
And that lp bugs seems to make an excellent case for them being required
...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Daniel Vetter
While looking at barriers, I think we could be a bit more paranoid with
the barrier in intel_read_status_page and up it to a full mb() and move it
into the !lazy_coherency conditional of the various get_seqno functions.
-Daniel

On Tue, Oct 09, 2012 at 11:54:12AM +0200, Daniel Vetter wrote:
 On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
  With a fence, we only need to insert a memory barrier around the actual
  fence alteration for CPU accesses through the GTT. Performing the
  barrier in flush-fence was inserting unnecessary and expensive barriers
  for never fenced objects.
  
  Note removing the barriers from flush-fence, which was effectively a
  barrier before every direct access through the GTT, revealed that we
  where missing a barrier before the first access through the GTT. Lack of
  that barrier was sufficient to cause GPU hangs.
  
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
 Looks good and finally puts some clear explanation and consistency behind
 our mb()s. Two minor nitpicks, otherwise.
 
 Reviewed-by: Daniel Vetter daniel.vet...@ffwll.ch
 
  ---
   drivers/gpu/drm/i915/i915_gem.c |   33 +++--
   1 file changed, 23 insertions(+), 10 deletions(-)
 
 [snip]
 
  @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct 
  drm_i915_gem_object *obj, bool write)
   
  i915_gem_object_flush_cpu_write_domain(obj);
   
  +   if ((obj-base.read_domains  I915_GEM_DOMAIN_GTT) == 0)
  +   mb();
  +
 
 I think a comment here like we have one for all other gtt related memory
 barries would be good. Another thing is the flush_gtt_write_domain uses a
 wmb, whereas here we don't bother with micro-optimizing things. So I think
 it'd be good to just use a mb() for that, too, if just for consistency.
 
 Also, you know the grumpy maintainer drill: Could we exercise these
 barriers with a minimal i-g-t testcase, please? Since you've managed to
 kill your machine by removing them, they're no longer just there to keep
 us happy, hence I'd like to have them exercised ...
 
 Another thing that just crossed my mind: Could we lack a set of mb()s for
 cpu access on llc platforms? For non-coherent platforms the mb() in the
 clflush paths will do that, but on llc platforms I couldn't find anything.
 And that lp bugs seems to make an excellent case for them being required
 ...
 
 Cheers, Daniel
 -- 
 Daniel Vetter
 Software Engineer, Intel Corporation
 +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Chris Wilson
On Tue, 9 Oct 2012 11:54:12 +0200, Daniel Vetter dan...@ffwll.ch wrote:
 On Tue, Oct 09, 2012 at 09:38:58AM +0100, Chris Wilson wrote:
  @@ -3244,6 +3254,9 @@ i915_gem_object_set_to_gtt_domain(struct 
  drm_i915_gem_object *obj, bool write)
   
  i915_gem_object_flush_cpu_write_domain(obj);
   
  +   if ((obj-base.read_domains  I915_GEM_DOMAIN_GTT) == 0)
  +   mb();
  +
 
 I think a comment here like we have one for all other gtt related memory
 barries would be good. Another thing is the flush_gtt_write_domain uses a
 wmb, whereas here we don't bother with micro-optimizing things. So I think
 it'd be good to just use a mb() for that, too, if just for consistency.

In fact, not only is that the wmb() the easiest to micro-optimise, it
is the only one that can be - I think. Around manipulating the fence, we
need a read/write barrier in case we have any pending accesses through
the fenced region, since the register write may be reordered passed the
memory reads since there is no obvious dependency. That might just be
heightened paranoia and our memory controller isn't that smart. Yet. So
those two need to be mb() so that I can sleep safely at night. For the
mb() inside set-to-gtt-domain, I don't have a robust explanation other
than that empirically we need a barrier, therefore there is some
lingering incoherency when reusing a bo. (The hangs always seem to occur
when crossing a page boundary, we see stale data.) You could attempt to
insert a read/write barrier depending upon actual usage, but it hardly
seems worth the effort.
 
 Also, you know the grumpy maintainer drill: Could we exercise these
 barriers with a minimal i-g-t testcase, please? Since you've managed to
 kill your machine by removing them, they're no longer just there to keep
 us happy, hence I'd like to have them exercised ...

Still hunting.
 
 Another thing that just crossed my mind: Could we lack a set of mb()s for
 cpu access on llc platforms? For non-coherent platforms the mb() in the
 clflush paths will do that, but on llc platforms I couldn't find anything.
 And that lp bugs seems to make an excellent case for them being required
 ...

Yes, with LLC we need to treat the GPU as another core and so put
similar SMP-esque memory barriers in place.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Daniel Vetter
On Tue, Oct 9, 2012 at 1:03 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 In fact, not only is that the wmb() the easiest to micro-optimise, it
 is the only one that can be - I think. Around manipulating the fence, we
 need a read/write barrier in case we have any pending accesses through
 the fenced region, since the register write may be reordered passed the
 memory reads since there is no obvious dependency. That might just be
 heightened paranoia and our memory controller isn't that smart. Yet. So
 those two need to be mb() so that I can sleep safely at night. For the
 mb() inside set-to-gtt-domain, I don't have a robust explanation other
 than that empirically we need a barrier, therefore there is some
 lingering incoherency when reusing a bo. (The hangs always seem to occur
 when crossing a page boundary, we see stale data.) You could attempt to
 insert a read/write barrier depending upon actual usage, but it hardly
 seems worth the effort.

Hm, I think we can make a case that the barrier before the fence
change can only be a wmb(): Racing reads against fence changes is
ill-defined anyway, so we don't need the read barrier for that reason.
But we need to flush out any store buffers (especially the wc store
buffer) before changing the fencing. The mb() afterwards seems to be
required, since we need to sync both subsequent reads and writes
against the fence mmio write.

One thing I wonder is whether we miss any barrier between the wc
writes to the ringbuffer and the tail update. If that's the case I
wonder where all the bug reports are ...

Last one: Which machines blow up when you drop that mb()?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Only insert the mb() before updating the fence parameter

2012-10-09 Thread Chris Wilson
On Tue, 9 Oct 2012 13:14:09 +0200, Daniel Vetter dan...@ffwll.ch wrote:
 One thing I wonder is whether we miss any barrier between the wc
 writes to the ringbuffer and the tail update. If that's the case I
 wonder where all the bug reports are ...

Ditto. I've often wondered how we get away without a wmb() there...
 
 Last one: Which machines blow up when you drop that mb()?

pnv, though that's the only non-LLC I've been testing with the
incomplete patch so I can't say it is limited to that machine.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx