Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Jeff Chua
On Thu, Jan 20, 2011 at 10:39 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 Ok, so I have a new issue that I'm currently bisecting but that people
 may be able to figure out even before my bisect finishes.

 On my slow Atom netbook (that I'm planning on using as my traveling
 companion for LCA), suspend-to-RAM takes a long time with current git.
 It's quite noticeable - it used to be pretty much instant, now it
 takes three seconds. And it's all i915 graphics (although I haven't
 bisected it down fully, I've bisected it down to the drm merge).

 In the good case (like plain 2.6.37), a suspend event will look
 something like this:...
   PM: suspend of devices complete after 147.646 msecs
 but the i915 driver at some point made it take 3s:
  PM: suspend of devices complete after 3059.656 msecs
 which is definitely long enough to be worth fixing.

 Maybe the person responsible will go "oh, that's obviously due to
 xyz", and just fix it. But I'll continue to bisect in case nobody
 steps up to admit to wasting time..

Rafael sent out two patches earlier. Could be related. I was facing
an issue during resume.

Attached are the two patches. You'll need to apply both.

Len suggested earlier to try booting with acpi_sleep=nonvs,
which also works in my case.

Thanks,
Jeff


patch-resume1
Description: Binary data


patch-resume2
Description: Binary data
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Chris Wilson
On Wed, 19 Jan 2011 22:22:48 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:
 
  Rafael sent out two patches earlier. Could be related. I was facing
  an issue during resume.
 
 No, I'm aware of the rcu-synchronize thing, this isn't it. This is
 really at the suspend stage, and I had bisected it down to the drm
 changes.
 
 In fact, by now I have bisected it down to a single commit. It's
 another merge commit, which makes me a bit nervous (I bisected another
 issue today, and it turned out to simply not be repeatable).
 
 But this time the merge commit actually has a real conflict that got
 fixed up in the merge, and the code around the conflict waits for
 three seconds, and three seconds is also exactly how long the delay at
 suspend time is. So I get the feeling that this time it's a real
 issue, and what happened was that the merge may have been a mismerge.
 
 Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into
 drm-intel-next) I'm getting that 3-second delay at suspend time. And
 the merge diff looks like this:
 
  +struct drm_device *dev = ring->dev;
  +struct drm_i915_private *dev_priv = dev->dev_private;
   unsigned long end;
  -drm_i915_private_t *dev_priv = dev->dev_private;
   u32 head;
 
 - head = intel_read_status_page(ring, 4);
 - if (head) {
 -         ring->head = head & HEAD_ADDR;
 -         ring->space = ring->head - (ring->tail + 8);
 -         if (ring->space < 0)
 -                 ring->space += ring->size;
 -         if (ring->space >= n)
 -                 return 0;
 - }
 -
   trace_i915_ring_wait_begin (dev);
   end = jiffies + 3 * HZ;
   do {
 
 and that whole do-loop with a 3-second timeout makes me *very*
 suspicious. It used to have (in _one_ of the parent branches) that
 code before it to return early if there was space in the ring, now it
 doesn't any more - and that merge co-incides with my suspend suddenly
 taking 3 seconds.
 
 The same check that is deleted does exist inside the loop too, but
 there it has some extra code in it (compare to actual_head and so
 on), so I wonder if the fast-case was somehow hiding this issue.

Right, the autoreported HEAD may have been already reset to 0 and so hit
the wraparound bug which caused it to exit early without actually
quiescing the ringbuffer.

Another possibility is that I added a 3s timeout waiting for a request if
IRQs were suspended:

commit b5ba177d8d71f011c23b1cabec99fdaddae65c4d
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Tue Dec 14 12:17:15 2010 +

drm/i915: Poll for seqno completion if IRQ is disabled

Both of those I think are symptoms of another problem, that perhaps during
suspend we are shutting down parts of the chip before idling?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Jeff Chua
On Thu, Jan 20, 2011 at 2:22 PM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:

 Rafael sent out two patches earlier. Could be related. I was facing
 an issue during resume.

 No, I'm aware of the rcu-synchronize thing, this isn't it. This is
 really at the suspend stage, and I had bisected it down to the drm
 changes.

 In fact, by now I have bisected it down to a single commit. It's
 another merge commit, which makes me a bit nervous (I bisected another
 issue today, and it turned out to simply not be repeatable).

 But this time the merge commit actually has a real conflict that got
 fixed up in the merge, and the code around the conflict waits for
 three seconds, and three seconds is also exactly how long the delay at
 suspend time is. So I get the feeling that this time it's a real
 issue, and what happened was that the merge may have been a mismerge.

I did see that once during suspend. But as you mentioned,  3 seconds,
and it wasn't repeatable. It was at the first suspend right after
reboot.


Jeff


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Linus Torvalds
On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:

 Right, the autoreported HEAD may have been already reset to 0 and so hit
 the wraparound bug which caused it to exit early without actually
 quiescing the ringbuffer.

Yeah, that would explain the issue.

 Another possibility is that I added a 3s timeout waiting for a request if
 IRQs were suspended:

No, if IRQ's are actually suspended here, then that codepath is
totally buggy and would blow up (msleep() doesn't work, and jiffies
wouldn't advance on UP). So that's not it.
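
To make the jiffies point concrete: the wait loop's timeout depends on a wrap-safe comparison of a free-running tick counter, and it can only expire if that counter keeps advancing. The following is a minimal user-space sketch of that idea, not kernel code. The names tick_after() and polls_until_timeout() are made up for illustration, and the cast assumes the usual two's-complement conversion from unsigned long to long.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a wrap-safe "a is later than b" test on a
 * free-running tick counter. Assumes that converting an out-of-range
 * unsigned long to long wraps (two's complement).
 */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Poll until the deadline has passed. "advance" is how far the fake
 * clock moves per poll. With advance == 0 (a clock that never ticks)
 * the deadline is never reached and only max_polls ends the loop.
 * Returns the number of polls taken, or -1 if max_polls ran out.
 */
static int polls_until_timeout(unsigned long now, unsigned long timeout,
			       unsigned long advance, int max_polls)
{
	unsigned long end = now + timeout;
	int polls;

	assert(max_polls > 0);
	for (polls = 1; polls <= max_polls; polls++) {
		now += advance;
		if (tick_after(now, end))
			return polls;
	}
	return -1;
}
```

With a clock that advances by one tick per poll the loop ends shortly after the deadline, even across counter wraparound. With a clock that never advances it ends only because of the artificial poll limit.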

 Both of those I think are symptoms of another problem, that perhaps during
 suspend we are shutting down parts of the chip before idling?

That could be, but looking at the code, one thing strikes me: the
_normal_ case (of just waiting for enough space in the ring buffer)
doesn't need to use the exact case, but the wait for ring buffer to
be totally empty does.

Which means that the use of the fast-but-inaccurate 'head' sounds
wrong for the wait for idle case.

So can you explain the difference between

   intel_read_status_page(ring, 4);

vs

   I915_READ_HEAD(ring);

because from looking at the code, I get the notion that
intel_read_status_page() may not be exact. But what happens if that
inexact value matches our cached ring->actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty. No?

Also, isn't that "head < ring->actual_head" buggy? What about the
overflow case? Not that we care, because afaik, 'actual_head' is not
actually used anywhere, so it should be called 'pointless_head'?

That code looks suspiciously bogus.

Linus


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Chris Wilson
On Thu, 20 Jan 2011 08:07:02 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
 
  Right, the autoreported HEAD may have been already reset to 0 and so hit
  the wraparound bug which caused it to exit early without actually
  quiescing the ringbuffer.
 
 Yeah, that would explain the issue.
 
  Another possibility is that I added a 3s timeout waiting for a request if
  IRQs were suspended:
 
 No, if IRQ's are actually suspended here, then that codepath is
 totally buggy and would blow up (msleep() doesn't work, and jiffies
 wouldn't advance on UP). So that's not it.
 
  Both of those I think are symptoms of another problem, that perhaps during
  suspend we are shutting down parts of the chip before idling?
 
 That could be, but looking at the code, one thing strikes me: the
 _normal_ case (of just waiting for enough space in the ring buffer)
 doesn't need to use the exact case, but the wait for ring buffer to
 be totally empty does.
 
 Which means that the use of the fast-but-inaccurate 'head' sounds
 wrong for the wait for idle case.
 
 So can you explain the difference between
 
intel_read_status_page(ring, 4);
 
 vs
 
I915_READ_HEAD(ring);

For I915_READ_HEAD, we need to wake up the GT power well, perform an
uncached read from the register, and then power down. This takes on the
order of 100 microseconds (less if the GT is already powered up, etc).

Instead, a read from the status page is from cached memory. The caveat here
is that the value is only updated by the gfx engine when its HEAD crosses
every 64k boundary. So quite rarely.
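
As a rough illustration of how coarse the cached value is, the sketch below models the autoreported head as the true head rounded down to the last 64 KiB boundary it crossed. This is a simplified model, not a description of the hardware. The names and the rounding rule are assumptions made for the example.

```c
#include <assert.h>

#define REPORT_GRANULE 0x10000u /* 64 KiB, as described above */

/* Modelled status-page value: last 64 KiB boundary at or below the head. */
static unsigned int modelled_report(unsigned int true_head)
{
	return true_head & ~(REPORT_GRANULE - 1);
}

/* How far the cached value lags behind the real head in this model. */
static unsigned int modelled_lag(unsigned int true_head)
{
	unsigned int lag = true_head - modelled_report(true_head);

	assert(lag < REPORT_GRANULE);
	return lag;
}
```

In this model a head anywhere below 64 KiB is reported as 0, so a reader of the cached value cannot tell a ring that has barely started from one that is almost 64 KiB in.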

 because from looking at the code, I get the notion that
 intel_read_status_page() may not be exact. But what happens if that
 inexact value matches our cached ring->actual_head, so we never even
 try to read the exact case? Does it _stay_ inexact for arbitrarily
 long times? If so, we might wait for the ring to empty forever (well,
 until the timeout - the behavior I see), even though the ring really
 _is_ empty. No?

Ah. Your analysis is spot on and this will cause a hang whilst polling if
we enter the loop with the last known head the same as the reported value.
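
The stall can be reproduced in a small user-space model of the old loop. Everything here is a stand-in: struct fake_ring, its fields and old_wait() are made up for illustration, the comparison is assumed to be "head < actual_head", and a poll count replaces the real 3 second timeout.

```c
#include <assert.h>

struct fake_ring {
	unsigned int status_head;   /* cached report, may be stale */
	unsigned int register_head; /* exact value, expensive to read */
	unsigned int actual_head;   /* last head value the loop saw */
	unsigned int tail;
	int size;
};

/*
 * Model of the old polling loop. Returns the number of polls needed to
 * see at least n free bytes, or -1 if max_polls ran out (the timeout).
 */
static int old_wait(struct fake_ring *ring, int n, int max_polls)
{
	int polls;

	assert(ring->size > 0);
	for (polls = 1; polls <= max_polls; polls++) {
		unsigned int head = ring->status_head;
		int space;

		/* The exact read happens only if the report went backwards. */
		if (head < ring->actual_head)
			head = ring->register_head;
		ring->actual_head = head;

		space = (int)head - (int)(ring->tail + 8);
		if (space < 0)
			space += ring->size;
		if (space >= n)
			return polls;
	}
	return -1;
}
```

If the report and the last known head are both 0 while the real head has caught up with the tail, the model never takes the exact read and runs into the poll limit, even though the ring is empty.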
 
 Also, isn't that "head < ring->actual_head" buggy? What about the
 overflow case? Not that we care, because afaik, 'actual_head' is not
 actually used anywhere, so it should be called 'pointless_head'?

This is the one case that I think is handled correctly, ignoring all the
other bugs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Linus Torvalds
On Thu, Jan 20, 2011 at 9:51 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 So how about just doing this in the loop? It will mean that the
 _first_ read uses the fast cached one (the common case, hopefully),
 but then if we loop, we'll use the slow exact one.

 (cut-and-paste, so whitespace isn't good):

  diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
  index 03e3370..11bbfb5 100644
  --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
  +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
  @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct
 intel_ring_buffer *ring, int n)
                  msleep(1);
                  if (atomic_read(&dev_priv->mm.wedged))
                          return -EAGAIN;
  +               /* Force a re-read. FIXME: what if read_status_page
 says 0 too */
  +               ring->actual_head = 0;
          } while (!time_after(jiffies, end));
          trace_i915_ring_wait_end (dev);
          return -EBUSY;

This makes no difference. And the reason is exactly that we get the
zero case that I had in the comment.

But THIS attached patch actually seems to fix the slow suspend for me.
I removed the accesses to actual_head, because that whole field
seems to not be used.

So it seems like the intel_read_status_page() thing returns zero
forever when suspending. Maybe you can explain why.
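
The effect of the change can be sketched in a small user-space model: the first pass trusts the cached value, and every later pass uses the exact one. The struct, its fields and wait_with_reread() are hypothetical stand-ins rather than the driver's code, and a poll count replaces the jiffies timeout.

```c
#include <assert.h>

struct fake_ring {
	unsigned int status_head;   /* cached report, stuck at 0 here */
	unsigned int register_head; /* exact value */
	unsigned int tail;
	int size;
};

/*
 * Returns the number of polls needed to see at least n free bytes, or
 * -1 if max_polls ran out (the timeout).
 */
static int wait_with_reread(const struct fake_ring *ring, int n, int max_polls)
{
	int reread = 0;
	int polls;

	assert(ring->size > 0);
	for (polls = 1; polls <= max_polls; polls++) {
		unsigned int head = ring->status_head;
		int space;

		if (reread)
			head = ring->register_head;

		space = (int)head - (int)(ring->tail + 8);
		if (space < 0)
			space += ring->size;
		if (space >= n)
			return polls;

		reread = 1; /* the fast value did not help, go exact */
	}
	return -1;
}
```

In this model a ring that is really empty but whose cached head reads 0 is recognised as idle on the second pass, while a ring that is genuinely still busy keeps polling until the limit.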

 Linus
 drivers/gpu/drm/i915/intel_ringbuffer.c |5 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 03e3370..f6b9baa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -928,6 +928,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 
 int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 {
+	int reread = 0;
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	unsigned long end;
@@ -940,9 +941,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 		 * fallback to the slow and accurate path.
 		 */
 		head = intel_read_status_page(ring, 4);
-		if (head < ring->actual_head)
+		if (reread)
 			head = I915_READ_HEAD(ring);
-		ring->actual_head = head;
 		ring->head = head & HEAD_ADDR;
 		ring->space = ring->head - (ring->tail + 8);
 		if (ring->space < 0)
@@ -961,6 +961,7 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 		msleep(1);
 		if (atomic_read(&dev_priv->mm.wedged))
 			return -EAGAIN;
+		reread = 1;
 	} while (!time_after(jiffies, end));
 	trace_i915_ring_wait_end (dev);
 	return -EBUSY;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index be9087e..5b0abfa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -47,7 +47,6 @@ struct  intel_ring_buffer {
 	struct		drm_device *dev;
 	struct		drm_i915_gem_object *obj;
 
-	u32		actual_head;
 	u32		head;
 	u32		tail;
 	int		space;


more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-19 Thread Linus Torvalds
Ok, so I have a new issue that I'm currently bisecting but that people
may be able to figure out even before my bisect finishes.

On my slow Atom netbook (that I'm planning on using as my traveling
companion for LCA), suspend-to-RAM takes a long time with current git.
It's quite noticeable - it used to be pretty much instant, now it
takes three seconds. And it's all i915 graphics (although I haven't
bisected it down fully, I've bisected it down to the drm merge).

In the good case (like plain 2.6.37), a suspend event will look
something like this:

   ...
   PM: suspend of devices complete after 147.646 msecs
   ...

but the i915 driver at some point made it take 3s:

  ...
  PM: suspend of devices complete after 3059.656 msecs
  ...

which is definitely long enough to be worth fixing.

Maybe the person responsible will go "oh, that's obviously due to
xyz", and just fix it. But I'll continue to bisect in case nobody
steps up to admit to wasting time..

Linus


Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-19 Thread Linus Torvalds
On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:

 Rafael sent out two patches earlier. Could be related. I was facing
 an issue during resume.

No, I'm aware of the rcu-synchronize thing, this isn't it. This is
really at the suspend stage, and I had bisected it down to the drm
changes.

In fact, by now I have bisected it down to a single commit. It's
another merge commit, which makes me a bit nervous (I bisected another
issue today, and it turned out to simply not be repeatable).

But this time the merge commit actually has a real conflict that got
fixed up in the merge, and the code around the conflict waits for
three seconds, and three seconds is also exactly how long the delay at
suspend time is. So I get the feeling that this time it's a real
issue, and what happened was that the merge may have been a mismerge.

Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into
drm-intel-next) I'm getting that 3-second delay at suspend time. And
the merge diff looks like this:

 +  struct drm_device *dev = ring->dev;
 +  struct drm_i915_private *dev_priv = dev->dev_private;
    unsigned long end;
 -  drm_i915_private_t *dev_priv = dev->dev_private;
    u32 head;

-   head = intel_read_status_page(ring, 4);
-   if (head) {
-           ring->head = head & HEAD_ADDR;
-           ring->space = ring->head - (ring->tail + 8);
-           if (ring->space < 0)
-                   ring->space += ring->size;
-           if (ring->space >= n)
-                   return 0;
-   }
-
    trace_i915_ring_wait_begin (dev);
    end = jiffies + 3 * HZ;
    do {

and that whole do-loop with a 3-second timeout makes me *very*
suspicious. It used to have (in _one_ of the parent branches) that
code before it to return early if there was space in the ring, now it
doesn't any more - and that merge co-incides with my suspend suddenly
taking 3 seconds.

The same check that is deleted does exist inside the loop too, but
there it has some extra code in it (compare to actual_head and so
on), so I wonder if the fast-case was somehow hiding this issue.

But I don't know the code. I just see that whole "PM: suspend of
devices complete after x.xxx msecs" issue, and I can see the machine
taking too long to suspend.
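
For illustration, here is a stand-alone sketch of the space computation in the removed fast path. It mirrors the arithmetic in the diff above, but the function names are made up, the HEAD_ADDR masking is left out, and the ring size in any call is an arbitrary example value.

```c
#include <assert.h>

/*
 * Free bytes in a circular ring: head - (tail + 8), wrapped by the
 * ring size when the result is negative.
 */
static int ring_space(unsigned int head, unsigned int tail, int size)
{
	int space;

	assert(size > 0);
	space = (int)head - (int)(tail + 8);
	if (space < 0)
		space += size;
	return space;
}

/*
 * The removed early return: it skipped the check entirely when the
 * reported head was 0, and otherwise succeeded once n bytes were free.
 */
static int fast_path_ok(unsigned int head, unsigned int tail, int size, int n)
{
	if (!head)
		return 0;
	return ring_space(head, tail, size) >= n;
}
```

An empty ring (head equal to tail) yields size - 8 bytes. If the head is reported as a stale 0 while the tail has moved on, the computed space is smaller than that, so a wait for a completely empty ring cannot succeed on that value.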

 Linus


Re: [git pull] drm intel only fixes

2011-01-12 Thread Linus Torvalds
On Tue, Jan 11, 2011 at 10:03 PM, Dave Airlie airl...@linux.ie wrote:

 I'm stuck at home with just my i5 laptop due to the office being shut due
 to the ongoing floods. But I've booted and ran this for a few hours and it
 seems to be better than the current tree. It contains a couple of patches
 to fix DMAR interaction issues I see on this laptop on top of Chris's
 pull.

Hmm. I'm not seeing the screensaver issue any more, but there's
something wrong with video. At least the TED ones (I'm not seeing it
on a YouTube video I tried). See for example

  
http://www.ted.com/talks/lang/eng/david_gallo_shows_underwater_astonishments.html

and when there is fast movement in the video (like when the octopus is
spooked), I get these odd lines of noise.

In fact, while I noticed the lines in the video itself, it's actually
most repeatably noticeable in the buttons underneath while the video
is playing: make your mouse go back-and-forth between the rate and
share buttons, and they get corrupted (and it also corrupts the
progress bar).

It looks a bit like the noise you get with insufficient memory
bandwidth, but I doubt that's the case here. Perhaps just some
motion-comp problem?

Any ideas?

  Linus


Re: [git pull] drm intel only fixes

2011-01-12 Thread Linus Torvalds
On Wed, Jan 12, 2011 at 11:46 AM, Jesse Barnes jbar...@virtuousgeek.org wrote:

 Since I doubt we're actually offloading to our video decode kernels for
 Flash video on your machine

It's the latest 64-bit beta flash player, so maybe it does use hw acceleration.


   it could very well be a memory bw issue.
 Can you try this small patch to see if one of the low power watermarks
 is giving you trouble (note: cut & pasted)?

No difference.

 It could also be the normal power watermarks though too; you could just
 make plane-wm and cursor_wm higher to test that.

I multiplied them by two, no difference. The patch I used attached.
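
For reference, a stand-alone sketch of what the doubling experiment does to a computed watermark: round the entry count up to whole cache lines, add the guard, double the result, then clamp it to the maximum. The function and parameter names are illustrative, and the numbers used in any call are made up rather than real Ironlake values.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static int doubled_wm(int entries, int cacheline_size, int guard_size,
		      int max_wm)
{
	int wm;

	assert(cacheline_size > 0);
	wm = DIV_ROUND_UP(entries, cacheline_size) + guard_size;
	wm *= 2;         /* the experiment: double the watermark */
	if (wm > max_wm) /* the clamp still applies afterwards */
		wm = max_wm;
	return wm;
}
```

Because the clamp comes after the doubling, a watermark that is already close to the maximum gains little or nothing from the experiment.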

Does nobody else see this?

   Linus
 drivers/gpu/drm/i915/intel_display.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 25d9688..2ea1a51 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3445,6 +3445,7 @@ static bool ironlake_compute_wm0(struct drm_device *dev,
 		entries += tlb_miss;
 	entries = DIV_ROUND_UP(entries, display->cacheline_size);
 	*plane_wm = entries + display->guard_size;
+*plane_wm *=2;
 	if (*plane_wm > (int)display->max_wm)
 		*plane_wm = display->max_wm;
 
@@ -3457,6 +3458,7 @@ static bool ironlake_compute_wm0(struct drm_device *dev,
 		entries += tlb_miss;
 	entries = DIV_ROUND_UP(entries, cursor->cacheline_size);
 	*cursor_wm = entries + cursor->guard_size;
+*cursor_wm *= 2;
 	if (*cursor_wm > (int)cursor->max_wm)
 		*cursor_wm = (int)cursor->max_wm;
 
@@ -3607,6 +3609,8 @@ static void ironlake_update_wm(struct drm_device *dev,
 	if (enabled != 1)
 		return;
 
+return;
+
 	clock = planea_clock ? planea_clock : planeb_clock;
 
 	/* WM1 */


Re: [git pull] drm intel only fixes

2011-01-12 Thread Linus Torvalds
On Wed, Jan 12, 2011 at 12:27 PM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Wed, Jan 12, 2011 at 11:46 AM, Jesse Barnes jbar...@virtuousgeek.org 
 wrote:

 Since I doubt we're actually offloading to our video decode kernels for
 Flash video on your machine

 It's the latest 64-bit beta flash player, so maybe it does use hw 
 acceleration.


                       it could very well be a memory bw issue.
 Can you try this small patch to see if one of the low power watermarks
 is giving you trouble (note: cut & pasted)?

 No difference.

Oh, and I'm also seeing corruption on my sandybridge machine. No video
involved, the gdm login screen is already corrupted this way. Similar
odd shifted lines etc, so I'd assume it's related.

Linus


Re: [git pull] drm intel only fixes

2011-01-12 Thread Linus Torvalds
On Wed, Jan 12, 2011 at 1:28 PM, Linus Torvalds
torva...@linux-foundation.org wrote:

 Oh, and I'm also seeing corruption on my sandybridge machine. No video
 involved, the gdm login screen is already corrupted this way. Similar
 odd shifted lines etc, so I'd assume it's related.

Hmm. I bisected it down to

  commit 6fe4f14044f181e146cdc15485428f95fa541ce8
  Author: Chris Wilson ch...@chris-wilson.co.uk
  Date:   Mon Jan 10 17:35:37 2011 +

  drm/i915/execbuffer: Reorder binding of objects to favour restrictions

on my sandybridge machine.  Chris?

   Linus


Re: [git pull] drm intel only fixes

2011-01-12 Thread Linus Torvalds
On Wed, Jan 12, 2011 at 2:22 PM, Jesse Barnes jbar...@virtuousgeek.org wrote:

 Ah, ok.  So it could be our internal FDI link is underrunning; it goes
 between the CPU and PCH and carries display bits.

I'm not sure it's an underrun or anything like that: the corruption is
long-term in the non-video case. So I take back the "looks like memory
bandwidth problems", because it really looks more like a corrupted
blit operation there.

 Are these both desktop type machines with DVI attached monitors?

DVI on the Core i5, plain analog VGA on the sandybridge one (I can
hear you asking "Why?". Because the silly intel motherboard doesn't
_have_ DVI out, and I didn't have a hdmi cable)

 If it's an FDI or transcoder problem, something like the below may give
 us more info.

See above. It's long-term, it was just the video behavior that made me
originally think it was temporary.

 Can you take a picture of the corruption?

Will do. I'll have to reboot to the broken kernel (my bisection ended
in a non-broken case)

  Linus


Re: [git pull] drm intel only fixes

2011-01-12 Thread Chris Wilson
On Wed, 12 Jan 2011 14:24:17 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Wed, Jan 12, 2011 at 1:28 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  Oh, and I'm also seeing corruption on my sandybridge machine. No video
  involved, the gdm login screen is already corrupted this way. Similar
  odd shifted lines etc, so I'd assume it's related.
 
 Hmm. I bisected it down to
 
   commit 6fe4f14044f181e146cdc15485428f95fa541ce8
   Author: Chris Wilson ch...@chris-wilson.co.uk
   Date:   Mon Jan 10 17:35:37 2011 +
 
   drm/i915/execbuffer: Reorder binding of objects to favour restrictions
 
 on my sandybridge machine.  Chris?

Wow. That should have had zero visible impact upon the rendering. All it
should have done is reorder the sequence in which we pin the buffers into
the GTT before applying the relocations, just to allow some pathological
execbuffers.

Just the SNB machine?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm intel only fixes

2011-01-12 Thread Jesse Barnes
On Wed, 12 Jan 2011 14:31:33 -0800
Linus Torvalds torva...@linux-foundation.org wrote:

 On Wed, Jan 12, 2011 at 2:22 PM, Jesse Barnes jbar...@virtuousgeek.org 
 wrote:
 
  Ah, ok.  So it could be our internal FDI link is underrunning; it goes
  between the CPU and PCH and carries display bits.
 
 I'm not sure it's an underrun or anything like that: the corruption is
 long-term in the non-video case. So I take back the "looks like memory
 bandwidth problems", because it really looks more like a corrupted
 blit operation there.

Ah ok if it's long running then yeah it's more likely to be a rendering
issue.  It could also be the FDI link getting its timings messed up
though, and consistently delivering the wrong bits; that could show up
in the same place on the screen each time, or it might move in a
pattern across the screen (usually from top to bottom).

 Will do. I'll have to reboot to the broken kernel (my bisection ended
 in a non-broken case)

Great, thanks.

-- 
Jesse Barnes, Intel Open Source Technology Center


Re: [git pull] drm intel only fixes

2011-01-12 Thread Chris Wilson
On Wed, 12 Jan 2011 15:05:36 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Wed, Jan 12, 2011 at 2:40 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
  Just the SNB machine?
 
 No. I just checked. Reverting that commit on my other machine makes
 that TED video on my Core i5 machine look fine too.
 
 So it's definitely the same bug on both Sandybridge and Core-i5 (I
 guess that's Ironlake in the crazy intel codename naming), just two
 slightly different symptoms. And I worried a bit that my bisect was
 bogus, but with the revert clearing it up on the other machine, I'm
 confident the bisect was good too.
 
 On my sandybridge machine, the corruption happens already at the gdm
 login screen, which is why I used that one to bisect things. I'm
 including a (bad) photo taken with my cellphone of what the corruption
 looks like - see how the "sandybridge.linux-foundation.org" machine
 name text has been corrupted, and obviously my name (and the "e" in
 "Other"). And that blue rounded rectangle should contain "Log in as
 torvalds" or something like that, but instead it's clear.

Yes, that looks consistent with using the wrong relocation entry or GTT
offset within the batch.

Thanks,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre