subject:"Re\: more intel drm issues \(was Re\: \[git pull\] drm intel only fixes\)"

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Jeff Chua

On Thu, Jan 20, 2011 at 10:39 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 Ok, so I have a new issue that I'm currently bisecting but that people
 may be able to figure out even befor emy bisect finishes.

 On my slow Atom netbook (that I'm planning on using as my traveling
 companion for LCA), suspend-to-RAM takes a long time with current git.
 It's quite noticeable - it used to be pretty much instant, now it
 takes three seconds. And it's all i915 graphics (although I haven't
 bisected it down fully, I've bisected it down to the drm merge).

 In the good case (like plain 2.6.37), a suspend event will look
 something like this:...
   PM: suspend of devices complete after 147.646 msecs
 but the i915 driver at some point made it take 3s:
  PM: suspend of devices complete after 3059.656 msecs
 which is definitely long enough to be worth fixing.

 Maybe the person responsible will go oh, that's obviously due to
 xyz, and just fix it. But I'll continue to bisect in case nobody
 steps up to admit to wasting time..

Rafael send out two patches earlier. Could be related. I was facing
issue during resume.

Attached are the two patches. You'll need to apply both.

Len has suggested before to try to booting with acpi_sleep=nonvs
which works as well for my case.

Thanks,
Jeff


patch-resume1
Description: Binary data


patch-resume2
Description: Binary data
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Chris Wilson

On Wed, 19 Jan 2011 22:22:48 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:
 
  Rafael send out two patches earlier. Could be related. I was facing
  issue during resume.
 
 No, I'm aware of the rcu-synchronize thing, this isn't it. This is
 really at the suspend stage, and I had bisected it down to the drm
 changes.
 
 In fact, by now I have bisected it down to a single commit. It's
 another merge commit, which makes me a bit nervous (I bisected another
 issue today, and it turned out to simply not be repeatable).
 
 But this time the merge commit actually has a real conflict that got
 fixed up in the merge, and the code around the conflict waits for
 three seconds, and three seconds is also exactly how long the delay at
 suspend time is. So I get the feeling that this time it's a real
 issue, and what happened was that the merge may have been a mismerge.
 
 Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into
 drm-intel-next) I'm getting that 3-second delay at suspend time. And
 the merge diff looks like this:
 
  +struct drm_device *dev = ring-dev;
  +struct drm_i915_private *dev_priv = dev-dev_private;
   unsigned long end;
  -drm_i915_private_t *dev_priv = dev-dev_private;
   u32 head;
 
 - head = intel_read_status_page(ring, 4);
 - if (head) {
 - ring-head = head  HEAD_ADDR;
 - ring-space = ring-head - (ring-tail + 8);
 - if (ring-space  0)
 - ring-space += ring-size;
 - if (ring-space = n)
 - return 0;
 - }
 -
   trace_i915_ring_wait_begin (dev);
   end = jiffies + 3 * HZ;
   do {
 
 and that whole do-loop with a 3-second timeout makes me *very*
 suspicious. It used to have (in _one_ of the parent branches) that
 code before it to return early if there was space in the ring, now it
 doesn't any more - and that merge co-incides with my suspend suddenly
 taking 3 seconds.
 
 The same check that is deleted does exist inside the loop too, but
 there it has some extra code it in (compare to actual_head and so
 on), so I wonder if the fast-case was somehow hiding this issue.

Right, the autoreported HEAD may have been already reset to 0 and so hit
the wraparound bug which caused it to exit early without actually
quiescing the ringbuffer.

Another possibility is that I added a 3s timeout waiting for a request if
IRQs were suspended:

commit b5ba177d8d71f011c23b1cabec99fdaddae65c4d
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Tue Dec 14 12:17:15 2010 +

drm/i915: Poll for seqno completion if IRQ is disabled

Both of those I think are symptoms of another problem, that perhaps during
suspend we are shutting down parts of the chip before idling?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Jeff Chua

On Thu, Jan 20, 2011 at 2:22 PM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:

 Rafael send out two patches earlier. Could be related. I was facing
 issue during resume.

 No, I'm aware of the rcu-synchronize thing, this isn't it. This is
 really at the suspend stage, and I had bisected it down to the drm
 changes.

 In fact, by now I have bisected it down to a single commit. It's
 another merge commit, which makes me a bit nervous (I bisected another
 issue today, and it turned out to simply not be repeatable).

 But this time the merge commit actually has a real conflict that got
 fixed up in the merge, and the code around the conflict waits for
 three seconds, and three seconds is also exactly how long the delay at
 suspend time is. So I get the feeling that this time it's a real
 issue, and what happened was that the merge may have been a mismerge.

I did see that once during suspend. But as you mentioned,  3 seconds,
and it wasn't repeatable. It was at the first suspend right after
reboot.


Jeff
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Linus Torvalds

On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk wrote:

 Right, the autoreported HEAD may have been already reset to 0 and so hit
 the wraparound bug which caused it to exit early without actually
 quiescing the ringbuffer.

Yeah, that would explain the issue.

 Another possibility is that I added a 3s timeout waiting for a request if
 IRQs were suspended:

No, if IRQ's are actually suspended here, then that codepath is
totally buggy and would blow up (msleep() doesn't work, and jiffies
wouldn't advance on UP). So that's not it.

 Both of those I think are symptoms of another problem, that perhaps during
 suspend we are shutting down parts of the chip before idling?

That could be, but looking at the code, one thing strikes me: the
_normal_ case (of just waiting for enough space in the ring buffer)
doesn't need to use the exact case, but the wait for ring buffer to
be totally empty does.

Which means that the use of the fast-but-inaccurate 'head' sounds
wrong for the wait for idle case.

So can you explain the difference between

   intel_read_status_page(ring, 4);

vs

   I915_READ_HEAD(ring);

because from looking at the code, I get the notion that
intel_read_status_page() may not be exact. But what happens if that
inexact value matches our cached ring-actual_head, so we never even
try to read the exact case? Does it _stay_ inexact for arbitrarily
long times? If so, we might wait for the ring to empty forever (well,
until the timeout - the behavior I see), even though the ring really
_is_ empty. No?

Also, isn't that head  ring-actual_head buggy? What about the
overflow case? Not that we care, because afaik, 'actual_head' is not
actually used anywhere, so it should be called 'pointless_head'?

That code looks suspiciously bogus.

Linus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Chris Wilson

On Thu, 20 Jan 2011 08:07:02 -0800, Linus Torvalds 
torva...@linux-foundation.org wrote:
 On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
 
  Right, the autoreported HEAD may have been already reset to 0 and so hit
  the wraparound bug which caused it to exit early without actually
  quiescing the ringbuffer.
 
 Yeah, that would explain the issue.
 
  Another possibility is that I added a 3s timeout waiting for a request if
  IRQs were suspended:
 
 No, if IRQ's are actually suspended here, then that codepath is
 totally buggy and would blow up (msleep() doesn't work, and jiffies
 wouldn't advance on UP). So that's not it.
 
  Both of those I think are symptoms of another problem, that perhaps during
  suspend we are shutting down parts of the chip before idling?
 
 That could be, but looking at the code, one thing strikes me: the
 _normal_ case (of just waiting for enough space in the ring buffer)
 doesn't need to use the exact case, but the wait for ring buffer to
 be totally empty does.
 
 Which means that the use of the fast-but-inaccurate 'head' sounds
 wrong for the wait for idle case.
 
 So can you explain the difference between
 
intel_read_status_page(ring, 4);
 
 vs
 
I915_READ_HEAD(ring);

For I915_READ_HEAD, we need to wake up the GT power well, perform an
uncached read from the register, and then power down. This takes on the
order of a 100 microseconds (less if the GT is already powered up, etc).

Instead a read from the status page is from cached memory. The caveat here
is that value is only updated by the gfx engine when its HEAD crosses
every 64k boundary. So quite rarely.

 because from looking at the code, I get the notion that
 intel_read_status_page() may not be exact. But what happens if that
 inexact value matches our cached ring-actual_head, so we never even
 try to read the exact case? Does it _stay_ inexact for arbitrarily
 long times? If so, we might wait for the ring to empty forever (well,
 until the timeout - the behavior I see), even though the ring really
 _is_ empty. No?

Ah. Your analysis is spot on and this will cause a hang whilst polling if
we enter the loop with the last known head the same as the reported value.
 
 Also, isn't that head  ring-actual_head buggy? What about the
 overflow case? Not that we care, because afaik, 'actual_head' is not
 actually used anywhere, so it should be called 'pointless_head'?

This is the one case that I think is handled correctly, ignoring all the
other bugs.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-20 Thread Linus Torvalds

On Thu, Jan 20, 2011 at 9:51 AM, Linus Torvalds
torva...@linux-foundation.org wrote:

 So how about just doing this in the loop? It will mean that the
 _first_ read uses the fast cached one (the common case, hopefully),
 but then if we loop, we'll use the slow exact one.

 (cut-and-paste, so whitespace isn't good):

  diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c
 b/drivers/gpu/drm/i915/intel_ringbuffer.c
  index 03e3370..11bbfb5 100644
  --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
  +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
  @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct
 intel_ring_buffer *ring, int n)
                  msleep(1);
                  if (atomic_read(dev_priv-mm.wedged))
                          return -EAGAIN;
  +               /* Force a re-read. FIXME: what if read_status_page
 says 0 too */
  +               ring-actual_head = 0;
          } while (!time_after(jiffies, end));
          trace_i915_ring_wait_end (dev);
          return -EBUSY;

This makes no difference. And the reason is exactly that we get the
zero case that I had in the comment.

But THIS attached patch actually seems to fix the slow suspend for me.
I removed the accesses to actual_head, because that whole field
seems to not be used.

So it seems like the intel_read_status_page() thing returns zero
forever when suspending. Maybe you can explain why.

 Linus
 drivers/gpu/drm/i915/intel_ringbuffer.c |5 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 03e3370..f6b9baa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -928,6 +928,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring)
 
 int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 {
+	int reread = 0;
 	struct drm_device *dev = ring-dev;
 	struct drm_i915_private *dev_priv = dev-dev_private;
 	unsigned long end;
@@ -940,9 +941,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 		 * fallback to the slow and accurate path.
 		 */
 		head = intel_read_status_page(ring, 4);
-		if (head  ring-actual_head)
+		if (reread)
 			head = I915_READ_HEAD(ring);
-		ring-actual_head = head;
 		ring-head = head  HEAD_ADDR;
 		ring-space = ring-head - (ring-tail + 8);
 		if (ring-space  0)
@@ -961,6 +961,7 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n)
 		msleep(1);
 		if (atomic_read(dev_priv-mm.wedged))
 			return -EAGAIN;
+		reread = 1;
 	} while (!time_after(jiffies, end));
 	trace_i915_ring_wait_end (dev);
 	return -EBUSY;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index be9087e..5b0abfa 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -47,7 +47,6 @@ struct  intel_ring_buffer {
 	struct		drm_device *dev;
 	struct		drm_i915_gem_object *obj;
 
-	u32		actual_head;
 	u32		head;
 	u32		tail;
 	int		space;
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

2011-01-19 Thread Linus Torvalds

On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote:

 Rafael send out two patches earlier. Could be related. I was facing
 issue during resume.

No, I'm aware of the rcu-synchronize thing, this isn't it. This is
really at the suspend stage, and I had bisected it down to the drm
changes.

In fact, by now I have bisected it down to a single commit. It's
another merge commit, which makes me a bit nervous (I bisected another
issue today, and it turned out to simply not be repeatable).

But this time the merge commit actually has a real conflict that got
fixed up in the merge, and the code around the conflict waits for
three seconds, and three seconds is also exactly how long the delay at
suspend time is. So I get the feeling that this time it's a real
issue, and what happened was that the merge may have been a mismerge.

Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into
drm-intel-next) I'm getting that 3-second delay at suspend time. And
the merge diff looks like this:

 +  struct drm_device *dev = ring-dev;
 +  struct drm_i915_private *dev_priv = dev-dev_private;
unsigned long end;
 -  drm_i915_private_t *dev_priv = dev-dev_private;
u32 head;

-   head = intel_read_status_page(ring, 4);
-   if (head) {
-   ring-head = head  HEAD_ADDR;
-   ring-space = ring-head - (ring-tail + 8);
-   if (ring-space  0)
-   ring-space += ring-size;
-   if (ring-space = n)
-   return 0;
-   }
-
trace_i915_ring_wait_begin (dev);
end = jiffies + 3 * HZ;
do {

and that whole do-loop with a 3-second timeout makes me *very*
suspicious. It used to have (in _one_ of the parent branches) that
code before it to return early if there was space in the ring, now it
doesn't any more - and that merge co-incides with my suspend suddenly
taking 3 seconds.

The same check that is deleted does exist inside the loop too, but
there it has some extra code it in (compare to actual_head and so
on), so I wonder if the fast-case was somehow hiding this issue.

But I don't know the code. I just see that whole PM: suspend of
devices complete after x.xxx msecs issue, and I can see the machine
taking too long to suspend.

 Linus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

Re: more intel drm issues (was Re: [git pull] drm intel only fixes)

7 matches

Site Navigation

Mail list logo

Footer information