Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Thu, Jan 20, 2011 at 10:39 AM, Linus Torvalds torva...@linux-foundation.org wrote: Ok, so I have a new issue that I'm currently bisecting but that people may be able to figure out even befor emy bisect finishes. On my slow Atom netbook (that I'm planning on using as my traveling companion for LCA), suspend-to-RAM takes a long time with current git. It's quite noticeable - it used to be pretty much instant, now it takes three seconds. And it's all i915 graphics (although I haven't bisected it down fully, I've bisected it down to the drm merge). In the good case (like plain 2.6.37), a suspend event will look something like this:... PM: suspend of devices complete after 147.646 msecs but the i915 driver at some point made it take 3s: PM: suspend of devices complete after 3059.656 msecs which is definitely long enough to be worth fixing. Maybe the person responsible will go oh, that's obviously due to xyz, and just fix it. But I'll continue to bisect in case nobody steps up to admit to wasting time.. Rafael send out two patches earlier. Could be related. I was facing issue during resume. Attached are the two patches. You'll need to apply both. Len has suggested before to try to booting with acpi_sleep=nonvs which works as well for my case. Thanks, Jeff patch-resume1 Description: Binary data patch-resume2 Description: Binary data ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Wed, 19 Jan 2011 22:22:48 -0800, Linus Torvalds torva...@linux-foundation.org wrote: On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote: Rafael send out two patches earlier. Could be related. I was facing issue during resume. No, I'm aware of the rcu-synchronize thing, this isn't it. This is really at the suspend stage, and I had bisected it down to the drm changes. In fact, by now I have bisected it down to a single commit. It's another merge commit, which makes me a bit nervous (I bisected another issue today, and it turned out to simply not be repeatable). But this time the merge commit actually has a real conflict that got fixed up in the merge, and the code around the conflict waits for three seconds, and three seconds is also exactly how long the delay at suspend time is. So I get the feeling that this time it's a real issue, and what happened was that the merge may have been a mismerge. Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into drm-intel-next) I'm getting that 3-second delay at suspend time. And the merge diff looks like this: +struct drm_device *dev = ring-dev; +struct drm_i915_private *dev_priv = dev-dev_private; unsigned long end; -drm_i915_private_t *dev_priv = dev-dev_private; u32 head; - head = intel_read_status_page(ring, 4); - if (head) { - ring-head = head HEAD_ADDR; - ring-space = ring-head - (ring-tail + 8); - if (ring-space 0) - ring-space += ring-size; - if (ring-space = n) - return 0; - } - trace_i915_ring_wait_begin (dev); end = jiffies + 3 * HZ; do { and that whole do-loop with a 3-second timeout makes me *very* suspicious. It used to have (in _one_ of the parent branches) that code before it to return early if there was space in the ring, now it doesn't any more - and that merge co-incides with my suspend suddenly taking 3 seconds. The same check that is deleted does exist inside the loop too, but there it has some extra code it in (compare to actual_head and so on), so I wonder if the fast-case was somehow hiding this issue. Right, the autoreported HEAD may have been already reset to 0 and so hit the wraparound bug which caused it to exit early without actually quiescing the ringbuffer. Another possibility is that I added a 3s timeout waiting for a request if IRQs were suspended: commit b5ba177d8d71f011c23b1cabec99fdaddae65c4d Author: Chris Wilson ch...@chris-wilson.co.uk Date: Tue Dec 14 12:17:15 2010 + drm/i915: Poll for seqno completion if IRQ is disabled Both of those I think are symptoms of another problem, that perhaps during suspend we are shutting down parts of the chip before idling? -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Thu, Jan 20, 2011 at 2:22 PM, Linus Torvalds torva...@linux-foundation.org wrote: On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote: Rafael send out two patches earlier. Could be related. I was facing issue during resume. No, I'm aware of the rcu-synchronize thing, this isn't it. This is really at the suspend stage, and I had bisected it down to the drm changes. In fact, by now I have bisected it down to a single commit. It's another merge commit, which makes me a bit nervous (I bisected another issue today, and it turned out to simply not be repeatable). But this time the merge commit actually has a real conflict that got fixed up in the merge, and the code around the conflict waits for three seconds, and three seconds is also exactly how long the delay at suspend time is. So I get the feeling that this time it's a real issue, and what happened was that the merge may have been a mismerge. I did see that once during suspend. But as you mentioned, 3 seconds, and it wasn't repeatable. It was at the first suspend right after reboot. Jeff ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: Right, the autoreported HEAD may have been already reset to 0 and so hit the wraparound bug which caused it to exit early without actually quiescing the ringbuffer. Yeah, that would explain the issue. Another possibility is that I added a 3s timeout waiting for a request if IRQs were suspended: No, if IRQ's are actually suspended here, then that codepath is totally buggy and would blow up (msleep() doesn't work, and jiffies wouldn't advance on UP). So that's not it. Both of those I think are symptoms of another problem, that perhaps during suspend we are shutting down parts of the chip before idling? That could be, but looking at the code, one thing strikes me: the _normal_ case (of just waiting for enough space in the ring buffer) doesn't need to use the exact case, but the wait for ring buffer to be totally empty does. Which means that the use of the fast-but-inaccurate 'head' sounds wrong for the wait for idle case. So can you explain the difference between intel_read_status_page(ring, 4); vs I915_READ_HEAD(ring); because from looking at the code, I get the notion that intel_read_status_page() may not be exact. But what happens if that inexact value matches our cached ring-actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty. No? Also, isn't that head ring-actual_head buggy? What about the overflow case? Not that we care, because afaik, 'actual_head' is not actually used anywhere, so it should be called 'pointless_head'? That code looks suspiciously bogus. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Thu, 20 Jan 2011 08:07:02 -0800, Linus Torvalds torva...@linux-foundation.org wrote: On Thu, Jan 20, 2011 at 2:25 AM, Chris Wilson ch...@chris-wilson.co.uk wrote: Right, the autoreported HEAD may have been already reset to 0 and so hit the wraparound bug which caused it to exit early without actually quiescing the ringbuffer. Yeah, that would explain the issue. Another possibility is that I added a 3s timeout waiting for a request if IRQs were suspended: No, if IRQ's are actually suspended here, then that codepath is totally buggy and would blow up (msleep() doesn't work, and jiffies wouldn't advance on UP). So that's not it. Both of those I think are symptoms of another problem, that perhaps during suspend we are shutting down parts of the chip before idling? That could be, but looking at the code, one thing strikes me: the _normal_ case (of just waiting for enough space in the ring buffer) doesn't need to use the exact case, but the wait for ring buffer to be totally empty does. Which means that the use of the fast-but-inaccurate 'head' sounds wrong for the wait for idle case. So can you explain the difference between intel_read_status_page(ring, 4); vs I915_READ_HEAD(ring); For I915_READ_HEAD, we need to wake up the GT power well, perform an uncached read from the register, and then power down. This takes on the order of a 100 microseconds (less if the GT is already powered up, etc). Instead a read from the status page is from cached memory. The caveat here is that value is only updated by the gfx engine when its HEAD crosses every 64k boundary. So quite rarely. because from looking at the code, I get the notion that intel_read_status_page() may not be exact. But what happens if that inexact value matches our cached ring-actual_head, so we never even try to read the exact case? Does it _stay_ inexact for arbitrarily long times? If so, we might wait for the ring to empty forever (well, until the timeout - the behavior I see), even though the ring really _is_ empty. No? Ah. Your analysis is spot on and this will cause a hang whilst polling if we enter the loop with the last known head the same as the reported value. Also, isn't that head ring-actual_head buggy? What about the overflow case? Not that we care, because afaik, 'actual_head' is not actually used anywhere, so it should be called 'pointless_head'? This is the one case that I think is handled correctly, ignoring all the other bugs. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Thu, Jan 20, 2011 at 9:51 AM, Linus Torvalds torva...@linux-foundation.org wrote: So how about just doing this in the loop? It will mean that the _first_ read uses the fast cached one (the common case, hopefully), but then if we loop, we'll use the slow exact one. (cut-and-paste, so whitespace isn't good): diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..11bbfb5 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -961,6 +961,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) msleep(1); if (atomic_read(dev_priv-mm.wedged)) return -EAGAIN; + /* Force a re-read. FIXME: what if read_status_page says 0 too */ + ring-actual_head = 0; } while (!time_after(jiffies, end)); trace_i915_ring_wait_end (dev); return -EBUSY; This makes no difference. And the reason is exactly that we get the zero case that I had in the comment. But THIS attached patch actually seems to fix the slow suspend for me. I removed the accesses to actual_head, because that whole field seems to not be used. So it seems like the intel_read_status_page() thing returns zero forever when suspending. Maybe you can explain why. Linus drivers/gpu/drm/i915/intel_ringbuffer.c |5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.h |1 - 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 03e3370..f6b9baa 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -928,6 +928,7 @@ static int intel_wrap_ring_buffer(struct intel_ring_buffer *ring) int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) { + int reread = 0; struct drm_device *dev = ring-dev; struct drm_i915_private *dev_priv = dev-dev_private; unsigned long end; @@ -940,9 +941,8 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) * fallback to the slow and accurate path. */ head = intel_read_status_page(ring, 4); - if (head ring-actual_head) + if (reread) head = I915_READ_HEAD(ring); - ring-actual_head = head; ring-head = head HEAD_ADDR; ring-space = ring-head - (ring-tail + 8); if (ring-space 0) @@ -961,6 +961,7 @@ int intel_wait_ring_buffer(struct intel_ring_buffer *ring, int n) msleep(1); if (atomic_read(dev_priv-mm.wedged)) return -EAGAIN; + reread = 1; } while (!time_after(jiffies, end)); trace_i915_ring_wait_end (dev); return -EBUSY; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index be9087e..5b0abfa 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -47,7 +47,6 @@ struct intel_ring_buffer { struct drm_device *dev; struct drm_i915_gem_object *obj; - u32 actual_head; u32 head; u32 tail; int space; ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: more intel drm issues (was Re: [git pull] drm intel only fixes)
On Wed, Jan 19, 2011 at 8:55 PM, Jeff Chua jeff.chua.li...@gmail.com wrote: Rafael send out two patches earlier. Could be related. I was facing issue during resume. No, I'm aware of the rcu-synchronize thing, this isn't it. This is really at the suspend stage, and I had bisected it down to the drm changes. In fact, by now I have bisected it down to a single commit. It's another merge commit, which makes me a bit nervous (I bisected another issue today, and it turned out to simply not be repeatable). But this time the merge commit actually has a real conflict that got fixed up in the merge, and the code around the conflict waits for three seconds, and three seconds is also exactly how long the delay at suspend time is. So I get the feeling that this time it's a real issue, and what happened was that the merge may have been a mismerge. Chris: as of commit 8d5203ca6253 (Merge branch 'drm-intel-fixes' into drm-intel-next) I'm getting that 3-second delay at suspend time. And the merge diff looks like this: + struct drm_device *dev = ring-dev; + struct drm_i915_private *dev_priv = dev-dev_private; unsigned long end; - drm_i915_private_t *dev_priv = dev-dev_private; u32 head; - head = intel_read_status_page(ring, 4); - if (head) { - ring-head = head HEAD_ADDR; - ring-space = ring-head - (ring-tail + 8); - if (ring-space 0) - ring-space += ring-size; - if (ring-space = n) - return 0; - } - trace_i915_ring_wait_begin (dev); end = jiffies + 3 * HZ; do { and that whole do-loop with a 3-second timeout makes me *very* suspicious. It used to have (in _one_ of the parent branches) that code before it to return early if there was space in the ring, now it doesn't any more - and that merge co-incides with my suspend suddenly taking 3 seconds. The same check that is deleted does exist inside the loop too, but there it has some extra code it in (compare to actual_head and so on), so I wonder if the fast-case was somehow hiding this issue. But I don't know the code. I just see that whole PM: suspend of devices complete after x.xxx msecs issue, and I can see the machine taking too long to suspend. Linus ___ dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel