Re: [git pull] drm merge for 3.9-rc1
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro
> > horrors cleaned up, killing lots of legacy GTT
> > code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

Ok, I've merged two patches from Paulo, one to fixup the harmless jiffies
vs. msec confusion. And the other to plug a race in our irq handler which
did lead to missed dp aux interrupts according to some digging done by
Imre. The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen) I hope that this should fix both issues.

I'll forward the pull to Dave in a few days since atm I'm stalling a bit
for confirmation on another little regression fix. And there's nothing
earth-shattering in my -fixes queue right now.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [git pull] drm merge for 3.9-rc1
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer wrote: > On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer wrote: >> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer wrote: >>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher >>> wrote: On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer wrote: > On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher > wrote: >> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit > > So I don't think that's actually the cause of the problem. Or at > least > not that alone. I reverted it on top of Linus' latest tree and I > still > get the lockups. Actually, git bisect does seem to have gotten it correct. Once I actually tested the revert of just that on top of Linus' tree (commit d895cb1af1), things seem to be working much better. I've rebooted a dozen times without a lockup. The most I've seen it take on a kernel with that commit included is 3 reboots, so that's definitely at least an improvement. >>> >>> I give up. GPU issues are not my thing. 2 reboots after I sent that it >>> gave me pretty rainbow static again. So it might have been an >>> improvement, but revert it is not a solution. >>> >>> Looking at there rest of the commits, the whole GPU rework might be >>> suspect, but I clearly have no clue. >> >> GPUs are tricky beasts :) > > Understatement ;). > >> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the >> problem anyway since it only affects 6xx/7xx and your card is handled >> by the evergreen code. I'll put together some patches to help narrow >> down the problem. > > Yeah, that's the biggest problem I have, not knowing which functions are > actually being executed for this card. It looks like a combination of > stuff in evergreen.c and ni.c, but I have no idea. > > Patches would be great. If nothing else, I'm really good at building > kernels and rebooting by now. Two possible fixes attached. The first attempts a full reset of all blocks if the MC (memory controller) is hung. That may work better than just resetting the MC. 
>>>> The second just disables MC reset. I'm not sure we can reliably tell
>>>> if it's busy due to display requests hitting the MC periodically which
>>>> would lead to needlessly resetting it possibly leading to failures like
>>>> you are seeing.
>>>
>>> OK. I'll test them individually. It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static. You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue. I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work. The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.
>

I'll make those debug only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex
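For readers following the thread, here is a minimal user-space sketch of the idea behind the second patch: if the computed reset mask asks for an MC reset, log it and drop that bit rather than resetting the memory controller. This is not the driver code. The names `RESET_GFX`, `RESET_CP`, `RESET_MC` and `skip_mc_reset` are hypothetical, and the bit positions are assumptions picked only so that the `0x0409` value from the dmesg lines above contains an MC bit. The `verbose` flag stands in for making the message debug-only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reset-mask bits (assumed values, not taken from the driver). */
#define RESET_GFX (1u << 0)
#define RESET_CP  (1u << 3)
#define RESET_MC  (1u << 10)

/*
 * Return the reset mask with the MC bit removed. The MC may look busy
 * only because display requests keep hitting it, so a busy MC is treated
 * here as "probably not hung" and is never reset.
 */
static inline uint32_t skip_mc_reset(uint32_t reset_mask, int verbose)
{
    if (reset_mask & RESET_MC) {
        if (verbose)
            fprintf(stderr, "MC busy: 0x%08x, clearing.\n",
                    (unsigned int)reset_mask);
        reset_mask &= ~RESET_MC;
    }
    /* Postcondition: the MC bit is never passed on to the reset path. */
    assert((reset_mask & RESET_MC) == 0);
    return reset_mask;
}
```

Under these assumed bit values, `skip_mc_reset(0x0409, 1)` prints one "MC busy" line and returns a mask that still requests the other blocks but leaves the MC alone.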
Re: [git pull] drm merge for 3.9-rc1
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer wrote: > On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer wrote: >> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher wrote: >>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer wrote: On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher wrote: > ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit So I don't think that's actually the cause of the problem. Or at least not that alone. I reverted it on top of Linus' latest tree and I still get the lockups. >>> >>> Actually, git bisect does seem to have gotten it correct. Once I >>> actually tested the revert of just that on top of Linus' tree (commit >>> d895cb1af1), things seem to be working much better. I've rebooted a >>> dozen times without a lockup. The most I've seen it take on a kernel >>> with that commit included is 3 reboots, so that's definitely at least an >>> improvement. >> >> I give up. GPU issues are not my thing. 2 reboots after I sent that it >> gave me pretty rainbow static again. So it might have been an >> improvement, but revert it is not a solution. >> >> Looking at there rest of the commits, the whole GPU rework might be >> suspect, but I clearly have no clue. > > GPUs are tricky beasts :) Understatement ;). > ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the > problem anyway since it only affects 6xx/7xx and your card is handled > by the evergreen code. I'll put together some patches to help narrow > down the problem. Yeah, that's the biggest problem I have, not knowing which functions are actually being executed for this card. It looks like a combination of stuff in evergreen.c and ni.c, but I have no idea. Patches would be great. If nothing else, I'm really good at building kernels and rebooting by now. >>> >>> Two possible fixes attached. The first attempts a full reset of all >>> blocks if the MC (memory controller) is hung. That may work better >>> than just resetting the MC. The second just disables MC reset. 
>>> I'm not sure we can reliably tell if it's busy due to display requests
>>> hitting the MC periodically which would lead to needlessly resetting
>>> it possibly leading to failures like you are seeing.
>>
>> OK. I'll test them individually. It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5) built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static. You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue. I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work. The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
[349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh
Re: [git pull] drm merge for 3.9-rc1
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

I get the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50

But in my case, when I reboot the computer, the second monitor, which is
plugged in via HDMI, doesn't work, and when I run `xrandr` I get the
following messages in kern.log:

Mar 3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f
Mar 3 18:09:15 home-spb kernel: [12321.771715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar 3 18:09:15 home-spb kernel: [12321.782712] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar 3 18:09:15 home-spb kernel: [12321.793715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar 3 18:09:15 home-spb kernel: [12321.804719] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar 3 18:09:15 home-spb kernel: [12321.815725] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar 3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken: v3.8-10206-gb0af9cd
- Works normally: v3.8-rc3-139-g34f2be4
- Works normally: v3.8-rc3-188-g10aa17c
- Works normally: 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me. Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
> Linus

--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com
Re: [git pull] drm merge for 3.9-rc1
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer wrote: > On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher wrote: >> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer wrote: >>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher wrote: ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit >>> >>> So I don't think that's actually the cause of the problem. Or at least >>> not that alone. I reverted it on top of Linus' latest tree and I still >>> get the lockups. >> >> Actually, git bisect does seem to have gotten it correct. Once I >> actually tested the revert of just that on top of Linus' tree (commit >> d895cb1af1), things seem to be working much better. I've rebooted a >> dozen times without a lockup. The most I've seen it take on a kernel >> with that commit included is 3 reboots, so that's definitely at least an >> improvement. > > I give up. GPU issues are not my thing. 2 reboots after I sent that it > gave me pretty rainbow static again. So it might have been an > improvement, but revert it is not a solution. > > Looking at there rest of the commits, the whole GPU rework might be > suspect, but I clearly have no clue. GPUs are tricky beasts :) >>> >>> Understatement ;). >>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the problem anyway since it only affects 6xx/7xx and your card is handled by the evergreen code. I'll put together some patches to help narrow down the problem. >>> >>> Yeah, that's the biggest problem I have, not knowing which functions are >>> actually being executed for this card. It looks like a combination of >>> stuff in evergreen.c and ni.c, but I have no idea. >>> >>> Patches would be great. If nothing else, I'm really good at building >>> kernels and rebooting by now. >> >> Two possible fixes attached. The first attempts a full reset of all >> blocks if the MC (memory controller) is hung. That may work better >> than just resetting the MC. The second just disables MC reset. 
>> I'm not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK. I'll test them individually. It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5) built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static. You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue. I'll keep rebooting for a while to make sure.

josh
Re: [git pull] drm merge for 3.9-rc1
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [ 27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> >
>> > This seems to be hard reproducible...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > Possible patch see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +       if (!done) {
> +               status = I915_READ_NOTRACE(ch_ctl);
> +               DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +                         has_aux_irq, status);
> +       }
>
> You applied
>
> +       if (!done) {
> +               status = I915_READ_NOTRACE(ch_ctl);
> +               DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +                         has_aux_irq, status);
> +       {
>
> That second '{' is the source of the compile error.

Schei**e, OK, I'll try again with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
Re: [git pull] drm merge for 3.9-rc1
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher wrote: > On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer wrote: >> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher wrote: >>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit >> >> So I don't think that's actually the cause of the problem. Or at least >> not that alone. I reverted it on top of Linus' latest tree and I still >> get the lockups. > > Actually, git bisect does seem to have gotten it correct. Once I > actually tested the revert of just that on top of Linus' tree (commit > d895cb1af1), things seem to be working much better. I've rebooted a > dozen times without a lockup. The most I've seen it take on a kernel > with that commit included is 3 reboots, so that's definitely at least an > improvement. I give up. GPU issues are not my thing. 2 reboots after I sent that it gave me pretty rainbow static again. So it might have been an improvement, but revert it is not a solution. Looking at there rest of the commits, the whole GPU rework might be suspect, but I clearly have no clue. >>> >>> GPUs are tricky beasts :) >> >> Understatement ;). >> >>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the >>> problem anyway since it only affects 6xx/7xx and your card is handled >>> by the evergreen code. I'll put together some patches to help narrow >>> down the problem. >> >> Yeah, that's the biggest problem I have, not knowing which functions are >> actually being executed for this card. It looks like a combination of >> stuff in evergreen.c and ni.c, but I have no idea. >> >> Patches would be great. If nothing else, I'm really good at building >> kernels and rebooting by now. > > Two possible fixes attached. The first attempts a full reset of all > blocks if the MC (memory controller) is hung. That may work better > than just resetting the MC. The second just disables MC reset. 
> I'm not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically which would lead to needlessly resetting
> it possibly leading to failures like you are seeing.

OK. I'll test them individually. It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh
Re: [git pull] drm merge for 3.9-rc1
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [ 27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> >
> > This seems to be hard reproducible...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > Possible patch see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+       if (!done) {
+               status = I915_READ_NOTRACE(ch_ctl);
+               DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+                         has_aux_irq, status);
+       }

You applied

+       if (!done) {
+               status = I915_READ_NOTRACE(ch_ctl);
+               DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+                         has_aux_irq, status);
+       {

That second '{' is the source of the compile error.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: [git pull] drm merge for 3.9-rc1
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [ 27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> This seems to be hard reproducible...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> Possible patch see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test-patch against next-20130227 and it
fails building the i915 kernel-module.

- Sedat -

LD drivers/gpu/drm/i915/built-in.o
CC [M] drivers/gpu/drm/i915/i915_drv.o
CC [M] drivers/gpu/drm/i915/i915_dma.o
CC [M] drivers/gpu/drm/i915/i915_irq.o
CC [M] drivers/gpu/drm/i915/i915_debugfs.o
CC [M] drivers/gpu/drm/i915/i915_suspend.o
CC [M] drivers/gpu/drm/i915/i915_gem.o
CC [M] drivers/gpu/drm/i915/i915_gem_context.o
CC [M] drivers/gpu/drm/i915/i915_gem_debug.o
CC [M] drivers/gpu/drm/i915/i915_gem_evict.o
CC [M] drivers/gpu/drm/i915/i915_gem_execbuffer.o
CC [M] drivers/gpu/drm/i915/i915_gem_gtt.o
CC [M] drivers/gpu/drm/i915/i915_gem_stolen.o
CC [M] drivers/gpu/drm/i915/i915_gem_tiling.o
CC [M] drivers/gpu/drm/i915/i915_sysfs.o
CC [M] drivers/gpu/drm/i915/i915_trace_points.o
CC [M] drivers/gpu/drm/i915/i915_ums.o
CC [M] drivers/gpu/drm/i915/intel_display.o
CC [M] drivers/gpu/drm/i915/intel_crt.o
CC [M] drivers/gpu/drm/i915/intel_lvds.o
CC [M] drivers/gpu/drm/i915/intel_bios.o
CC [M] drivers/gpu/drm/i915/intel_ddi.o
CC [M] drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:
Re: [git pull] drm merge for 3.9-rc1
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [ 28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [ 27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/
Re: [git pull] drm merge for 3.9-rc1
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote: > On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote: > > > > Highlights: > > > > i915: all over the map, haswell power well enhancements, valleyview macro > > horrors cleaned up, killing lots of legacy GTT > > code, > > Lowlight: > > There's something wrong with i915 DP detection or whatever. I get > stuff like this: > > [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status > 0xa145003f > > and after that the screen ends up black. > > It's happened twice now, but is not 100% repeatable. It looks like the > message itself is new, but the black screen is also new and does seem > to happen when I get the message, so... That message appears to be the canary. For whatever reason the DP transfer is not functioning, likely the VDD is not powered up. However, the failure to communicate there causes the modeset to abort, resulting in the blank screen. > The second time I touched the power button, and the machine came back. > Apparently the suspend/resume cycle made it all magically work: the > suspend caused the same errors, but then the resume made it all good > again. So it is reproducible during suspend. That should help narrow down the sequence, thank you. > Some kind of missed initialization at bootup? It's not reliable enough > to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915: > irq-drive the dp aux communication") since that is where the message > was added.. > > Btw, looking at that commit, what do you think the semantics of the > timeout in something like > > done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10); > > would be? What's that magic "10"? It's some totally random number. The hardware is required to return a timedout error message after 400 microseconds. 
The timeout here is to catch a dysfunctional driver, and so was intended
to be 10 milliseconds, cf.
https://patchwork.kernel.org/patch/2160541/

As it happens, with your machine 10 jiffies is approximately 10
milliseconds, and so we should not be aborting before the hardware has
had a chance to signal failure.

One way to check whether it is a failure to set up the IRQ or a failure
to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre
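As an aside on reading the status word that the patch above would print: the following is a minimal userspace sketch, not driver code. The bit positions are an assumption, recalled from the i915_reg.h definitions for DP_AUX_CH_CTL of that era, and the names (AUX_*, struct aux_status, decode_aux_status, decode_example) are hypothetical. Under that assumed layout, the 0xa145003f value from Linus's log would read as "send still busy, not done, no timeout error flagged", which would be consistent with the "did not signal timeout" message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed DP_AUX_CH_CTL bit layout (recalled from i915_reg.h of that era).
 * Verify against the actual tree before relying on it. */
#define AUX_SEND_BUSY      (UINT32_C(1) << 31)
#define AUX_DONE           (UINT32_C(1) << 30)
#define AUX_TIME_OUT_ERROR (UINT32_C(1) << 28)
#define AUX_RECEIVE_ERROR  (UINT32_C(1) << 25)

/* Hypothetical decoded view of one status word. */
struct aux_status {
	bool busy;          /* transfer still in flight */
	bool done;          /* hardware reports completion */
	bool timeout_error; /* hardware reports its own timeout */
	bool receive_error; /* hardware reports a receive error */
};

/* Split a raw status word into the flags above. */
struct aux_status decode_aux_status(uint32_t v)
{
	struct aux_status s;

	s.busy = (v & AUX_SEND_BUSY) != 0;
	s.done = (v & AUX_DONE) != 0;
	s.timeout_error = (v & AUX_TIME_OUT_ERROR) != 0;
	s.receive_error = (v & AUX_RECEIVE_ERROR) != 0;
	return s;
}

/* Decode the value from the log and check the reading described above. */
void decode_example(void)
{
	struct aux_status s = decode_aux_status(UINT32_C(0xa145003f));

	assert(s.busy);
	assert(!s.done);
	assert(!s.timeout_error);
	assert(!s.receive_error);
}
```

If the assumed layout is right, a status with the timeout-error bit set would instead point at the DP comms, while a word that already shows done would suggest the completion was simply missed on the IRQ side.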
Re: [git pull] drm merge for 3.9-rc1
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie wrote:
>
> If you want to just bump it so Ironlake isn't affected, (patch attached).

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait for it to be fixed a
while.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

        Linus
Re: [git pull] drm merge for 3.9-rc1
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus: Daniel and Imre fail. I think Daniel is on
holiday this week, though, so if you can make it revert, that might be
the best option.

If you want to just bump it so Ironlake isn't affected, (patch attached).

Is this external DP monitor or eDP laptop panel btw?

Dave.

0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch
Description: Binary data
Re: [git pull] drm merge for 3.9-rc1
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

        Linus
Re: [git pull] drm merge for 3.9-rc1
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro
> horrors cleaned up, killing lots of legacy GTT
> code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

  [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
  [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
  [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
  [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
  [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
  [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
  .
  [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

    done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

        Linus
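To make the arithmetic in the message above concrete, here is a small userspace sketch of how a bare tick count maps to wall-clock time at different tick rates. The function names (ticks_to_ms, ms_to_ticks, timeout_example) and the hz parameter, which stands in for CONFIG_HZ, are hypothetical. In kernel code the usual way to express "10 milliseconds" independently of HZ would be msecs_to_jiffies(10) rather than a literal; this sketch only mimics that kind of conversion.

```c
#include <assert.h>

/* How long a bare tick count lasts, in milliseconds, at a given tick rate.
 * hz stands in for CONFIG_HZ. Truncates toward zero. */
unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
	assert(hz > 0);
	return ticks * 1000 / hz;
}

/* Convert milliseconds to ticks, rounding up so that the wait is never
 * shorter than requested. */
unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/* The two configurations mentioned in the thread. */
void timeout_example(void)
{
	/* A literal 10 means different durations depending on HZ. */
	assert(ticks_to_ms(10, 1000) == 10);   /* CONFIG_HZ=1000: 10 ms  */
	assert(ticks_to_ms(10, 100) == 100);   /* CONFIG_HZ=100:  100 ms */

	/* Asking for 10 ms explicitly gives a tick count that tracks HZ. */
	assert(ms_to_ticks(10, 1000) == 10);
	assert(ms_to_ticks(10, 100) == 1);
}
```

The sketch rounds up in ms_to_ticks as a deliberate choice: a timeout that is slightly too long is usually safer than one that expires before the hardware has had its chance to report.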
Re: [git pull] drm merge for 3.9-rc1
>
> I did the fun conflict resolution, so my tree doesn't have the ordering
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field")), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "replace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value() is that calls are meant to be
sequential, but it actually doesn't matter as long as the values are
within a page of each other. I was just avoiding multiple calls to get
the same value by using idx_value, but I think Alex or Jerome can clean
this up a bit further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics of that function are a bit underdocumented. I thought the
other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of the pain. So yes,
there are other places that need to be cleaned up, but most of the time
direct ib access will work fine, until you have a buffer that straddles
a page boundary.

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..
Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band and
hopefully we can avoid any big conflicts in future!

Dave.
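The page-boundary condition Dave describes can be illustrated with a tiny standalone check. This is only a sketch under stated assumptions: a 4096-byte page, 32-bit IB entries, and an IB that starts at a page boundary so that index 0 is the first dword of a page. The names (SKETCH_PAGE_BYTES, ib_range_in_one_page, straddle_example) are hypothetical and are not part of the radeon driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: 4096-byte pages holding 32-bit entries (1024 per page). */
#define SKETCH_PAGE_BYTES      4096u
#define SKETCH_DWORDS_PER_PAGE (SKETCH_PAGE_BYTES / sizeof(uint32_t))

/* True when the dwords [first, first + count) all live in the same page,
 * i.e. when a run of nearby reads cannot straddle a page boundary. */
bool ib_range_in_one_page(size_t first, size_t count)
{
	assert(count > 0);
	size_t last = first + count - 1;

	return first / SKETCH_DWORDS_PER_PAGE == last / SKETCH_DWORDS_PER_PAGE;
}

/* Reading idx, idx+1, idx+2 is fine mid-page but not across the boundary. */
void straddle_example(void)
{
	assert(ib_range_in_one_page(0, 3));      /* dwords 0..2       */
	assert(ib_range_in_one_page(1021, 3));   /* dwords 1021..1023 */
	assert(!ib_range_in_one_page(1022, 3));  /* dwords 1022..1024 */
}
```

Most packets fall in the first two cases, which would explain why direct ib[] reads appear to work almost all of the time and only misbehave for the occasional buffer that crosses a page.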
Re: [git pull] drm merge for 3.9-rc1
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie wrote:
>
> So up front, this has a massive merge conflict in
> drivers/gpu/drm/radeon/evergreen_cs.c I've fixed it up in drm-next-merged
> in the same tree, I fixed up some small ordering issues in my merge as
> well, however they aren't important if you want the fun of doing a major
> conflict resolution.

I did the fun conflict resolution, so my tree doesn't have the ordering
changes.

I also did some things slightly differently from you - you had left
some direct ib[] accesses that I spotted (see for example "case 0x48"
(aka "Copy L2T Frame to Field")), and yours apparently has a few cases
where you use "idx_value" instead of my mindless conflict resolution
that just re-did the brute-force "replace direct ib[] read accesses
with the radeon_get_ib_value() helper function". But you don't do it
for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Anyway - my conflict resolution isn't exactly the same as yours, and
maybe I screwed something up. But it's damn close, and the differences
_seem_ to all be benign.

Btw, why is it ok that some functions still read the ib[] array
directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
etc)?

Whatever. I prefer doing my own resolutions just so that I know what's
going on, and it all seems to build and looks reasonable, but it's
always good to get a second opinion. Particularly since I can't
actually test the radeon stuff, so just eyeballing it and saying
"looks semantically identical to Dave's resolution" may not be 100%
sufficient..

        Linus