Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> So, up front: this has a massive merge conflict in
> drivers/gpu/drm/radeon/evergreen_cs.c. I've fixed it up in drm-next-merged
> in the same tree. I also fixed up some small ordering issues in my merge,
> but they aren't important if you want the fun of doing a major
> conflict resolution.

I did the fun conflict resolution, so my tree doesn't have the ordering changes.

I also did some things slightly differently from you - you had left
some direct ib[] accesses that I spotted (see for example "case 0x48",
aka "Copy L2T Frame to Field"), and yours apparently has a few cases
where you use "idx_value" instead of my mindless conflict resolution,
which just re-did the brute-force "replace direct ib[] read accesses
with the radeon_get_ib_value() helper function". But you don't do it
for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Anyway - my conflict resolution isn't exactly the same as yours, and
maybe I screwed something up. But it's damn close, and the differences
_seem_ to all be benign.

Btw, why is it ok that some functions still read the ib[] array
directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
etc)?


Whatever. I prefer doing my own resolutions just so that I know what's
going on, and it all seems to build and looks reasonable, but it's
always good to get a second opinion. Particularly since I can't
actually test the radeon stuff, so just eyeballing it and saying
"looks semantically identical to Dave's resolution" may not be 100%
sufficient..

 Linus
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering 
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48",
> aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution,
> which just re-did the brute-force "replace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value() is that the accesses are meant to be
sequential, but it actually doesn't matter as long as the values are within
a page of each other. I was just using idx_value to avoid multiple calls that
fetch the same value, but I think Alex or Jerome can clean this up a bit
further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics of that function are a bit underdocumented. I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of the pain. So yes,
there are other places that need to be cleaned up, but most of the time
direct ib[] access will work fine, until you have a buffer that straddles
a page boundary.

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band,
and hopefully we can avoid any big conflicts in future!

Dave.


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro
> horrors cleaned up, killing lots of legacy GTT code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds
 wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro
>> horrors cleaned up, killing lots of legacy GTT code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus. Daniel and Imre fail, though I think Daniel is
on holiday this week, so maybe, if you can revert it, that might be the
best option.

If you just want to bump it so Ironlake isn't affected, a patch is attached.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you just want to bump it so Ironlake isn't affected, a patch is attached.

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait a while for it to be
fixed.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro
> > horrors cleaned up, killing lots of legacy GTT code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason the DP
transfer is not functioning; most likely the VDD is not powered up. The
failure to communicate there then causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error message after 400
microseconds. The timeout here is there to catch driver dysfunction,
and so was intended to be 10 milliseconds, cf.
https://patchwork.kernel.org/patch/2160541/

As it happens, on your machine (CONFIG_HZ_1000=y) 10 jiffies is
approximately 10 milliseconds, so we should not be aborting before the
hardware has had a chance to signal failure. One way to check whether this
is a failure to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> This seems to be hard to reproduce...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> For a possible patch, see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test patch against next-20130227 and the i915
kernel module fails to build.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> >
> > This seems to be hard to reproduce...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > For a possible patch, see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' is the source of the compile error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but reverting it is not a solution.

 Looking at the rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell whether it's busy, since display requests
> hit the MC periodically; that could lead to needlessly resetting it,
> possibly causing failures like the ones you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>> >
>> > This seems to be hard to reproduce...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > For a possible patch, see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	}
>
> You applied
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	{
>
> That second '{' is the source of the compile error.

Sh*t, OK, I'll try with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
 ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but reverting it is not a solution.
>
> Looking at the rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

 GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
 ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
 problem anyway since it only affects 6xx/7xx and your card is handled
 by the evergreen code.  I'll put together some patches to help narrow
 down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell whether it's busy, since display requests
>> hit the MC periodically; that could lead to needlessly resetting it,
>> possibly causing failures like the ones you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5), then built, installed, and tested them.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro
>> horrors cleaned up, killing lots of legacy GTT code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f


I have the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
But in my case, after I reboot the computer, the second monitor (plugged
in via HDMI) doesn't work, and when I run `xrandr` I get the following
messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
 On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem.  Or at least
> not that alone.  I reverted it on top of Linus' latest tree and I still
> get the lockups.

 Actually, git bisect does seem to have gotten it correct.  Once I
 actually tested the revert of just that on top of Linus' tree (commit
 d895cb1af1), things seem to be working much better.  I've rebooted a
 dozen times without a lockup.  The most I've seen it take on a kernel
 with that commit included is 3 reboots, so that's definitely at least an
 improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

 Two possible fixes attached.  The first attempts a full reset of all
 blocks if the MC (memory controller) is hung.  That may work better
 than just resetting the MC.  The second just disables MC reset.  I'm
 not sure we can reliably tell if it's busy, due to display requests
 hitting the MC periodically, which would lead to needlessly resetting
 it, possibly leading to failures like you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5), built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.
>

I'll make those debug only when the patch goes upstream.
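
The idea behind the second patch, as described above (don't reset the memory controller just because it reports busy, since periodic display requests can make it look busy), can be sketched as a filter over a soft-reset mask. This is purely illustrative and hypothetical: the bit names and values are invented, the real radeon code is not shown in this thread, and the `verbose` flag only stands in for demoting the "MC busy" message to debug level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical soft-reset bits, invented for this sketch. */
#define TOY_RESET_GFX (1u << 0)
#define TOY_RESET_DMA (1u << 1)
#define TOY_RESET_MC  (1u << 2)
#define TOY_RESET_ALL (TOY_RESET_GFX | TOY_RESET_DMA | TOY_RESET_MC)

/*
 * Drop the memory-controller bit from a soft-reset mask.  A busy MC may
 * just mean that display scanout is reading memory, so it is not taken
 * as evidence of a hang.  'verbose' stands in for a debug-level log switch.
 */
static uint32_t toy_filter_reset_mask(uint32_t mask, bool verbose)
{
	assert((mask & ~TOY_RESET_ALL) == 0); /* only known bits in this toy */
	if (mask & TOY_RESET_MC) {
		if (verbose)
			fprintf(stderr, "MC busy (mask 0x%08x), not resetting it\n",
				(unsigned)mask);
		mask &= ~TOY_RESET_MC;
	}
	return mask;
}
```

The other blocks in the mask are still reset as before; only the MC bit is dropped.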

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Daniel Vetter
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro
> > horrors cleaned up, killing lots of legacy GTT code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
> 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
> 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.
> 
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
> 
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
> 
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

OK, I've merged two patches from Paulo: one to fix up the harmless jiffies
vs. msec confusion, and another to plug a race in our irq handler which,
according to some digging done by Imre, did lead to missed dp aux
interrupts. The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen) I hope that this should fix both issues.

I'll forward the pull to Dave in a few days, since at the moment I'm
holding off a bit for confirmation on another small regression fix. And
there's nothing earth-shattering in my -fixes queue right now.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Josh Boyer
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
 On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit

 So I don't think that's actually the cause of the problem.  Or at least
 not that alone.  I reverted it on top of Linus' latest tree and I still
 get the lockups.
>>>
>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>> actually tested the revert of just that on top of Linus' tree (commit
>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>> with that commit included is 3 reboots, so that's definitely at least an
>>> improvement.
>>
>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>> gave me pretty rainbow static again.  So it might have been an
>> improvement, but reverting it is not a solution.
>>
>> Looking at the rest of the commits, the whole GPU rework might be
>> suspect, but I clearly have no clue.
>
> GPUs are tricky beasts :)

 Understatement ;).

> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
> problem anyway since it only affects 6xx/7xx and your card is handled
> by the evergreen code.  I'll put together some patches to help narrow
> down the problem.

 Yeah, that's the biggest problem I have, not knowing which functions are
 actually being executed for this card.  It looks like a combination of
 stuff in evergreen.c and ni.c, but I have no idea.

 Patches would be great.  If nothing else, I'm really good at building
 kernels and rebooting by now.
>>>
>>> Two possible fixes attached.  The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung.  That may work better
>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>> not sure we can reliably tell if it's busy, due to display requests
>>> hitting the MC periodically, which would lead to needlessly resetting
>>> it, possibly leading to failures like you are seeing.
>>
>> OK.  I'll test them individually.  It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5), built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static.  You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue.  I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work.  The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
[349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "replace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value is that calls are meant to be
sequential, but it actually doesn't matter as long as the values are within
a page of each other. I was just avoiding multiple calls to get the same
value with idx_value, but I think Alex or Jerome can clean this up a bit
further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics for that function are a bit underdocumented, and I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of pain. So yes there
are other places that need to be cleaned up, but most of the time direct
ib access will work fine, until you have a buffer that straddles a
page boundary.
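
To illustrate the page-boundary point, here is a toy model, not the radeon code. It assumes (as the description above suggests) that the accessor copies the command stream into the IB one page at a time, on demand, and that a direct ib[] read only sees pages that have already been copied. The names `toy_cs` and `toy_get_ib_value` and the four-dword page size are invented; the real helper's caching, which presumably explains the "within a page of each other" rule, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_DW 4 /* dwords per "page"; tiny so boundaries are easy to hit */
#define TOY_PAGES   4
#define TOY_DW      (TOY_PAGE_DW * TOY_PAGES)

/* Hypothetical model: 'src' stands in for the user-supplied command
 * stream, 'ib' for the kernel-side copy, filled one page at a time. */
struct toy_cs {
	const uint32_t *src;
	uint32_t ib[TOY_DW];
	bool copied[TOY_PAGES];
};

static void toy_cs_init(struct toy_cs *p, const uint32_t *src)
{
	p->src = src;
	memset(p->ib, 0, sizeof(p->ib));
	memset(p->copied, 0, sizeof(p->copied));
}

/* Accessor in the spirit of radeon_get_ib_value(): make sure the page
 * holding 'idx' has been copied before returning the value. */
static uint32_t toy_get_ib_value(struct toy_cs *p, unsigned idx)
{
	unsigned pg = idx / TOY_PAGE_DW;

	assert(idx < TOY_DW);
	if (!p->copied[pg]) {
		memcpy(&p->ib[pg * TOY_PAGE_DW], &p->src[pg * TOY_PAGE_DW],
		       TOY_PAGE_DW * sizeof(uint32_t));
		p->copied[pg] = true;
	}
	return p->ib[idx];
}
```

For example, reading index 2 through the accessor pulls in page 0, so a later direct ib[3] read is correct, but a direct ib[5] read for a packet that continues into page 1 still returns stale data until the accessor is called for that page.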

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band, and
hopefully we can avoid any big conflicts in future!

Dave.


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro
> horrors cleaned up, killing lots of legacy GTT code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.
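
To make the arithmetic concrete: wait_event_timeout() takes its timeout in jiffies, so what a bare 10 means in wall-clock time depends on CONFIG_HZ. The sketch below is a userspace model with hypothetical helper names (they are not kernel APIs); inside the kernel, a fixed duration is normally expressed with msecs_to_jiffies() instead.

```c
#include <assert.h>

/* Hypothetical userspace helper: wall-clock milliseconds covered by
 * 'jiffies' timer ticks when the kernel is built with the given HZ. */
static unsigned long jiffies_to_ms_at(unsigned long jiffies, unsigned long hz)
{
	assert(hz > 0);
	return jiffies * 1000UL / hz;
}

/* Hypothetical inverse, rounding up so a requested duration is never
 * shortened; this mirrors the intent of the kernel's msecs_to_jiffies(). */
static unsigned long ms_to_jiffies_at(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (ms * hz + 999UL) / 1000UL;
}

/*
 * A bare "10" as the timeout:
 *   HZ=1000 -> 10 ms,  HZ=250 -> 40 ms,  HZ=100 -> 100 ms.
 * HZ/10 jiffies is always 100 ms, and ms_to_jiffies_at(10, hz)
 * always covers at least 10 ms, whatever HZ is.
 */
```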

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds  wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds  wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro
>> horrors cleaned up, killing lots of legacy GTT code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus; Daniel and Imre fail. Though I think Daniel is
on holiday this week, so if you can revert it, that might be the best
option.

If you just want to bump it so Ironlake isn't affected, see the attached patch.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch
Description: Binary data


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you just want to bump it so Ironlake isn't affected, see the attached patch.

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait for it to be fixed a
while.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro
> > horrors cleaned up, killing lots of legacy GTT code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason the DP
transfer is not functioning, likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error after 400
microseconds. The timeout here is to catch a dysfunctional driver, and
so was intended to be 10 milliseconds; cf.
https://patchwork.kernel.org/patch/2160541/
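
The structure described here (the hardware should report its own timeout within roughly 400 microseconds, and the 10 ms software wait only exists to catch something that never answers) can be sketched in userspace. This is a hypothetical analogue of the polled wait_for_atomic(C, 10) branch, not i915 code: it checks a condition against a monotonic-clock deadline and returns 0 or -ETIMEDOUT, mirroring how the quoted code compares wait_for_atomic()'s result with 0. struct fake_hw is an invented stand-in for the AUX channel status register.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Milliseconds elapsed on the monotonic clock since *start. */
static long long elapsed_ms(const struct timespec *start)
{
	struct timespec now;
	long long ns;

	clock_gettime(CLOCK_MONOTONIC, &now);
	ns = (long long)(now.tv_sec - start->tv_sec) * 1000000000LL +
	     (now.tv_nsec - start->tv_nsec);
	return ns / 1000000LL;
}

/*
 * Hypothetical userspace analogue of a polled wait with a safety timeout:
 * spin until cond(ctx) is true or timeout_ms has elapsed.  Returns 0 on
 * success and -ETIMEDOUT otherwise.  The condition is re-checked once
 * after the deadline so a completion racing with it still counts.
 */
static int poll_until(bool (*cond)(void *), void *ctx, long timeout_ms)
{
	struct timespec start;

	assert(cond != NULL && timeout_ms >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	while (!cond(ctx)) {
		if (elapsed_ms(&start) > timeout_ms)
			return cond(ctx) ? 0 : -ETIMEDOUT;
	}
	return 0;
}

/* Invented stand-in for a status register: reports "done" after a given
 * number of polls, or never if 'dead' is set. */
struct fake_hw {
	int polls_left;
	bool dead;
};

static bool fake_hw_done(void *ctx)
{
	struct fake_hw *hw = ctx;

	if (hw->dead)
		return false;
	if (hw->polls_left > 0)
		hw->polls_left--;
	return hw->polls_left == 0;
}
```

With a healthy device the deadline is never reached, so its exact length only matters when something is already broken.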

As it happens, on your machine 10 jiffies is approximately 10
milliseconds, so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether it is a failure
to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
>
> This seems to be hard to reproduce...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> For a possible patch, see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test patch against next-20130227 and the i915
kernel module fails to build.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > This seems to be hard to reproduce...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > Possible patch see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' is the source of the compile error.
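Why a single stray brace produces the long run of "invalid storage class for function" errors quoted earlier: with the `if` block closed by `{` instead of `}`, the enclosing function is never terminated, so every later `static` function in intel_dp.c is parsed as if it were defined inside it. GCC accepts nested functions only as an extension and does not allow them to be `static`, hence one error per function. The userspace sketch below shows the corrected shape; `fake_read_ch_ctl()` is a hypothetical stand-in for I915_READ_NOTRACE(ch_ctl), `fprintf()` stands in for DRM_ERROR(), and the status value is simply borrowed from the logs in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for I915_READ_NOTRACE(ch_ctl): returns a fixed
 * status value borrowed from the logs in this thread. */
static uint32_t fake_read_ch_ctl(void)
{
	return 0xa145003fu;
}

/* Same shape as the intended patch: report (and return) the channel
 * status only when the AUX wait did not complete. */
static uint32_t check_aux_done(bool done, int has_aux_irq)
{
	uint32_t status = 0;

	if (!done) {
		status = fake_read_ch_ctl();
		fprintf(stderr,
			"dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
			has_aux_irq, (unsigned int)status);
	}	/* must be '}': a '{' here leaves check_aux_done() unterminated */

	return status;
}

/* Because the braces above balance, this is an ordinary file-scope static
 * function. With the stray '{', GCC would parse it as a nested function
 * inside check_aux_done() and reject it with "invalid storage class". */
static int later_static_helper(void)
{
	return 1;
}
```

Flipping the marked `}` to `{` in this file should reproduce the same class of error for `later_static_helper()` under GCC.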
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but reverting it is not a solution.

 Looking at the rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically which would lead to needlessly resetting
> it possibly leading to failures like you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > This seems to be hard to reproduce...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > Possible patch see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	}
>
> You applied
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	{
>
> That second '{' is the source of the compile error.

Sh*t, OK, I'll try with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
 ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but reverting it is not a solution.
>
> Looking at the rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

 GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
 ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
 problem anyway since it only affects 6xx/7xx and your card is handled
 by the evergreen code.  I'll put together some patches to help narrow
 down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5), then built, installed, and tested each one.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.
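A rough back-of-the-envelope check of why 21 clean boots is encouraging, under two simplifying assumptions that are not from the thread: boots fail independently, and the per-boot hang probability p on an affected kernel is around 1/3 (the earlier report said a hang never took more than three reboots to appear):

$$
P(\text{21 clean boots} \mid p) = (1-p)^{21}, \qquad
p \approx \tfrac{1}{3} \;\Rightarrow\; \left(\tfrac{2}{3}\right)^{21} = \frac{2^{21}}{3^{21}} = \frac{2097152}{10460353203} \approx 2.0 \times 10^{-4}
$$

So if the MC reset were still causing hangs at roughly that rate, a 21-boot clean streak would be very unlikely; it does not rule out a rarer failure mode, which is why continuing to reboot is still sensible.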

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f


I have the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
But in my case, when I reboot the computer, the second monitor, which is
plugged in via HDMI, doesn't work, and when I run `xrandr` I get the
following messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added.
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  
>>> wrote:
 On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
> wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem.  Or at 
> least
> not that alone.  I reverted it on top of Linus' latest tree and I 
> still
> get the lockups.

 Actually, git bisect does seem to have gotten it correct.  Once I
 actually tested the revert of just that on top of Linus' tree (commit
 d895cb1af1), things seem to be working much better.  I've rebooted a
 dozen times without a lockup.  The most I've seen it take on a kernel
 with that commit included is 3 reboots, so that's definitely at least 
 an
 improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

 Two possible fixes attached.  The first attempts a full reset of all
 blocks if the MC (memory controller) is hung.  That may work better
 than just resetting the MC.  The second just disables MC reset.  I'm
 not sure we can reliably tell if it's busy due to display requests
 hitting the MC periodically which would lead to needlessly resetting
 it possibly leading to failures like you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.
>

I'll make those messages debug-only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Daniel Vetter
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
> 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
> 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added.
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.
> 
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
> 
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
> 
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

OK, I've merged two patches from Paulo: one to fix up the harmless jiffies
vs. msec confusion, and another to plug a race in our irq handler which,
according to some digging done by Imre, did lead to missed dp aux
interrupts. The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen) I hope that this should fix both issues.
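As a rough illustration of what the patch title describes (a sketch under assumptions, not Paulo's actual change): the handler saves the south interrupt-enable register, writes it to zero while it reads and acknowledges the pending bits, and restores it at the end. The assumed motivation is that the south (PCH) interrupt reaches the north display block as an edge, so an event that arrives while earlier ones are still being acknowledged may not produce a new edge and can be lost; restoring the enable register afterwards gives anything still pending a fresh chance to fire. The struct and field names only echo i915's SDEIER/SDEIIR registers and are hypothetical; real code reads and writes MMIO registers, not plain fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: plain fields stand in for the south display engine
 * interrupt enable (SDEIER) and identity (SDEIIR) registers. */
struct model_south_irq {
	uint32_t sdeier;	/* enabled south interrupt sources */
	uint32_t sdeiir;	/* latched, not yet acknowledged sources */
	uint32_t handled;	/* sources processed by the handler so far */
};

static void model_south_irq_handler(struct model_south_irq *s)
{
	uint32_t saved_ier = s->sdeier;
	uint32_t iir;

	/* 1. Mask all south sources while we work. */
	s->sdeier = 0;

	/* 2. Snapshot what is pending and acknowledge it (modelling a
	 *    write-1-to-clear register). */
	iir = s->sdeiir;
	s->sdeiir &= ~iir;

	/* 3. Act on the enabled events, e.g. wake a DP AUX waiter. */
	s->handled |= iir & saved_ier;

	/* 4. Restore the enables; in the assumed hardware model, anything that
	 *    became pending meanwhile can now raise a fresh interrupt. */
	s->sdeier = saved_ier;
}
```

The point is the ordering (mask, acknowledge, handle, restore) rather than the bit arithmetic; a software model like this cannot reproduce the hardware race itself.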

I'll forward the pull to Dave in a few days, since at the moment I'm
stalling a bit for confirmation on another little regression fix. And
there's nothing earth-shattering in my -fixes queue right now.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Josh Boyer
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
 On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
 wrote:
> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit

 So I don't think that's actually the cause of the problem.  Or at least
 not that alone.  I reverted it on top of Linus' latest tree and I still
 get the lockups.
>>>
>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>> actually tested the revert of just that on top of Linus' tree (commit
>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>> with that commit included is 3 reboots, so that's definitely at least an
>>> improvement.
>>
>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>> gave me pretty rainbow static again.  So it might have been an
>> improvement, but reverting it is not a solution.
>>
>> Looking at the rest of the commits, the whole GPU rework might be
>> suspect, but I clearly have no clue.
>
> GPUs are tricky beasts :)

 Understatement ;).

> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
> problem anyway since it only affects 6xx/7xx and your card is handled
> by the evergreen code.  I'll put together some patches to help narrow
> down the problem.

 Yeah, that's the biggest problem I have, not knowing which functions are
 actually being executed for this card.  It looks like a combination of
 stuff in evergreen.c and ni.c, but I have no idea.

 Patches would be great.  If nothing else, I'm really good at building
 kernels and rebooting by now.
>>>
>>> Two possible fixes attached.  The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung.  That may work better
>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>> not sure we can reliably tell if it's busy due to display requests
>>> hitting the MC periodically which would lead to needlessly resetting
>>> it possibly leading to failures like you are seeing.
>>
>> OK.  I'll test them individually.  It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5) built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static.  You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue.  I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work.  The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
[349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering 
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "replace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value is that reads are meant to be
sequential, but it actually doesn't matter as long as the values are within
a page of each other. I was just avoiding multiple calls to get the same
value with idx_value, but I think Alex or Jerome can clean this up a bit
further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics for that function are a bit underdocumented, and I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of the pain. So yes, there
are other places that need to be cleaned up, but most of the time direct
ib access will work fine, until you have a buffer that straddles a
page boundary.
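A simplified model of the page-boundary problem described above, under stated assumptions: the checker can only see the page of the command buffer that has been copied in, and an accessor in the spirit of radeon_get_ib_value() fetches the right page on demand. The struct, the function names and the four-dword "page" are all hypothetical, chosen so a boundary is easy to hit; the real radeon_get_ib_value() differs in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Tiny "page" so a boundary is easy to hit; real pages are 4 KiB. */
#define MODEL_PAGE_DWORDS 4u

/* Hypothetical model: the whole IB lives in src (think: user memory), but
 * only the page currently copied into kpage is visible to the checker. */
struct model_ib_reader {
	const uint32_t *src;
	unsigned int length_dw;
	uint32_t kpage[MODEL_PAGE_DWORDS];
	int cur_page;		/* page held in kpage, or -1 for none */
	unsigned int copies;	/* number of page copies performed */
};

static void model_ib_reader_init(struct model_ib_reader *r,
				 const uint32_t *src, unsigned int length_dw)
{
	r->src = src;
	r->length_dw = length_dw;
	r->cur_page = -1;
	r->copies = 0;
}

/* Fetch the page holding idx on demand, then index within that page.
 * Indexing kpage[] directly with the absolute idx would only be correct
 * while idx stays inside the first page. */
static uint32_t model_get_ib_value(struct model_ib_reader *r, unsigned int idx)
{
	unsigned int page = idx / MODEL_PAGE_DWORDS;
	unsigned int start = page * MODEL_PAGE_DWORDS;
	unsigned int n = MODEL_PAGE_DWORDS;

	assert(idx < r->length_dw);
	if ((int)page != r->cur_page) {
		if (start + n > r->length_dw)
			n = r->length_dw - start;	/* short last page */
		memcpy(r->kpage, r->src + start, n * sizeof(uint32_t));
		r->cur_page = (int)page;
		r->copies++;
	}
	return r->kpage[idx - start];
}
```

Reading index 4 right after index 3 forces a second page copy; code that instead indexed `kpage[]` with the absolute index would read past the copied page at exactly that point, matching the "works until a buffer straddles a page boundary" behaviour.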

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band, and
hopefully we can avoid any big conflicts in future!

Dave.


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro 
> horrors cleaned up, killing lots of legacy GTT
> code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added.

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.

IOW, passing in a random number like that is crazy. It cannot possibly
be right.
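To make the unit problem concrete: the last argument of wait_event_timeout() is a count of jiffies, and the length of a jiffy depends on CONFIG_HZ. The userspace sketch below models that with two hypothetical helpers; they are not the kernel's msecs_to_jiffies()/jiffies_to_msecs(), which also handle overflow and HZ values that do not divide 1000 evenly.

```c
#include <assert.h>

/* How many milliseconds a raw jiffies count lasts at a given HZ
 * (hypothetical helper, truncating division). */
static unsigned int model_jiffies_to_msecs(unsigned int j, unsigned int hz)
{
	assert(hz > 0);
	return j * 1000u / hz;
}

/* Convert milliseconds to jiffies, rounding up so the wait is never
 * shorter than requested (hypothetical helper; the kernel's
 * msecs_to_jiffies() also rounds up). */
static unsigned int model_msecs_to_jiffies(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return (ms * hz + 999u) / 1000u;
}
```

With these helpers a literal 10 lasts 10 ms at HZ=1000, 40 ms at HZ=250 and 100 ms at HZ=100, while converting a 10 ms intent yields 10, 3 and 1 jiffies respectively (the HZ=250 case rounds up to 12 ms). In driver code the usual idiom is to pass `msecs_to_jiffies(10)` instead of a bare `10`; whether that is exactly what the follow-up fix did is not shown in this thread.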

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds
 wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus; Daniel and Imre fail. Though I think Daniel is
on holiday this week, so maybe a revert is the best option.

If you just want to bump it so Ironlake isn't affected, there's a patch attached.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you want to just bump it so Ironlake isn't affected, (patch attached).

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait for it to be fixed a
while.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason the DP
transfer is not functioning, likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error message after 400
microseconds. The timeout here is to catch a malfunctioning driver, and
so was intended to be 10 milliseconds, cf
https://patchwork.kernel.org/patch/2160541/
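Going the other way, a 10 ms budget can be converted to jiffies so that it means at least 10 ms at any HZ. The sketch below is a userspace stand-in written in the spirit of the kernel's msecs_to_jiffies(), which rounds up. ms_to_jiffies_roundup() and jiffies_to_us() are hypothetical names, and this is not the kernel's implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for msecs_to_jiffies(): round up so that the
 * resulting wait is never shorter than the requested milliseconds. */
static unsigned long ms_to_jiffies_roundup(unsigned long ms, unsigned int hz)
{
	assert(hz > 0);
	return (ms * hz + 999UL) / 1000UL;
}

/* Wall-clock length, in microseconds, of a wait of j jiffies. */
static unsigned long jiffies_to_us(unsigned long j, unsigned int hz)
{
	assert(hz > 0);
	return j * 1000000UL / hz;
}
```

At HZ=1000, 10 ms is 10 jiffies. At HZ=250 it rounds up to 3 jiffies (12 ms), and at HZ=100 it is 1 jiffy (10 ms). In every case the budget stays far above the 400 microsecond limit the hardware is supposed to respect.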

As it happens, on your machine 10 jiffies is approximately 10
milliseconds, so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether this is a failure
to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

Possible patch see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> This seems to be hard to reproduce...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> Possible patch see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test patch against next-20130227 and the
i915 kernel module fails to build.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for 
function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed 
declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for 
function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for 
function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for 
function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for 
function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for 
function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for 
function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for 
function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for 
function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for 
function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for 
function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for 
function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for 
function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for 
function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for 
function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for 
function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for 
function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for 
function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for 
function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for 
function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for 
function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for 
function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for 
function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for 
function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for 
function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for 
function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > This seems to be hard to reproduce...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > Possible patch see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' is the source of the compile error.
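Why one character produces a wall of errors: with '{' in place of '}', the if block (and therefore intel_dp_aux_wait_done() itself) is never closed, so GCC parses every following static function in the file as a nested function, and a nested function cannot be declared static; hence "invalid storage class for function". Below is a compilable userspace sketch of the intended shape, with the braces balanced. fake_read_ch_ctl() is a hypothetical stand-in for I915_READ_NOTRACE(ch_ctl) (its value is simply the status seen in the logs above), and fprintf() stands in for DRM_ERROR().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for I915_READ_NOTRACE(ch_ctl); not the i915 API. */
static uint32_t fake_read_ch_ctl(void)
{
	return 0xa145003fu;
}

/* Same shape as the corrected hunk: on timeout, re-read the channel
 * status and include it in the error message. Note the closing '}'
 * after the statements of the if block. */
static uint32_t report_timeout(bool done, bool has_aux_irq, uint32_t status)
{
	if (!done) {
		status = fake_read_ch_ctl();
		fprintf(stderr,
			"dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
			has_aux_irq, (unsigned int)status);
	}
	return status;
}
```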
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but reverting it is not a solution.

 Looking at the rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically which would lead to needlessly resetting
> it possibly leading to failures like you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > This seems to be hard to reproduce...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > Possible patch see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	}
>
> You applied
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	{
>
> That second '{' is the source of the compile error.

Sh*t, OK, I'll try with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
 ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but reverting it is not a solution.
>
> Looking at the rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

 GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
 ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
 problem anyway since it only affects 6xx/7xx and your card is handled
 by the evergreen code.  I'll put together some patches to help narrow
 down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5) built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.
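The idea behind the patch that survived these reboots, as Alex described it, is to leave the MC out of the reset because it may only look busy from periodic display requests. The sketch below is entirely hypothetical: the SKETCH_RESET_* flag names and values are illustrative and are not the radeon driver's, and skip_mc_reset() is not a real driver function. Its log text imitates the "MC busy: ..., clearing." lines that appear in dmesg with the patch applied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative reset-mask bits; names and values are assumptions. */
#define SKETCH_RESET_GFX (1u << 0)
#define SKETCH_RESET_DMA (1u << 1)
#define SKETCH_RESET_MC  (1u << 2)

/* Drop the MC from a soft-reset mask: a busy MC may only reflect
 * display traffic, so resetting it can do more harm than good. */
static uint32_t skip_mc_reset(uint32_t reset_mask, uint32_t mc_status)
{
	if (reset_mask & SKETCH_RESET_MC) {
		fprintf(stderr, "MC busy: 0x%08x, clearing.\n",
			(unsigned int)mc_status);
		reset_mask &= ~SKETCH_RESET_MC;
	}
	return reset_mask;
}
```

Other blocks in the mask are still reset; only the MC bit is cleared.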

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f


I see the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
But in my case, after rebooting the computer, the second monitor, which is
connected via HDMI, doesn't work, and when I run `xrandr` I get the
following messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  
>>> wrote:
 On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
> wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem.  Or at 
> least
> not that alone.  I reverted it on top of Linus' latest tree and I 
> still
> get the lockups.

 Actually, git bisect does seem to have gotten it correct.  Once I
 actually tested the revert of just that on top of Linus' tree (commit
 d895cb1af1), things seem to be working much better.  I've rebooted a
 dozen times without a lockup.  The most I've seen it take on a kernel
 with that commit included is 3 reboots, so that's definitely at least 
 an
 improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

 Two possible fixes attached.  The first attempts a full reset of all
 blocks if the MC (memory controller) is hung.  That may work better
 than just resetting the MC.  The second just disables MC reset.  I'm
 not sure we can reliably tell if it's busy due to display requests
 hitting the MC periodically which would lead to needlessly resetting
 it possibly leading to failures like you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.
>

I'll make those debug only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Daniel Vetter
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
> 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
> 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.
> 
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
> 
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
> 
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

OK, I've merged two patches from Paulo: one to fix up the harmless jiffies
vs. msec confusion, and the other to plug a race in our irq handler which
did lead to missed dp aux interrupts, according to some digging done by
Imre. The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen), I hope this fixes both issues.

I'll forward the pull to Dave in a few days since atm I'm stalling a bit
for confirmation on another little regression fix. And there's nothing
earth-shattering in my -fixes queue right now.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
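
Daniel describes the fix only as plugging a race in the irq handler that led to missed dp aux interrupts, via "also disable south interrupts when handling them". One plausible mechanism, assuming the south (PCH) interrupt is only noticed by the north side on a rising edge, is that an event arriving while the handler runs keeps the line high and so never produces a new edge. The toy simulation below models that assumption only; it is not the i915 handler, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy "south" interrupt block whose output the "north" side only
 * notices on a rising edge (an assumption, not taken from hw docs). */
struct south {
	uint32_t iir;   /* pending event bits */
	uint32_t ier;   /* enabled event bits */
	bool line;      /* current output level: (iir & ier) != 0 */
	int edges;      /* rising edges observed by the north side */
};

/* Recompute the output level and count a rising edge if one occurs. */
static void south_update(struct south *s)
{
	bool level = (s->iir & s->ier) != 0;

	if (level && !s->line)
		s->edges++;
	s->line = level;
}

/* A new event becomes pending. */
static void south_raise(struct south *s, uint32_t bit)
{
	s->iir |= bit;
	south_update(s);
}

/* One pass of the handler while a second event races in. With
 * mask_while_handling, IER is zeroed for the duration and restored at
 * the end, so anything still pending produces a fresh rising edge. */
static void handle_once(struct south *s, uint32_t late_event,
			bool mask_while_handling)
{
	uint32_t saved_ier = s->ier;
	uint32_t seen;

	if (mask_while_handling) {
		s->ier = 0;
		south_update(s);
	}
	seen = s->iir;                 /* read the pending events */
	south_raise(s, late_event);    /* new event arrives mid-handler */
	s->iir &= ~seen;               /* ack only what was read */
	south_update(s);
	if (mask_while_handling) {
		s->ier = saved_ier;
		south_update(s);
	}
}
```

Without masking, the late event stays pending but no second edge is ever seen, so nothing re-enters the handler for it; zeroing the enable mask while handling and restoring it afterwards produces a new edge for whatever is still pending.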

Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Josh Boyer
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
 On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
 wrote:
> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit

 So I don't think that's actually the cause of the problem.  Or at least
 not that alone.  I reverted it on top of Linus' latest tree and I still
 get the lockups.
>>>
>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>> actually tested the revert of just that on top of Linus' tree (commit
>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>> with that commit included is 3 reboots, so that's definitely at least an
>>> improvement.
>>
>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>> gave me pretty rainbow static again.  So it might have been an
>> improvement, but reverting it is not a solution.
>>
>> Looking at the rest of the commits, the whole GPU rework might be
>> suspect, but I clearly have no clue.
>
> GPUs are tricky beasts :)

 Understatement ;).

> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
> problem anyway since it only affects 6xx/7xx and your card is handled
> by the evergreen code.  I'll put together some patches to help narrow
> down the problem.

 Yeah, that's the biggest problem I have, not knowing which functions are
 actually being executed for this card.  It looks like a combination of
 stuff in evergreen.c and ni.c, but I have no idea.

 Patches would be great.  If nothing else, I'm really good at building
 kernels and rebooting by now.
>>>
>>> Two possible fixes attached.  The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung.  That may work better
>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>> not sure we can reliably tell if it's busy due to display requests
>>> hitting the MC periodically, which would lead to needlessly resetting
>>> it, possibly leading to failures like you are seeing.
>>
>> OK.  I'll test them individually.  It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5) built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static.  You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue.  I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work.  The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
[349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
[349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
[349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh
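
Alex's second patch is described only as skipping the MC reset, and the "MC busy: ..., clearing." lines in Josh's dmesg suggest the memory controller bit is simply dropped from the soft-reset mask and logged. A minimal userspace sketch of that idea follows. The RESET_* bit values are hypothetical, chosen only for illustration, and 0x409 is used merely as sample input copied from the log as quoted. Alex mentioned making these messages debug-only upstream; here they just go to stderr.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reset-mask bits; the real driver's values may differ. */
#define RESET_GFX (1u << 0)
#define RESET_CP  (1u << 3)
#define RESET_MC  (1u << 10)

/* Drop the memory controller from a soft-reset mask. The assumption is
 * that a busy MC is usually servicing display fetches rather than hung,
 * so resetting it does more harm than good. */
static uint32_t filter_reset_mask(uint32_t reset_mask)
{
	if (reset_mask & RESET_MC) {
		fprintf(stderr, "MC busy: 0x%08X, clearing.\n",
			(unsigned int)reset_mask);
		reset_mask &= ~RESET_MC;
	}
	return reset_mask;
}
```

The other blocks in the mask are still reset as before; only the MC is left alone, which matches the behaviour Josh reports (no hangs, but a steady trickle of "clearing" messages).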

Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering 
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "repace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value is that calls are meant to be sequential,
but it actually doesn't matter as long as the values are within a page
of each other. I was just using idx_value to avoid multiple calls to get the
same value, but I think Alex or Jerome can clean this up a bit further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics for that function are a bit underdocumented, and I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of pain. So yes there
are other places that need to be cleaned up, but most of the time direct
ib access will work fine, until you have a buffer that straddles a
page boundary.

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band,
and hopefully we can avoid any big conflicts in future!

Dave.
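
Dave's rule of thumb (accesses should be roughly sequential, and direct ib[] reads work until a buffer straddles a page boundary) can be illustrated with a toy model in which the parser sees the user command buffer only through two page-sized resident copies. This is a sketch under that assumption, not the real radeon_get_ib_value(); the two-slot window, the eviction order and every name below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u
#define DW_PER_PAGE (PAGE_BYTES / 4u)

/* Toy parser view of a user IB: only two page copies are resident. */
struct ib_window {
	const uint32_t *user_ib;        /* stands in for the user buffer */
	uint32_t page[2][DW_PER_PAGE];  /* resident page copies */
	long page_idx[2];               /* page held by each slot, -1 = none */
	int victim;                     /* slot to overwrite next */
	unsigned copies;                /* page copies performed so far */
};

static void ib_window_init(struct ib_window *w, const uint32_t *user_ib)
{
	memset(w, 0, sizeof(*w));
	w->user_ib = user_ib;
	w->page_idx[0] = w->page_idx[1] = -1;
}

/* Accessor: copy the page in if it is not one of the two resident ones.
 * Assumes the user buffer is a whole number of pages long. */
static uint32_t ib_get(struct ib_window *w, uint32_t idx)
{
	long pg = (long)(idx / DW_PER_PAGE);
	uint32_t off = idx % DW_PER_PAGE;
	int slot;

	for (slot = 0; slot < 2; slot++)
		if (w->page_idx[slot] == pg)
			return w->page[slot][off];

	slot = w->victim;
	memcpy(w->page[slot], w->user_ib + pg * DW_PER_PAGE, PAGE_BYTES);
	w->page_idx[slot] = pg;
	w->victim ^= 1;
	w->copies++;
	return w->page[slot][off];
}

/* The shortcut: index one resident copy directly, as if the whole
 * packet always sat in the page held by slot 0. */
static uint32_t ib_direct(const struct ib_window *w, uint32_t idx)
{
	return w->page[0][idx % DW_PER_PAGE];
}
```

In the model, a packet at dwords 1022..1025 is read correctly through ib_get(), while ib_direct() returns stale data for the dwords that live on the second page. Reads that stay within the two most recently touched pages cost no copies, which is why roughly sequential access is cheap and why jumping back and forth is not.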

Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro 
> horrors cleaned up, killing lots of legacy GTT
> code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new,  but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus
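
The point about the bare "10" can be made concrete: wait_event_timeout() takes its timeout in jiffies, so the wall-clock length of a literal 10 depends on CONFIG_HZ. Below is a small userspace C sketch (not kernel code) that tabulates this. ms_to_ticks() is a hypothetical stand-in that mimics the round-up behaviour of the kernel's msecs_to_jiffies(); in the driver itself, writing the timeout as msecs_to_jiffies(10) would make the intent independent of HZ.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock length, in milliseconds, of a timeout given as a raw tick
 * (jiffies) count under a particular CONFIG_HZ value. */
static double ticks_to_ms(unsigned long ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * 1000.0 / hz;
}

/* Hypothetical userspace stand-in for msecs_to_jiffies(): convert
 * milliseconds to ticks, rounding up so the wait is never shorter
 * than requested. */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* Print what a literal "10" means for a few common HZ settings, next to
 * the tick count a 10 ms request would need. */
static void print_table(void)
{
	static const unsigned int hz_values[] = { 100, 250, 300, 1000 };
	size_t i;

	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		unsigned int hz = hz_values[i];

		printf("HZ=%4u: 10 ticks = %6.1f ms, 10 ms = %lu ticks\n",
		       hz, ticks_to_ms(10, hz), ms_to_ticks(10, hz));
	}
}
```

With HZ=1000 the literal 10 is 10 ms and with HZ=100 it is 100 ms, whereas a 10 ms request converted with round-up never yields less than 10 ms of ticks (3 ticks, i.e. 12 ms, at HZ=250).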

Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds
 wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus

Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus; Daniel and Imre fail. Though I think Daniel is
on holiday this week, so maybe a revert is the best option.

If you just want to bump it so that Ironlake isn't affected, a patch is attached.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch

Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you just want to bump it so that Ironlake isn't affected, a patch is attached.

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait for it to be fixed a
while.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus

Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason, the DP
transfer is not functioning; likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error message after 400
microseconds. The timeout here is to catch a dysfunctional driver, and
so was intended to be 10 milliseconds, cf
https://patchwork.kernel.org/patch/2160541/

As it happens, on your machine 10 jiffies is approximately 10
milliseconds, so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether it is a failure
to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre
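
Chris's debug change separates two failure modes. With the interrupt-driven wait, a timeout can mean either that the AUX hardware finished but the wakeup never arrived, or that the hardware itself never finished; re-reading the status register after the watchdog expires tells the two apart. A minimal sketch of that classification follows. It assumes that the C() condition in the patch tests a "send busy" flag and that this flag is bit 31 of the channel control register; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 31 of the AUX channel control register is the
 * "send busy" flag that the wait condition tests. */
#define AUX_SEND_BUSY (UINT32_C(1) << 31)

enum aux_diag {
	AUX_OK,          /* the wait completed normally */
	AUX_MISSED_IRQ,  /* hardware finished, but the waiter was not woken */
	AUX_HW_STUCK,    /* hardware still busy after the watchdog expired */
};

/* Classify an AUX wait from its result and the status register value
 * re-read after the watchdog expired. */
static enum aux_diag aux_diagnose(bool done, uint32_t status_after_timeout)
{
	if (done)
		return AUX_OK;
	if (!(status_after_timeout & AUX_SEND_BUSY))
		return AUX_MISSED_IRQ;
	return AUX_HW_STUCK;
}
```

Since the hardware is required to report a timeout within 400 microseconds, an AUX_HW_STUCK result after a 10 ms watchdog would point at the DP side rather than the interrupt plumbing, while AUX_MISSED_IRQ would point at the irq setup.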

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> This seems to be hard to reproduce...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> For a possible patch, see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test-patch against next-20130227 and it
fails building the i915 kernel-module.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > This seems to be hard to reproduce...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > For a possible patch, see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' is the source of the compile error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but reverting it is not a solution.

 Looking at the rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically, which would lead to needlessly resetting
> it, possibly leading to failures like you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh

Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > This seems to be hard to reproduce...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > For a possible patch, see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	}
>
> You applied
>
> +	if (!done) {
> +		status = I915_READ_NOTRACE(ch_ctl);
> +		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
> +			  has_aux_irq, status);
> +	{
>
> That second '{' is the source of the compile error.

Sh*t, OK, I'll try a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre

Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but reverting it is not a solution.
>
> Looking at the rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

>>>> GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
>>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>> by the evergreen code.  I'll put together some patches to help narrow
>>>> down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5), then built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f


I get the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
But in my case, after I reboot the computer the second monitor, plugged
in via HDMI, doesn't work, and when I run `xrandr` I get the following
messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  
>>> wrote:
>>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>
>>>>> So I don't think that's actually the cause of the problem.  Or at least
>>>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>>>> get the lockups.
>>>>
>>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>> improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>>>>>> GPUs are tricky beasts :)
>>>>>
>>>>> Understatement ;).
>>>>>
>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>>>> by the evergreen code.  I'll put together some patches to help narrow
>>>>>> down the problem.
>>>>>
>>>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>>>> actually being executed for this card.  It looks like a combination of
>>>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>>>
>>>>> Patches would be great.  If nothing else, I'm really good at building
>>>>> kernels and rebooting by now.
>>>>
>>>> Two possible fixes attached.  The first attempts a full reset of all
>>>> blocks if the MC (memory controller) is hung.  That may work better
>>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>>> not sure we can reliably tell if it's busy due to display requests
>>>> hitting the MC periodically which would lead to needlessly resetting
>>>> it possibly leading to failures like you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
> [349436.654946] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
> [349436.655997] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
> [349496.698441] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
> [349556.726767] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
> [349556.727797] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
>

I'll make those debug-only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex
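For readers following the radeon side: the two test patches Alex describes differ only in what the reset path does when the memory controller (MC) looks busy. The C sketch below models the two policies as operations on a reset bitmask. The SIM_RESET_* bits, the function names and the sim_debug switch are hypothetical stand-ins, not radeon's actual flags or code; the debug gating only mirrors Alex's plan to make the "MC busy" message debug-only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reset-mask bits for illustration only; radeon's real
 * reset flags and their values are not reproduced here. */
#define SIM_RESET_GFX (1u << 0)
#define SIM_RESET_DMA (1u << 2)
#define SIM_RESET_MC  (1u << 10)
#define SIM_RESET_ALL (SIM_RESET_GFX | SIM_RESET_DMA | SIM_RESET_MC)

/* Stand-in for a debug-only log switch (off by default). */
static int sim_debug;

/* Policy 1: if the MC looks busy, escalate to resetting every block. */
uint32_t sim_full_reset_if_mc_busy(uint32_t mask)
{
	return (mask & SIM_RESET_MC) ? SIM_RESET_ALL : mask;
}

/* Policy 2: drop the MC from the mask, because periodic display
 * fetches can make it look busy when it is not actually hung.
 * The message is printed only when sim_debug is set. */
uint32_t sim_skip_mc_reset(uint32_t mask)
{
	if (mask & SIM_RESET_MC) {
		if (sim_debug)
			printf("MC busy: 0x%08X, clearing.\n", (unsigned int)mask);
		mask &= ~SIM_RESET_MC;
	}
	return mask;
}
```

With these made-up bits, sim_skip_mc_reset(SIM_RESET_GFX | SIM_RESET_MC) returns SIM_RESET_GFX, while sim_full_reset_if_mc_busy(SIM_RESET_MC) returns SIM_RESET_ALL. Josh's reboot results point toward the second policy, though the sketch itself says nothing about why resetting the MC hangs this card.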


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Daniel Vetter
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
> 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
> 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.
> 
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
> 
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
> 
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

OK, I've merged two patches from Paulo: one to fix up the harmless jiffies
vs. msec confusion, and the other to plug a race in our irq handler which,
according to some digging done by Imre, led to missed dp aux interrupts.
The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen), I hope this fixes both issues.

I'll forward the pull to Dave in a few days since atm I'm stalling a bit
for confirmation on another little regression fix. And there's nothing
earth-shattering in my -fixes queue right now.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
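One way to picture the kind of irq race that 44498aea addresses is a toy edge model. Assume, and this is an assumption rather than a statement about the i915 hardware, that the parent interrupt fires only on a 0 to 1 transition of (IIR & IER). Then a handler that reads IIR, acks what it read and leaves the source enabled can strand an event that lands in between, whereas disabling IER first and restoring it afterwards produces a fresh edge for the leftover bit. struct sim_pch and the sim_* helpers are hypothetical; only the register names are borrowed from i915.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the parent only notices a 0 -> 1 transition of
 * (iir & ier). Field names echo SDEIIR/SDEIER, but this is a
 * simulation under that edge assumption, not driver code. */
struct sim_pch {
	uint32_t iir;  /* latched, not-yet-acked events */
	uint32_t ier;  /* enabled events */
	int edges;     /* interrupts delivered to the parent */
};

static int sim_active(const struct sim_pch *s)
{
	return (s->iir & s->ier) != 0;
}

/* Update both registers and count an edge if the output rose. */
static void sim_set(struct sim_pch *s, uint32_t iir, uint32_t ier)
{
	int was_active = sim_active(s);

	s->iir = iir;
	s->ier = ier;
	if (!was_active && sim_active(s))
		s->edges++;
}

/* The hardware latches a new event. */
void sim_event(struct sim_pch *s, uint32_t bit)
{
	sim_set(s, s->iir | bit, s->ier);
}

/* One handler pass. 'late' arrives while the handler is busy;
 * mask_while_handling selects the "disable, ack, re-enable" order. */
uint32_t sim_handle(struct sim_pch *s, uint32_t late, int mask_while_handling)
{
	uint32_t saved_ier = s->ier;
	uint32_t pending;

	if (mask_while_handling)
		sim_set(s, s->iir, 0);             /* disable the source */
	pending = s->iir;                          /* read IIR once */
	sim_event(s, late);                        /* event races with us */
	sim_set(s, s->iir & ~pending, s->ier);     /* ack only what we read */
	if (mask_while_handling)
		sim_set(s, s->iir, saved_ier);     /* re-enable: rises again if 'late' is pending */
	return pending;
}
```

Run with mask_while_handling = 0, the late event stays latched in iir but no second edge is counted, so nothing would ever service it; with mask_while_handling = 1, restoring ier counts a second edge.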


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Josh Boyer
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>
>>>> So I don't think that's actually the cause of the problem.  Or at least
>>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>>> get the lockups.
>>>
>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>> actually tested the revert of just that on top of Linus' tree (commit
>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>> with that commit included is 3 reboots, so that's definitely at least an
>>> improvement.
>>
>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>> gave me pretty rainbow static again.  So it might have been an
>> improvement, but reverting it is not a solution.
>>
>> Looking at the rest of the commits, the whole GPU rework might be
>> suspect, but I clearly have no clue.
>
>>>>> GPUs are tricky beasts :)
>>>>
>>>> Understatement ;).
>>>>
>>>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>>> by the evergreen code.  I'll put together some patches to help narrow
>>>>> down the problem.
>>>>
>>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>>> actually being executed for this card.  It looks like a combination of
>>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>>
>>>> Patches would be great.  If nothing else, I'm really good at building
>>>> kernels and rebooting by now.
>>>
>>> Two possible fixes attached.  The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung.  That may work better
>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>> not sure we can reliably tell if it's busy due to display requests
>>> hitting the MC periodically which would lead to needlessly resetting
>>> it possibly leading to failures like you are seeing.
>>
>> OK.  I'll test them individually.  It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5) built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static.  You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue.  I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work.  The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349436.654946] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349436.655997] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349496.698441] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349556.726767] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349556.727797] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> So up front, this has a massive merge conflict in
> drivers/gpu/drm/radeon/evergreen_cs.c I've fixed it up in drm-next-merged
> in the same tree, I fixed up some small ordering issues in my merge as
> well, however they aren't important if you want the fun of doing a major
> conflict resolution.

I did the fun conflict resolution, so my tree doesn't have the ordering changes.

I also did some things slightly differently from you - you had left
some direct ib[] accesses that I spotted (see for example "case 0x48"
(aka "Copy L2T Frame to Field"), and yours apparently has a few cases
where you use "idx_value" instead of my mindless conflict resolution
that just re-did the brute-force "repace direct ib[] read accesses
with the radeon_get_ib_value() helper function". But you don't do it
for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Anyway - my conflict resolution isn't exactly the same as yours, and
maybe I screwed something up. But it's damn close, and the differences
_seem_ to all be benign.

Btw, why is it ok that some functions still read the ib[] array
directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
etc)?


Whatever. I prefer doing my own resolutions just so that I know what's
going on, and it all seems to build and looks reasonable, but it's
always good to get a second opinion. Particularly since I can't
actually test the radeon stuff, so just eyeballing it and saying
"looks semantically identical to Dave's resolution" may not be 100%
sufficient..

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering 
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "repace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rule for radeon_get_ib_value is that accesses are meant to be
sequential, but it actually doesn't matter as long as the values are within
a page of each other. I was just avoiding multiple calls to get the same
value with idx_value, but I think Alex or Jerome can clean this up a bit
further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ to all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics for that function are a bit underdocumented, and I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of the pain. So yes, there
are other places that need to be cleaned up, but most of the time direct
ib access will work fine, until you have a buffer that straddles a
page boundary.

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band, and
hopefully we can avoid any big conflicts in future!

Dave.
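Dave's explanation of why direct ib[] reads mostly work can be illustrated with a small model in which the kernel copy of the indirect buffer is filled one page at a time, only when the accessor asks for a value on that page. A direct read of the kernel array then returns stale data whenever the index lands on a page that has not been copied yet, which is the straddling case. This is a hypothetical sketch: sim_parser, sim_get_ib_value and the four-dword page size are made up for the demo, and unlike the real helper (which, per Dave, expects roughly sequential access) it simply remembers every copied page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser that copies the user's indirect buffer (IB)
 * into kernel memory one page at a time, on demand. The page size is
 * tiny so that page straddling is easy to demonstrate. */
#define SIM_PAGE_DWORDS 4
#define SIM_MAX_DWORDS  64

struct sim_parser {
	const uint32_t *user_ib;                      /* stands in for the userspace buffer */
	size_t len_dw;                                /* IB length in dwords */
	uint32_t ib[SIM_MAX_DWORDS];                  /* kernel copy, filled lazily */
	int copied[SIM_MAX_DWORDS / SIM_PAGE_DWORDS]; /* per-page "copied" flag */
};

void sim_parser_init(struct sim_parser *p, const uint32_t *user_ib, size_t len_dw)
{
	assert(len_dw <= SIM_MAX_DWORDS);
	memset(p, 0, sizeof(*p));
	p->user_ib = user_ib;
	p->len_dw = len_dw;
}

/* Accessor: make sure the page holding idx has been copied, then read.
 * Reading p->ib[idx] directly skips this step and can see stale data. */
uint32_t sim_get_ib_value(struct sim_parser *p, size_t idx)
{
	size_t pg = idx / SIM_PAGE_DWORDS;

	assert(idx < p->len_dw);
	if (!p->copied[pg]) {
		size_t start = pg * SIM_PAGE_DWORDS;
		size_t n = p->len_dw - start;

		if (n > SIM_PAGE_DWORDS)
			n = SIM_PAGE_DWORDS;
		memcpy(&p->ib[start], &p->user_ib[start], n * sizeof(uint32_t));
		p->copied[pg] = 1;
	}
	return p->ib[idx];
}
```

With an eight-dword buffer, sim_get_ib_value(&p, 2) copies only the first page, so p.ib[5] still reads 0; only after sim_get_ib_value(&p, 5) does p.ib[5] hold the user's value.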


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro 
> horrors cleaned up, killing lots of legacy GTT
> code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus
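Linus's point is that the last argument to wait_event_timeout() is a count of jiffies, so a literal 10 means a different wall-clock time for each CONFIG_HZ. A minimal user-space sketch of the arithmetic follows. The model_* helpers are illustrative stand-ins with HZ passed as a parameter, not the kernel's jiffies_to_msecs()/msecs_to_jiffies(), although like the kernel's msecs_to_jiffies() the conversion rounds up so that a wait is never shorter than requested. In kernel code, passing msecs_to_jiffies(10) is the usual way to say "10 ms".

```c
#include <assert.h>
#include <stdio.h>

/* User-space model of jiffies arithmetic. In the kernel HZ is fixed at
 * build time by CONFIG_HZ; here it is a parameter so several values can
 * be compared. Illustrative only, not the kernel's implementation. */

/* Milliseconds covered by a raw jiffies count at a given HZ. */
unsigned long model_jiffies_to_msecs(unsigned long j, unsigned int hz)
{
	assert(hz > 0);
	return j * 1000UL / hz;
}

/* Jiffies needed to cover ms milliseconds, rounded up. */
unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned int hz)
{
	assert(hz > 0);
	return (ms * hz + 999UL) / 1000UL;
}

/* Compare a literal "10" timeout with an explicit 10 ms conversion. */
void show_timeouts(void)
{
	static const unsigned int hz_values[] = { 100, 250, 300, 1000 };

	for (size_t i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		unsigned int hz = hz_values[i];

		printf("HZ=%4u: literal 10 -> %lu ms, 10 ms -> %lu jiffies\n",
		       hz, model_jiffies_to_msecs(10, hz),
		       model_msecs_to_jiffies(10, hz));
	}
}
```

At HZ=100 the literal 10 lasts 100 ms and at HZ=1000 it lasts 10 ms, while the rounded-up 10 ms conversion yields 1, 3, 3 and 10 jiffies for HZ of 100, 250, 300 and 1000.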


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds
 wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus; Daniel and Imre fail. Though I think Daniel is
on holiday this week, so maybe a revert is the best option.

If you just want to bump it so Ironlake isn't affected, there's a patch attached.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you want to just bump it so Ironlake isn't affected, (patch attached).

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait a while for it to be
fixed.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason the DP
transfer is not functioning, likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error message after 400
microseconds. The timeout here is to catch a dysfunctional driver, and
so was intended to be 10 milliseconds, cf.
https://patchwork.kernel.org/patch/2160541/

As it happens, on your machine 10 jiffies is approximately 10
milliseconds, so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether it is a failure
to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;

-- 
Chris Wilson, Intel Open Source Technology Centre
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> This seems to be hard reproducible...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> Possible patch see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test patch against next-20130227, and the
i915 kernel module fails to build.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > This seems to be hard reproducible...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > Possible patch see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' (which should be a closing '}') is the source of the compile error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but revert it is not a solution.

 Looking at there rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically which would lead to needlessly resetting
> it possibly leading to failures like you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > This seems to be hard reproducible...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > Possible patch see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +   if (!done) {
> +   status = I915_READ_NOTRACE(ch_ctl);
> +   DRM_ERROR("dp aux hw did not signal timeout (has irq:
> %i), status=%08x!\n",
> + has_aux_irq, status);
> +   }
>
> You applied
>
> +   if (!done) {
> +   status = I915_READ_NOTRACE(ch_ctl);
> +   DRM_ERROR("dp aux hw did not signal timeout (has irq:
> %i), status=%08x!\n",
> + has_aux_irq, status);
> +   {
>
> That second '{' is the source of the compile error.

Sh*t, OK, I'll try with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
 ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but revert it is not a solution.
>
> Looking at there rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

 GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
 ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
 problem anyway since it only affects 6xx/7xx and your card is handled
 by the evergreen code.  I'll put together some patches to help narrow
 down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5), then built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f


I get the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
In my case, after I reboot the computer, the second monitor (connected
via HDMI) doesn't work, and when I run `xrandr` I get the following
messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout (has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundreth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  
>>> wrote:
 On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
> wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem.  Or at 
> least
> not that alone.  I reverted it on top of Linus' latest tree and I 
> still
> get the lockups.

 Actually, git bisect does seem to have gotten it correct.  Once I
 actually tested the revert of just that on top of Linus' tree (commit
 d895cb1af1), things seem to be working much better.  I've rebooted a
 dozen times without a lockup.  The most I've seen it take on a kernel
 with that commit included is 3 reboots, so that's definitely at least 
 an
 improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but revert it is not a solution.
>>>
>>> Looking at there rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

 Two possible fixes attached.  The first attempts a full reset of all
 blocks if the MC (memory controller) is hung.  That may work better
 than just resetting the MC.  The second just disables MC reset.  I'm
 not sure we can reliably tell if it's busy due to display requests
 hitting the MC periodically which would lead to needlessly resetting
 it possibly leading to failures like you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
>

I'll make those messages debug-only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Daniel Vetter
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
> 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
> 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.
> 
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundreth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
> 
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
> 
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

OK, I've merged two patches from Paulo: one to fix up the harmless jiffies
vs. msec confusion, and the other to plug a race in our irq handler which,
according to some digging done by Imre, did lead to missed dp aux
interrupts. The important patch is the current tip of

git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen), I hope this will fix both issues.

I'll forward the pull to Dave in a few days since atm I'm stalling a bit
for confirmation on another little regression fix. And there's nothing
earth-shattering in my -fixes queue right now.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Josh Boyer
On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
 On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  
 wrote:
> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit

 So I don't think that's actually the cause of the problem.  Or at least
 not that alone.  I reverted it on top of Linus' latest tree and I still
 get the lockups.
>>>
>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>> actually tested the revert of just that on top of Linus' tree (commit
>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>> with that commit included is 3 reboots, so that's definitely at least an
>>> improvement.
>>
>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>> gave me pretty rainbow static again.  So it might have been an
>> improvement, but revert it is not a solution.
>>
>> Looking at there rest of the commits, the whole GPU rework might be
>> suspect, but I clearly have no clue.
>
> GPUs are tricky beasts :)

 Understatement ;).

> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
> problem anyway since it only affects 6xx/7xx and your card is handled
> by the evergreen code.  I'll put together some patches to help narrow
> down the problem.

 Yeah, that's the biggest problem I have, not knowing which functions are
 actually being executed for this card.  It looks like a combination of
 stuff in evergreen.c and ni.c, but I have no idea.

 Patches would be great.  If nothing else, I'm really good at building
 kernels and rebooting by now.
>>>
>>> Two possible fixes attached.  The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung.  That may work better
>>> than just resetting the MC.  The second just disables MC reset.  I'm
>>> not sure we can reliably tell if it's busy due to display requests
>>> hitting the MC periodically which would lead to needlessly resetting
>>> it possibly leading to failures like you are seeing.
>>
>> OK.  I'll test them individually.  It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5) built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static.  You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue.  I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work.  The only other "issue" I'm seeing at the moment is my dmesg is
full of:

[349316.595749] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
[349436.654946] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
[349436.655997] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
[349496.698441] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
[349556.726767] radeon 0000:01:00.0: MC busy: 0x0409, clearing.
[349556.727797] radeon 0000:01:00.0: MC busy: 0x0409, clearing.

So hopefully your patch is on the way into Linus' tree at some point
soon.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> So up front, this has a massive merge conflict in
> drivers/gpu/drm/radeon/evergreen_cs.c I've fixed it up in drm-next-merged
> in the same tree, I fixed up some small ordering issues in my merge as
> well, however they aren't important if you want the fun of doing a major
> conflict resolution.

I did the fun conflict resolution, so my tree doesn't have the ordering changes.

I also did some things slightly differently from you - you had left
some direct ib[] accesses that I spotted (see for example "case 0x48",
aka "Copy L2T Frame to Field"), and yours apparently has a few cases
where you use "idx_value" instead of my mindless conflict resolution
that just re-did the brute-force "replace direct ib[] read accesses
with the radeon_get_ib_value() helper function". But you don't do it
for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Anyway - my conflict resolution isn't exactly the same as yours, and
maybe I screwed something up. But it's damn close, and the differences
_seem_ to all be benign.

Btw, why is it ok that some functions still read the ib[] array
directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
etc)?


Whatever. I prefer doing my own resolutions just so that I know what's
going on, and it all seems to build and looks reasonable, but it's
always good to get a second opinion. Particularly since I can't
actually test the radeon stuff, so just eyeballing it and saying
"looks semantically identical to Dave's resolution" may not be 100%
sufficient..

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-25 Thread Dave Airlie
>
> I did the fun conflict resolution, so my tree doesn't have the ordering 
> changes.
>
> I also did some things slightly differently from you - you had left
> some direct ib[] accesses that I spotted (see for example "case 0x48"
> (aka "Copy L2T Frame to Field"), and yours apparently has a few cases
> where you use "idx_value" instead of my mindless conflict resolution
> that just re-did the brute-force "repace direct ib[] read accesses
> with the radeon_get_ib_value() helper function". But you don't do it
> for *all* the radeon_get_ib_value(p, idx+2) users, so whatever.

Yeah, the rules for radeon_get_ib_value are that the accesses are meant
to be sequential, but it actually doesn't matter as long as the values
are within a page of each other. I was just avoiding multiple calls to
get the same value with idx_value, but I think Alex or Jerome can clean
this up a bit further anyway.

> Anyway - my conflict resolution isn't exactly the same as yours, and
> maybe I screwed something up. But it's damn close, and the differences
> _seem_ be all be benign.
>
> Btw, why is it ok that some functions still read the ib[] array
> directly (eg evergreen_vm_packet3_check() or evergreen_cs_check_reg()
> etc)?

The semantics for that function are a bit underdocumented. I thought
the other developers understood them after I explained them, but I found
out that they hadn't quite grasped the true extent of the pain. So yes,
there are other places that need to be cleaned up, but most of the time
direct ib access will work fine, until you have a buffer that straddles
a page boundary.
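
Dave's point about direct ib[] reads can be illustrated with a simplified
user-space model. This is a sketch under stated assumptions, not the radeon
code: the page size is shrunk to four dwords, and struct parser, copy_page()
and get_ib_value() are hypothetical names standing in for the real parser
state and radeon_get_ib_value(), whose bookkeeping differs.

```c
#include <stdint.h>
#include <string.h>

/* Toy sizes (assumptions): a 4-dword "page" makes a boundary easy to hit. */
#define PAGE_DWORDS 4
#define IB_DWORDS   12
#define IB_PAGES    (IB_DWORDS / PAGE_DWORDS)

/* Hypothetical parser state: src is the userspace command stream, ib is
 * the kernel copy that is filled one page at a time, and copied[] records
 * which pages of ib are valid so far. */
struct parser {
	const uint32_t *src;
	uint32_t ib[IB_DWORDS];
	int copied[IB_PAGES];
};

/* Bring one page of the command stream into the kernel copy. */
static void copy_page(struct parser *p, int pg)
{
	memcpy(&p->ib[pg * PAGE_DWORDS], &p->src[pg * PAGE_DWORDS],
	       PAGE_DWORDS * sizeof(p->ib[0]));
	p->copied[pg] = 1;
}

/* Accessor in the spirit of radeon_get_ib_value(): make sure the page
 * holding idx has been copied before reading from it. */
static uint32_t get_ib_value(struct parser *p, int idx)
{
	int pg;

	if (idx < 0 || idx >= IB_DWORDS)
		return 0; /* out of range; a real parser would reject the stream */
	pg = idx / PAGE_DWORDS;
	if (!p->copied[pg])
		copy_page(p, pg);
	return p->ib[idx];
}
```

Reading index 2 through the accessor copies only page 0. A direct p.ib[5]
read at that point still sees the not-yet-copied page 1 (zeroed here, stale
in general), whereas get_ib_value(&p, 5) copies that page first and returns
the real value: the page-straddling case described above.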

> Whatever. I prefer doing my own resolutions just so that I know what's
> going on, and it all seems to build and looks reasonable, but it's
> always good to get a second opinion. Particularly since I can't
> actually test the radeon stuff, so just eyeballing it and saying
> "looks semantically identical to Dave's resolution" may not be 100%
> sufficient..

Yup, I've reviewed it and it looks fine; any cleanup is just going to be
an optimisation.

So I'll work with Alex/Jerome to clean up anything else out-of-band,
and hopefully we can avoid any big conflicts in future!

Dave.


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>
> Highlights:
>
> i915: all over the map, haswell power well enhancements, valleyview macro 
> horrors cleaned up, killing lots of legacy GTT
> code,

Lowlight:

There's something wrong with i915 DP detection or whatever. I get
stuff like this:

[5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
signal timeout (has irq: 1)!
[5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
0xa145003f
.
[8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
0xa145003f

and after that the screen ends up black.

It's happened twice now, but is not 100% repeatable. It looks like the
message itself is new, but the black screen is also new and does seem
to happen when I get the message, so...

The second time I touched the power button, and the machine came back.
Apparently the suspend/resume cycle made it all magically work: the
suspend caused the same errors, but then the resume made it all good
again.

Some kind of missed initialization at bootup? It's not reliable enough
to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
irq-drive the dp aux communication") since that is where the message
was added..

Btw, looking at that commit, what do you think the semantics of the
timeout in something like

done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);

would be? What's that magic "10"? It's some totally random number.

Guys, it should be something meaningful. If you meant a tenth of a
second, use HZ/10 or something. Because just the plain "10" is crazy.
I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
hundredth of a second. Was that what you intended? Because if it was,
it is still crap, since CONFIG_HZ might be 100, and then you're
waiting for ten times longer.
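
To make the jiffies arithmetic concrete: a bare 10 passed as the timeout is
10 ticks, so its wall-clock length depends on CONFIG_HZ. The sketch below
uses hypothetical helper names, with hz standing in for CONFIG_HZ; the
kernel's own conversions are jiffies_to_msecs() and msecs_to_jiffies().

```c
/* Illustrative helpers (hypothetical names), simplified to show only the
 * arithmetic; hz stands in for CONFIG_HZ. */
static unsigned int ticks_to_ms(unsigned int ticks, unsigned int hz)
{
	return ticks * 1000u / hz;
}

/* Round up so a short, nonzero timeout never collapses to 0 ticks. */
static unsigned int ms_to_ticks(unsigned int ms, unsigned int hz)
{
	return (ms * hz + 999u) / 1000u;
}
```

Here ticks_to_ms(10, hz) gives 10, 40 and 100 ms for HZ of 1000, 250 and
100, while ms_to_ticks(10, hz) gives 10, 3 and 1 ticks, rounding up as
msecs_to_jiffies() does. In driver code the same intent would be written as
wait_event_timeout(wq, cond, msecs_to_jiffies(10)).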

IOW, passing in a random number like that is crazy. It cannot possibly
be right.

I have no idea whether the timeout has anything to do with anything,
but it reinforces my suspicion that there is something wrong with that
commit.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 5:39 PM, Linus Torvalds
 wrote:
>
> Lowlight:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!

Oh, forgot to mention - this is my trusty old Westmere chip (aka "Core
i5-670", aka Clarkdale, aka GMA-some-random-number). The one before
SB.

   Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Dave Airlie
On Wed, Feb 27, 2013 at 11:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundreth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.

Yeah, that looks bogus; Daniel and Imre fail. Though I think Daniel is
on holiday this week, so if you can revert it, that might be the best
option.

Or, if you just want to bump it so Ironlake isn't affected, there's a
patch attached.

Is this external DP monitor or eDP laptop panel btw?

Dave.


0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch
Description: Binary data


Re: [git pull] drm merge for 3.9-rc1

2013-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2013 at 7:30 PM, Dave Airlie  wrote:
>
> If you want to just bump it so Ironlake isn't affected, (patch attached).

It works fine 95% of the time and isn't a hard failure when it
doesn't, so this isn't critical. I can wait for it to be fixed a
while.

> Is this external DP monitor or eDP laptop panel btw?

External monitor. Oh, and the monitor is actually connected to HDMI,
but the black screen and the DP messages definitely go hand-in-hand.

 Linus


Re: [git pull] drm merge for 3.9-rc1

2013-02-27 Thread Chris Wilson
On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro 
> > horrors cleaned up, killing lots of legacy GTT
> > code,
> 
> Lowlight:
> 
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
> 
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> 
> and after that the screen ends up black.
> 
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...

That message appears to be the canary. For whatever reason the DP
transfer is not functioning, likely the VDD is not powered up. However,
the failure to communicate there causes the modeset to abort, resulting
in the blank screen.
 
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.

So it is reproducible during suspend. That should help narrow down the
sequence, thank you.
 
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
> 
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
> 
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
> 
> would be? What's that magic "10"? It's some totally random number.

The hardware is required to return a timed-out error message after 400
microseconds. The timeout here is to catch a dysfunctional driver, and
so was intended to be 10 milliseconds, cf
https://patchwork.kernel.org/patch/2160541/

As it happens, on your machine 10 jiffies is approximately 10
milliseconds, so we should not be aborting before the hardware has
had a chance to signal failure. One way to check whether it is a failure
to set up the IRQ or a failure to set up the DP comms would be:

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 7b8bfe8..f2486f1 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -356,9 +356,11 @@ intel_dp_aux_wait_done(struct intel_dp *intel_dp, bool has_aux_irq)
 		done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
 	else
 		done = wait_for_atomic(C, 10) == 0;
-	if (!done)
-		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i)!\n",
-			  has_aux_irq);
+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}
 #undef C
 
 	return status;
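
The debug patch prints the raw AUX control status when the software timeout
fires. As a hedged illustration of how that value could be read, the sketch
below classifies a status word. The bit positions are an assumption modeled
on the DP_AUX_CH_CTL_* definitions in i915_reg.h and should be checked
against that header; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: bit positions modeled on the DP_AUX_CH_CTL_* macros in
 * i915_reg.h; verify them against that header before trusting a decode. */
#define AUX_SEND_BUSY (1u << 31)
#define AUX_DONE      (1u << 30)

/* Hypothetical verdicts for a status word read after the software timeout. */
enum aux_verdict {
	AUX_STILL_BUSY,   /* transfer never completed: suspect the AUX link itself */
	AUX_DONE_NO_WAKE, /* hardware finished but nobody woke us: suspect the IRQ */
	AUX_OTHER,        /* neither bit set: nothing conclusive */
};

static enum aux_verdict classify_aux_timeout(uint32_t status)
{
	if (status & AUX_SEND_BUSY)
		return AUX_STILL_BUSY;
	if (status & AUX_DONE)
		return AUX_DONE_NO_WAKE;
	return AUX_OTHER;
}
```

Under those assumed bit positions, the 0xa145003f value from the logs in
this thread has the send-busy bit set and the done bit clear, so it would
fall in the "still busy" case rather than the "done but not woken" one.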

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
Hi,

I am seeing this also on Linux-Next.

/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
/var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!

/var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!

This seems to be hard to reproduce...
Laptop-LCD... Sandybridge Mobile-GT2.

Is there a way to force the error?

For a possible patch, see [1].

- Sedat -

[1] https://patchwork.kernel.org/patch/2192721/


Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Sedat Dilek
On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> Hi,
>
> I am seeing this also on Linux-Next.
>
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
> /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> (has irq: 1)!
>
> This seems to be hard reproducible...
> Laptop-LCD... Sandybridge Mobile-GT2.
>
> Is there a way to force the error?
>
> Possible patch see [1].
>
> - Sedat -
>
> [1] https://patchwork.kernel.org/patch/2192721/

Hmm, I tried to apply the test patch against next-20130227, and the
i915 kernel module fails to build.

- Sedat -
  LD  drivers/gpu/drm/i915/built-in.o
  CC [M]  drivers/gpu/drm/i915/i915_drv.o
  CC [M]  drivers/gpu/drm/i915/i915_dma.o
  CC [M]  drivers/gpu/drm/i915/i915_irq.o
  CC [M]  drivers/gpu/drm/i915/i915_debugfs.o
  CC [M]  drivers/gpu/drm/i915/i915_suspend.o
  CC [M]  drivers/gpu/drm/i915/i915_gem.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_context.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_debug.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_evict.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_execbuffer.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_gtt.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_stolen.o
  CC [M]  drivers/gpu/drm/i915/i915_gem_tiling.o
  CC [M]  drivers/gpu/drm/i915/i915_sysfs.o
  CC [M]  drivers/gpu/drm/i915/i915_trace_points.o
  CC [M]  drivers/gpu/drm/i915/i915_ums.o
  CC [M]  drivers/gpu/drm/i915/intel_display.o
  CC [M]  drivers/gpu/drm/i915/intel_crt.o
  CC [M]  drivers/gpu/drm/i915/intel_lvds.o
  CC [M]  drivers/gpu/drm/i915/intel_bios.o
  CC [M]  drivers/gpu/drm/i915/intel_ddi.o
  CC [M]  drivers/gpu/drm/i915/intel_dp.o
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_aux_wait_done':
drivers/gpu/drm/i915/intel_dp.c:352:1: error: invalid storage class for 
function 'intel_dp_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:351:1: warning: ISO C90 forbids mixed 
declarations and code [-Wdeclaration-after-statement]
drivers/gpu/drm/i915/intel_dp.c:492:1: error: invalid storage class for 
function 'intel_dp_aux_native_write'
drivers/gpu/drm/i915/intel_dp.c:525:1: error: invalid storage class for 
function 'intel_dp_aux_native_write_1'
drivers/gpu/drm/i915/intel_dp.c:533:1: error: invalid storage class for 
function 'intel_dp_aux_native_read'
drivers/gpu/drm/i915/intel_dp.c:572:1: error: invalid storage class for 
function 'intel_dp_i2c_aux_ch'
drivers/gpu/drm/i915/intel_dp.c:669:1: error: invalid storage class for 
function 'intel_dp_i2c_init'
drivers/gpu/drm/i915/intel_dp.c:845:13: error: invalid storage class for 
function 'ironlake_set_pll_edp'
drivers/gpu/drm/i915/intel_dp.c:872:1: error: invalid storage class for 
function 'intel_dp_mode_set'
drivers/gpu/drm/i915/intel_dp.c:985:13: error: invalid storage class for 
function 'ironlake_wait_panel_status'
drivers/gpu/drm/i915/intel_dp.c:1004:13: error: invalid storage class for 
function 'ironlake_wait_panel_on'
drivers/gpu/drm/i915/intel_dp.c:1010:13: error: invalid storage class for 
function 'ironlake_wait_panel_off'
drivers/gpu/drm/i915/intel_dp.c:1016:13: error: invalid storage class for 
function 'ironlake_wait_panel_power_cycle'
drivers/gpu/drm/i915/intel_dp.c:1027:13: error: invalid storage class for 
function 'ironlake_get_pp_control'
drivers/gpu/drm/i915/intel_dp.c:1075:13: error: invalid storage class for 
function 'ironlake_panel_vdd_off_sync'
drivers/gpu/drm/i915/intel_dp.c:1097:13: error: invalid storage class for 
function 'ironlake_panel_vdd_work'
drivers/gpu/drm/i915/intel_dp.c:1244:13: error: invalid storage class for 
function 'ironlake_edp_pll_on'
drivers/gpu/drm/i915/intel_dp.c:1270:13: error: invalid storage class for 
function 'ironlake_edp_pll_off'
drivers/gpu/drm/i915/intel_dp.c:1325:13: error: invalid storage class for 
function 'intel_dp_get_hw_state'
drivers/gpu/drm/i915/intel_dp.c:1374:13: error: invalid storage class for 
function 'intel_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1390:13: error: invalid storage class for 
function 'intel_post_disable_dp'
drivers/gpu/drm/i915/intel_dp.c:1400:13: error: invalid storage class for 
function 'intel_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1419:13: error: invalid storage class for 
function 'intel_pre_enable_dp'
drivers/gpu/drm/i915/intel_dp.c:1432:1: error: invalid storage class for 
function 'intel_dp_aux_native_read_retry'
drivers/gpu/drm/i915/intel_dp.c:1457:1: error: invalid storage class for 
function 'intel_dp_get_link_status'
drivers/gpu/drm/i915/intel_dp.c:1483:1: error: invalid storage class for 
function 'intel_dp_voltage_max'
drivers/gpu/drm/i915/intel_dp.c:1496:1: error: invalid storage class for 
function 'intel_dp_pre_emphasis_max'
drivers/gpu/drm/i915/intel_dp.c:1538:1: error:

Re: [git pull] drm merge for 3.9-rc1

2013-02-28 Thread Chris Wilson
On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
> > Hi,
> >
> > I am seeing this also on Linux-Next.
> >
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
> > (has irq: 1)!
> >
> > This seems to be hard reproducible...
> > Laptop-LCD... Sandybridge Mobile-GT2.
> >
> > Is there a way to force the error?
> >
> > Possible patch see [1].
> >
> > - Sedat -
> >
> > [1] https://patchwork.kernel.org/patch/2192721/

That was:

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	}

You applied

+	if (!done) {
+		status = I915_READ_NOTRACE(ch_ctl);
+		DRM_ERROR("dp aux hw did not signal timeout (has irq: %i), status=%08x!\n",
+			  has_aux_irq, status);
+	{

That second '{' is the source of the compile error.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>
>> So I don't think that's actually the cause of the problem.  Or at least
>> not that alone.  I reverted it on top of Linus' latest tree and I still
>> get the lockups.
>
> Actually, git bisect does seem to have gotten it correct.  Once I
> actually tested the revert of just that on top of Linus' tree (commit
> d895cb1af1), things seem to be working much better.  I've rebooted a
> dozen times without a lockup.  The most I've seen it take on a kernel
> with that commit included is 3 reboots, so that's definitely at least an
> improvement.

 I give up.  GPU issues are not my thing.  2 reboots after I sent that it
 gave me pretty rainbow static again.  So it might have been an
 improvement, but revert it is not a solution.

 Looking at there rest of the commits, the whole GPU rework might be
 suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code.  I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card.  It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great.  If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached.  The first attempts a full reset of all
> blocks if the MC (memory controller) is hung.  That may work better
> than just resetting the MC.  The second just disables MC reset.  I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically which would lead to needlessly resetting
> it possibly leading to failures like you are seeing.

OK.  I'll test them individually.  It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other.

I'll let you know how things go.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Sedat Dilek
On Thu, Feb 28, 2013 at 12:18 PM, Chris Wilson  wrote:
> On Thu, Feb 28, 2013 at 12:06:28AM +0100, Sedat Dilek wrote:
>> On Wed, Feb 27, 2013 at 11:36 PM, Sedat Dilek  wrote:
>> > Hi,
>> >
>> > I am seeing this also on Linux-Next.
>> >
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.202381]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> > /var/log/kern.log:Feb 27 22:52:35 fambox kernel: [   28.210588]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > /var/log/kern.log.1:Feb 22 07:36:04 fambox kernel: [   27.408280]
>> > [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
>> > (has irq: 1)!
>> >
>> > This seems to be hard reproducible...
>> > Laptop-LCD... Sandybridge Mobile-GT2.
>> >
>> > Is there a way to force the error?
>> >
>> > Possible patch see [1].
>> >
>> > - Sedat -
>> >
>> > [1] https://patchwork.kernel.org/patch/2192721/
>
> That was:
>
> +   if (!done) {
> +   status = I915_READ_NOTRACE(ch_ctl);
> +   DRM_ERROR("dp aux hw did not signal timeout (has irq:
> %i), status=%08x!\n",
> + has_aux_irq, status);
> +   }
>
> You applied
>
> +   if (!done) {
> +   status = I915_READ_NOTRACE(ch_ctl);
> +   DRM_ERROR("dp aux hw did not signal timeout (has irq:
> %i), status=%08x!\n",
> + has_aux_irq, status);
> +   {
>
> That second '{' is the source of the compile error.

Damn, OK, I'll try with a v2.

Any hint on how to force the error?

- Sedat -

> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre


Re: [git pull] drm merge for 3.9-rc1

2013-03-01 Thread Josh Boyer
On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>
>>> So I don't think that's actually the cause of the problem.  Or at least
>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>> get the lockups.
>>
>> Actually, git bisect does seem to have gotten it correct.  Once I
>> actually tested the revert of just that on top of Linus' tree (commit
>> d895cb1af1), things seem to be working much better.  I've rebooted a
>> dozen times without a lockup.  The most I've seen it take on a kernel
>> with that commit included is 3 reboots, so that's definitely at least an
>> improvement.
>
> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
> gave me pretty rainbow static again.  So it might have been an
> improvement, but revert it is not a solution.
>
> Looking at there rest of the commits, the whole GPU rework might be
> suspect, but I clearly have no clue.

>>>> GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
>>>> ca57802e521de54341efc8a56f70571f79ffac72 mostly likely wasn't the
>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>> by the evergreen code.  I'll put together some patches to help narrow
>>>> down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card.  It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great.  If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.  The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung.  That may work better
>> than just resetting the MC.  The second just disables MC reset.  I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK.  I'll test them individually.  It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5), then built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.  You'll understand if I'm
hesitant to declare success, but resetting the MC does indeed appear to
be the issue.  I'll keep rebooting for a while to make sure.

josh


Re: [git pull] drm merge for 3.9-rc1

2013-03-03 Thread Azat Khuzhin
On Wed, Feb 27, 2013 at 5:39 AM, Linus Torvalds
 wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie  wrote:
>>
>> Highlights:
>>
>> i915: all over the map, haswell power well enhancements, valleyview macro 
>> horrors cleaned up, killing lots of legacy GTT
>> code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .
> [8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f


I have the same messages after upgrading to
b0af9cd9aab60ceb17d3ebabb9fdf4ff0a99cf50.
But in my case, when I reboot the computer, the second monitor, which is
plugged in via HDMI, doesn't work, and when I run `xrandr` I get the
following messages in kern.log:

Mar  3 18:09:15 home-spb kernel: [12321.758273] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f
Mar  3 18:09:15 home-spb kernel: [12321.771715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.782712]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.793715]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.804719]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.815725]
[drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not signal timeout
(has irq: 1)!
Mar  3 18:09:15 home-spb kernel: [12321.817293] [drm:intel_dp_aux_ch]
*ERROR* dp_aux_ch not done status 0xa143003f

# lspci | fgrep -i graph
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation
Core Processor Family Integrated Graphics Controller (rev 09)

I tested some commits, and here are the results:
- Broken at v3.8-10206-gb0af9cd
- Works normally at v3.8-rc3-139-g34f2be4
- Works normally at v3.8-rc3-188-g10aa17c
- Works normally at 6dc1c49

I've tested 0001-drm-i915-only-use-irq-for-dp-on-post-ilk.patch and it
works for me.
Thanks, Dave.

>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new,  but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
> done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundredth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.
>
>Linus



--
Respectfully
Azat Khuzhin
Primary email a3at.m...@gmail.com
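Linus's objection to the bare `10` in the quoted `wait_event_timeout()` call is that its timeout argument is counted in jiffies, and how long a jiffy lasts depends on CONFIG_HZ. The userspace C sketch below models that arithmetic under stated assumptions: `hz` stands in for CONFIG_HZ, and `jiffies_to_ms()` / `ms_to_jiffies()` are hypothetical helpers, not kernel code. The second one rounds up, as the kernel's real `msecs_to_jiffies()` does.

```c
#include <assert.h>

/*
 * Userspace model of kernel timeout arithmetic. 'hz' stands in for
 * CONFIG_HZ (ticks per second). These helpers are illustrative only
 * and are not the kernel's own implementations.
 */

/* How many milliseconds a timeout of 'jiffies' ticks lasts at 'hz'. */
unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}

/*
 * Convert milliseconds to ticks, rounding up so that a nonzero
 * request never becomes a zero-tick timeout (the kernel's
 * msecs_to_jiffies() also rounds up).
 */
unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999UL) / 1000UL;
}
```

At `hz = 1000` a literal `10` means 10 ms, while at `hz = 100` the same literal means 100 ms. Stating the intent in milliseconds, as `msecs_to_jiffies(10)` would in kernel code, gives roughly the same wait under any HZ, up to one tick of rounding.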


Re: [git pull] drm merge for 3.9-rc1

2013-03-05 Thread Alex Deucher
On Tue, Mar 5, 2013 at 10:21 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer  wrote:
>> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer  wrote:
>>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher  wrote:
 On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer  wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher  wrote:
>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>
> So I don't think that's actually the cause of the problem.  Or at least
> not that alone.  I reverted it on top of Linus' latest tree and I still
> get the lockups.

 Actually, git bisect does seem to have gotten it correct.  Once I
 actually tested the revert of just that on top of Linus' tree (commit
 d895cb1af1), things seem to be working much better.  I've rebooted a
 dozen times without a lockup.  The most I've seen it take on a kernel
 with that commit included is 3 reboots, so that's definitely at least an
 improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

 Two possible fixes attached.  The first attempts a full reset of all
 blocks if the MC (memory controller) is hung.  That may work better
 than just resetting the MC.  The second just disables the MC reset.  I'm
 not sure we can reliably tell whether the MC is really hung, since
 display requests hit it periodically and make it look busy.  That could
 lead to needlessly resetting it, possibly causing failures like the
 ones you are seeing.
>>>
>>> OK.  I'll test them individually.  It will probably take a bit because
>>> I'll want to do numerous reboots if things seem "fixed" with one or the
>>> other.
>>>
>>> I'll let you know how things go.
>>
>> I applied each individually on top of Linus' tree as of this morning
>> (commit 2a7d2b96d5) built, installed, and tested.
>>
>> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
>> two reboots.
>>
>> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
>> 21 reboots without a hang/rainbow static.  You'll understand if I'm
>> hesitant to declare success, but resetting the MC does indeed appear to
>> be the issue.  I'll keep rebooting for a while to make sure.
>
> OK, I'm still running on the kernel with that patch and things still
> work.  The only other "issue" I'm seeing at the moment is my dmesg is
> full of:
>
> [349316.595749] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.654946] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349436.655997] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349496.698441] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.726767] radeon :01:00.0: MC busy: 0x0409, clearing.
> [349556.727797] radeon :01:00.0: MC busy: 0x0409, clearing.
>

I'll make those debug only when the patch goes upstream.

> So hopefully your patch is on the way into Linus' tree at some point
> soon.

It'll be in my next -fixes pull.

Alex
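Alex's two test patches differ only in how the reset path treats a memory controller that reports busy. One escalates to a full reset. The other leaves the MC alone, because periodic display traffic can make it look busy. A minimal C sketch of that decision follows. Every name in it (the `RESET_*` bits, `enum mc_policy`, `compute_reset_mask()`) is hypothetical and is not taken from the radeon driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the choice between the two test patches.
 * The block bits, the policy enum and compute_reset_mask() are
 * invented for illustration; they are not radeon driver code.
 */
#define RESET_GFX (1u << 0)
#define RESET_DMA (1u << 1)
#define RESET_MC  (1u << 2)
#define RESET_ALL (RESET_GFX | RESET_DMA | RESET_MC)

enum mc_policy {
    MC_FULL_RESET_IF_BUSY, /* patch 1 idea: busy MC => reset every block */
    MC_SKIP_RESET          /* patch 2 idea: never reset the MC */
};

/* Decide which blocks to reset, given which ones report busy. */
uint32_t compute_reset_mask(uint32_t busy_mask, enum mc_policy policy)
{
    uint32_t mask = busy_mask;

    /* Only the three modeled blocks may appear in the mask. */
    assert((busy_mask & ~RESET_ALL) == 0);

    if (mask & RESET_MC) {
        if (policy == MC_FULL_RESET_IF_BUSY) {
            /* Treat a busy MC as hung and escalate to a full reset. */
            mask = RESET_ALL;
        } else {
            /*
             * A busy MC may only reflect periodic display requests,
             * so drop it from the mask instead of resetting it.
             */
            mask &= ~RESET_MC;
        }
    }
    return mask;
}
```

Under `MC_SKIP_RESET`, a busy MC is dropped from the mask rather than reset. That may be what the "MC busy ... clearing." messages in Josh's dmesg reflect, but that reading of the actual patch is an assumption.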

