[Bug 16140] Suspend To RAM/ Resume broken - Radeon KMS on RV250

bugzilla-daemon Sun, 10 Oct 2010 16:16:28 -0700

https://bugzilla.kernel.org/show_bug.cgi?id=16140

--- Comment #33 from Florian Mickler <flor...@mickler.org>  2010-10-10 23:15:54 ---
Hi!

I did find an rv280 card. On that card, the screen is garbled after resume, but
the ring test doesn't fail. It is using the same code paths as far as I can see.

So we can probably conclude:
1. The garbled screen and the ring setup failure are independent failures.
2. The ring setup failure is specific to your card or chipset.

Do you see differences in lspci -vv output before and after suspend? 

(In reply to comment #32)
> (Patch see https://bugzilla.kernel.org/show_bug.cgi?id=16140#c26)
> 
> With applying my patch from above, it's this section (Line #1028 and 
> following)
> from r100_cp_init() doing the problem:
> 
>  940 int r100_cp_init(struct radeon_device *rdev, unsigned ring_size)
> ...
> 1026         radeon_ring_start(rdev);
> 1027         r = radeon_ring_test(rdev);
> 1028         if (r) {
> 1029                 DRM_ERROR("radeon: cp isn't working (%d).\n", r);
> 1030                 return r;
> 1031         }
> 1032         rdev->cp.ready = true;
> 1033         return 0;
> 1034 }
> ...
> 
> Replacing "if (r) {" with "if (WARN_ON(r)) {" shows the above Call-trace.

Yes. The same failure already shows up as the "radeon: cp isn't working (-22)."
line in your dmesg. But of course the call stack is handy to verify we are
looking at the right code.
I didn't put a WARN there, because we already knew it failed.
I wondered whether some checks that print no error message were failing, and
put the WARNs there.
But in retrospect we would have seen that, because the above error message
would then not have been preceded by the ring-test error message.

> I looked into r600.c source-code and put "rdev->cp.ready = true;" before Line
> "r = radeon_ring_test(rdev);", not helping.

If you are interested how the driver works, have a look at
http://www.botchco.com/agd5f/?p=50

The "ring" is a buffer where the driver writes commands and the gpu reads those
commands and executes them. It's a ring buffer.
http://en.wikipedia.org/wiki/Circular_buffer
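
To make the idea concrete, here is a minimal sketch of such a ring in plain C.
It is not the radeon code: the names (struct ring, ring_write, ring_read) and
the size are made up for illustration, and the real ring lives in memory shared
with the gpu, with the read pointer advanced by the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

struct ring {
	uint32_t buf[RING_SIZE];
	uint32_t wptr; /* next slot the producer (the driver) writes */
	uint32_t rptr; /* next slot the consumer (the gpu) reads */
};

/* Producer side: queue one command word. Returns false if the ring is full. */
static bool ring_write(struct ring *r, uint32_t cmd)
{
	uint32_t next;

	assert(r != NULL);
	next = (r->wptr + 1) & (RING_SIZE - 1); /* wrap around at the end */
	if (next == r->rptr)
		return false; /* full: one slot stays unused to tell full from empty */
	r->buf[r->wptr] = cmd;
	r->wptr = next;
	return true;
}

/* Consumer side: fetch the oldest command. Returns false if the ring is empty. */
static bool ring_read(struct ring *r, uint32_t *cmd)
{
	assert(r != NULL && cmd != NULL);
	if (r->rptr == r->wptr)
		return false; /* empty: nothing queued */
	*cmd = r->buf[r->rptr];
	r->rptr = (r->rptr + 1) & (RING_SIZE - 1);
	return true;
}
```

Commands come out in the order they went in, and once the write pointer
reaches the end of the buffer it simply continues at slot 0.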

If you set cp.ready and the hardware isn't really ready, that won't help. 

The ring test works like so: The driver writes a value (0xCAFEDEAD) into the
scratch register and instructs the gpu via the ringbuffer to overwrite it with
0xDEADBEEF. Then the driver checks whether the gpu did it. If after N
udelay(1) calls the gpu has not written the expected value into that register,
the test fails.
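
As a rough illustration of that logic, here is a self-contained simulation in
C. It does not touch hardware and is not the driver's code: struct fake_gpu,
fake_gpu_tick() and ring_test() are invented names, one loop iteration stands
in for one udelay(1), and the return value -22 is only chosen to match the
number in the dmesg line (22 is EINVAL on Linux).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_INIT     0xCAFEDEADu
#define SCRATCH_EXPECT   0xDEADBEEFu
#define RING_TEST_FAILED (-22) /* assumption: mirrors "(-22)" in the dmesg line */

/* Hypothetical stand-in for the hardware. */
struct fake_gpu {
	uint32_t scratch;  /* plays the role of the scratch register */
	bool cmd_pending;  /* a "write scratch" command sits in the ring */
	uint32_t cmd_value;
	unsigned latency;  /* polling intervals before the command executes */
	bool cp_alive;     /* false models a cp that never fetches commands */
};

/* Advance the simulated gpu by one polling interval. */
static void fake_gpu_tick(struct fake_gpu *gpu)
{
	if (!gpu->cp_alive || !gpu->cmd_pending)
		return;
	if (gpu->latency > 0) {
		gpu->latency--;
		return;
	}
	gpu->scratch = gpu->cmd_value; /* the gpu executes the queued write */
	gpu->cmd_pending = false;
}

/* Returns 0 if the gpu overwrote the scratch value within `timeout` polls. */
static int ring_test(struct fake_gpu *gpu, unsigned timeout)
{
	unsigned i;

	assert(gpu != NULL);
	gpu->scratch = SCRATCH_INIT;     /* driver writes the marker directly */
	gpu->cmd_value = SCRATCH_EXPECT; /* and queues the overwrite in the ring */
	gpu->cmd_pending = true;

	for (i = 0; i < timeout; i++) {
		fake_gpu_tick(gpu);          /* stands in for one udelay(1) */
		if (gpu->scratch == SCRATCH_EXPECT)
			return 0;
	}
	return RING_TEST_FAILED;
}
```

With cp_alive set to false the scratch value stays at 0xCAFEDEAD and the test
times out, no matter what the driver believes about readiness. That is the
situation where forcing cp.ready to true cannot help.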

But of course, we are left to wonder as to why.

> Again inspired from r600.c, I put Line #966 "r100_cp_load_microcode(rdev);"
> after "r = radeon_ring_init(rdev, ring_size);", this resulted in a
> not-so-garbled screen, after hanging:
> pm-resume in X -> switching to vt-1 -> killing X -> restarting startx

That's interesting. Can you elaborate on the hanging?

> 
> This is doing no harm, see my logs.
> -        DRM_ERROR("radeon: ring test failed (sracth(0x%04X)=0x%08X)\n",
> +        DRM_ERROR("radeon: ring test failed (scratch(0x%04X)=0x%08X)\n",

True, but it is inconvenient. If you 'grep -r' on that error message you only
get the r100 one. With the typo corrected, you get both, the r100 and the r600
one. I agree, not a big deal, but...

> I am not sure what you mean with "radeon driver": the one in the kernel or the
> DDX (xf86-video-ati).

Always the kernel one, at the moment.

> 
> One NOTE:
> In Line #3728 there is a commented "r100_gpu_init(rdev);", it is nowhere
> "defined". I see in r600.c a *_gpu_init() and a *_cp_start() in case of
> resuming. Just a hint, if you wanna compare or dig into it.
> 

Yes. I wondered about that too. 'git-blame' shows it is a leftover from:

commit 90aca4d2740255bd130ea71a91530b9920c70abe
Author: Jerome Glisse <jgli...@redhat.com>
Date:   Tue Mar 9 14:45:12 2010 +0000

    drm/radeon/kms: simplify & improve GPU reset V2

...

> IIRC it would make sense to interprete correctly the Call-trace, I am not that
> familiar with "the internals".

The call trace is not complicated. The topmost function is the function that is
currently executing. The second entry is the function it will return to. The
third entry is the function the second one will return to, and so on.

see: http://en.wikipedia.org/wiki/Call_stack

I don't know about items 1 to 3 in that trace, but I guess they are just
artifacts of the WARN_ON macro.

If you look into the code, you see that the call trace is to be expected.
What is bad is that the ring test fails because the gpu
doesn't process the ringbuffer in time.


In comment #12 you said that turning off agp would fix the suspend issue.
Which issue was that? The ring-test error message, the garbled screen, or both?

In my setup (rv280) it only worked once out of ten times. The first time, it
came back without a garbled screen, but all subsequent suspend/resume cycles
garbled the screen.


I have a few thoughts on that screen garble. It is somewhat periodic and always
follows a pattern for me. I can clear the corruption by changing consoles, for
example. Then it always scribbles in a predetermined pattern on the framebuffer,
where it stays (overwriting itself at a high frequency) until I change
consoles. Same for you?

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel