Re: R300: Recovering from lockups

2004-06-04 Thread Roland Scheidegger
Roland Scheidegger wrote:
>>> There's historically been code which tried to do that, but it
>>> just never ever worked...
>>
>> I always thought that the code did work; however, the reset usually
>> happened in the DRM driver, which did not know how to set the mode.
>> There were quite a few occasions where I was able to kill X and
>> then restart it (though on other occasions killing X resulted in a
>> hard lockup).
>>
>> What's different in Nicolai's patch is that the reset is initiated
>> by X itself.
>>
>> It might be that the lockup was due to clocks being switched off,
>> but the DAC left active - or something similar.
>
> I can confirm this code works for r200 (I'm still trying to figure
> out how to fix the lockups for r200 with multiple GL apps, apparently
> without much success...). However, sometimes it seems to fail. I've
> seen three different outcomes when the watchdog is triggered:
> 1) One of the two running GL apps is killed; the other continues to run.
> 2) Both running GL apps are killed; if that happens, no further GL
> apps can be started (they always hang and get killed).
> 3) The GPU lockup still happens (that happened just once, with more
> than two apps; I was not able to reproduce it, and forgot to look in
> the log file to see whether the watchdog was triggered or not).
> Cases 1) and 2) are both seen frequently.
> But it's nice to see "VPURecover for Linux" ;-).
>
> Roland

I've played around with this a bit (unwillingly ;-) on my r200, and
there seem to be some problems.

First, the watchdog will not always detect that a client has caused a GPU
hang, since the lock is not necessarily held at that time. I guess this
isn't really the fault of the watchdog, as it's not intended for that.
But for instance, when running glxgears and quake3 at the same time, both
processes get stuck in r200WaitForFrameCompletion (as far as I can
tell) sooner rather than later.

When using irqs (fthrottle_mode=2), one client will hold the lock and it
will get killed (though in practice both always get killed, for some
reason). However, when using busy waits (fthrottle_mode=0), the watchdog
never triggers, since the lock is always released and regrabbed while
waiting. In this case, though, you can switch away from X to a VT (you
can't do that when using irqs); X will moan about "Idle timed out,
resetting engine", and after switching back to X both apps continue to
run (and lock up again shortly afterwards, but that's another story).

Maybe the DRI driver itself should have some sort of watchdog too, and
call that same reset code when it has to wait too long? I'm not sure
if/how that would work when using irqs. That would be nicer than just
killing the app. AFAIK the ATI VPURecover feature in the Windows driver
also works like that: if it detects a GPU lockup, it resets the chip and
tries to continue, and only if it locks up again very soon is the
application killed.

Of course it would be better just to fix the lockups ;-), but no
progress so far :-(.
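
The "reset first, kill only on a quick relapse" policy described above
could be sketched roughly as follows. This is only an illustration: every
name here (recovery_state, recovery_decide, the two timeout constants) is
hypothetical and does not come from the r200 DRI driver, the DRM, or the
Windows driver, and the timeout values are placeholders. The actual engine
reset is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values; neither comes from the driver or the thread. */
#define WAIT_TIMEOUT_MS   2000u  /* how long a wait may stall before we act */
#define RELAPSE_WINDOW_MS 5000u  /* a hang this soon after a reset is a relapse */

typedef enum {
    RECOVERY_KEEP_WAITING,  /* not stalled long enough yet */
    RECOVERY_RESET_ENGINE,  /* reset the chip and let the client continue */
    RECOVERY_KILL_CLIENT    /* hung again right after a reset: give up */
} recovery_action;

typedef struct {
    bool     have_reset;     /* has a reset been triggered for this client? */
    uint32_t last_reset_ms;  /* timestamp of the most recent reset */
} recovery_state;

/* Decide what to do for a client that has been waiting since wait_start_ms.
 * now_ms is a monotonic millisecond clock supplied by the caller. */
recovery_action recovery_decide(recovery_state *st,
                                uint32_t wait_start_ms, uint32_t now_ms)
{
    assert(st != NULL);

    if (now_ms - wait_start_ms < WAIT_TIMEOUT_MS)
        return RECOVERY_KEEP_WAITING;

    /* Stalled again shortly after the previous reset: stop retrying. */
    if (st->have_reset && now_ms - st->last_reset_ms < RELAPSE_WINDOW_MS)
        return RECOVERY_KILL_CLIENT;

    /* First stall, or the last reset was long ago: reset and carry on. */
    st->have_reset = true;
    st->last_reset_ms = now_ms;
    return RECOVERY_RESET_ENGINE;
}
```

A caller would poll recovery_decide() from its wait loop, restart its wait
timestamp after a reset, and act on the returned value. Keeping the
decision separate from the reset itself means the same policy could sit
behind either the busy-wait or the irq path, if a timeout can be obtained
there.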

The other problem I've seen is that sometimes no more 3D apps can be
started. This seems to depend on where the applications are stuck when
they are killed. The quake3 & glxgears example above always has
that outcome, and I also see this when they are killed:
r200WaitIrq: drmRadeonIrqWait: -16
r200WaitIrq: drmRadeonIrqWait: -16
Interestingly, this can be fixed by switching away from X and back; I
think the VTEnter/Resume code is responsible for that.
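
The -16 in those log lines is presumably a negated errno handed back from
the irq wait call; on Linux, errno 16 is EBUSY. That reading is an
assumption, since the thread does not spell it out. A small helper along
these lines (hypothetical, not part of libdrm or the r200 driver) makes
such values easier to read in a log:

```c
#include <assert.h>
#include <errno.h>

/* Map 0 or a negated errno to a symbolic name. Only a few values are
 * covered; extend the switch as needed. */
const char *neg_errno_name(int ret)
{
    assert(ret <= 0);   /* expects 0 or a negated errno */

    switch (-ret) {
    case 0:      return "success";
    case EBUSY:  return "EBUSY";
    case EINTR:  return "EINTR";
    case EINVAL: return "EINVAL";
    default:     return "unknown";
    }
}
```

Using the errno.h constants rather than hard-coded numbers keeps the
mapping correct on platforms where the numeric values differ.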

Roland

---
This SF.Net email is sponsored by the new InstallShield X.
From Windows to Linux, servers to mobile, InstallShield X is the one
installation-authoring solution that does it all. Learn more and
evaluate today! http://www.installshield.com/Dev2Dev/0504
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: R300: Recovering from lockups

2004-06-03 Thread Roland Scheidegger
Vladimir Dergachev wrote:
> On Wed, 26 May 2004, Keith Whitwell wrote:
>> Vladimir Dergachev wrote:
>>> Hi Nicolai,
>>>   I merged your patches - thank you very much!
>>
>> I wonder if a similar approach could allow us to reset the radeon/r200
>> after lockups?
>
> Well, Nicolai's patch is not specific to R300 - it uses plain Radeon
> registers.
>
>> There's historically been code which tried to do that, but it just
>> never ever worked...
>
> I always thought that the code did work; however, the reset usually
> happened in the DRM driver, which did not know how to set the mode.
> There were quite a few occasions where I was able to kill X and then
> restart it (though on other occasions killing X resulted in a hard
> lockup).
>
> What's different in Nicolai's patch is that the reset is initiated by X
> itself.
>
> It might be that the lockup was due to clocks being switched off, but
> the DAC left active - or something similar.

I can confirm this code works for r200 (I'm still trying to figure out
how to fix the lockups for r200 with multiple GL apps, apparently
without much success...). However, sometimes it seems to fail. I've seen
three different outcomes when the watchdog is triggered:
1) One of the two running GL apps is killed; the other continues to run.
2) Both running GL apps are killed; if that happens, no further GL apps
can be started (they always hang and get killed).
3) The GPU lockup still happens (that happened just once, with more than
two apps; I was not able to reproduce it, and forgot to look in the log
file to see whether the watchdog was triggered or not).
Cases 1) and 2) are both seen frequently.
But it's nice to see "VPURecover for Linux" ;-).

Roland



Re: R300: Recovering from lockups

2004-05-27 Thread Michel Dänzer
On Tue, 2004-05-25 at 21:55, Nicolai Haehnle wrote: 
> 
> As you may be aware, I was trying to get R300 support into a state where it 
> is possible to start OpenGL applications, let them hang the CP and *not* 
> bring  down the entire machine.
> 
> Looks like I was successful :)

Nice!

> The attached patch ati.unlock.1.patch against the DDX makes sure the RBBM 
> (whatever that means; I'm guessing Ring Buffer something or other) 

Register BackBone Manager.

> is reset in RADEONEngineReset(), before any other register is accessed that 
> could potentially cause a final crash (DSTCACHE_* is the major offender in this 
> category).
> 
> Now since I don't have any Radeon-related documentation at all, I have no 
> idea whether this patch will work on any other chip. For all that I know, 
> it might totally break the driver on R100/R200. 

Indeed, anyone who can reproduce a lockup with those chips should give
your patches a spin and let us know what effect they have.

> I'm especially confused by the fact that the bottom half of EngineReset() 
> treats RBBM_SOFT_RESET differently for the R300. Can anybody explain why?
> Maybe it would even be safest/cleanest to move the entire RBBM_SOFT_RESET 
> block to the top of the function?

I guess this code originates from Hui Yu and/or Kevin E. Martin, CC'ing
them. For their reference, I also paste the hunk in question:

--- ati-vladimir/radeon_accel.c 2004-05-20 16:02:24.0 +0200
+++ ati/radeon_accel.c  2004-05-25 21:14:24.0 +0200
@@ -170,6 +170,31 @@
 CARD32 rbbm_soft_reset;
 CARD32 host_path_cntl;
 
+/* The following RBBM_SOFT_RESET sequence can help un-wedge
+ * an R300 after the command processor got stuck.
+ */
+rbbm_soft_reset = INREG(RADEON_RBBM_SOFT_RESET);
+OUTREG(RADEON_RBBM_SOFT_RESET, (rbbm_soft_reset |
+   RADEON_SOFT_RESET_CP |
+   RADEON_SOFT_RESET_HI |
+   RADEON_SOFT_RESET_SE |
+   RADEON_SOFT_RESET_RE |
+   RADEON_SOFT_RESET_PP |
+   RADEON_SOFT_RESET_E2 |
+   RADEON_SOFT_RESET_RB));
+INREG(RADEON_RBBM_SOFT_RESET);
+OUTREG(RADEON_RBBM_SOFT_RESET, (rbbm_soft_reset & (CARD32)
+   ~(RADEON_SOFT_RESET_CP |
+ RADEON_SOFT_RESET_HI |
+   


-- 
Earthling Michel Dänzer      |     Debian (powerpc), X and DRI developer
Libre software enthusiast    |   http://svcs.affero.net/rm.php?r=daenzer





Re: R300: Recovering from lockups

2004-05-26 Thread Vladimir Dergachev

On Wed, 26 May 2004, Keith Whitwell wrote:
> Vladimir Dergachev wrote:
>> Hi Nicolai,
>>   I merged your patches - thank you very much!
>
> I wonder if a similar approach could allow us to reset the radeon/r200 after
> lockups?

Well, Nicolai's patch is not specific to R300 - it uses plain Radeon
registers.

> There's historically been code which tried to do that, but it just never ever
> worked...

I always thought that the code did work; however, the reset usually
happened in the DRM driver, which did not know how to set the mode. There
were quite a few occasions where I was able to kill X and then restart it
(though on other occasions killing X resulted in a hard lockup).

What's different in Nicolai's patch is that the reset is initiated by X
itself.

It might be that the lockup was due to clocks being switched off, but the
DAC left active - or something similar.

 best
Vladimir Dergachev

> Keith




Re: R300: Recovering from lockups

2004-05-26 Thread Keith Whitwell
Vladimir Dergachev wrote:
> Hi Nicolai,
>   I merged your patches - thank you very much!

I wonder if a similar approach could allow us to reset the radeon/r200 after
lockups?

There's historically been code which tried to do that, but it just never ever 
worked...

Keith



Re: R300: Recovering from lockups

2004-05-25 Thread Vladimir Dergachev
Hi Nicolai,
  I merged your patches - thank you very much!

 Vladimir Dergachev

On Tue, 25 May 2004, Nicolai Haehnle wrote:
> As you may be aware, I was trying to get R300 support into a state where it
> is possible to start OpenGL applications, let them hang the CP and *not*
> bring down the entire machine.
>
> Looks like I was successful :)
>
> The attached patch ati.unlock.1.patch against the DDX makes sure the RBBM
> (whatever that means; I'm guessing Ring Buffer something or other) is reset
> in RADEONEngineReset(), before any other register is accessed that could
> potentially cause a final crash (DSTCACHE_* is the major offender in this
> category).
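
The shape of that reset, as far as it can be read from the hunk quoted in
Michel Dänzer's reply in this thread, is: save the register, set the
reset bits, read back, clear them again. Below is a minimal sketch of that
pulse against a fake register array, so it runs without hardware. The
offset and bit positions are placeholders for illustration (the real
definitions live in the driver's radeon_reg.h), inreg()/outreg() stand in
for the DDX's INREG()/OUTREG() macros, and since the quoted hunk is cut
off partway through the clearing step, clearing the same set of bits that
was set is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the MMIO aperture so the sketch runs without hardware. */
#define FAKE_MMIO_WORDS 0x100u
static uint32_t fake_mmio[FAKE_MMIO_WORDS];

static uint32_t inreg(uint32_t offset)
{
    assert(offset % 4 == 0 && offset / 4 < FAKE_MMIO_WORDS);
    return fake_mmio[offset / 4];
}

static void outreg(uint32_t offset, uint32_t value)
{
    assert(offset % 4 == 0 && offset / 4 < FAKE_MMIO_WORDS);
    fake_mmio[offset / 4] = value;
}

/* Placeholder offset and bit positions, for illustration only. */
#define RBBM_SOFT_RESET 0x00f0u
#define SOFT_RESET_CP   (1u << 0)
#define SOFT_RESET_HI   (1u << 1)
#define SOFT_RESET_SE   (1u << 2)
#define SOFT_RESET_RE   (1u << 3)
#define SOFT_RESET_PP   (1u << 4)
#define SOFT_RESET_E2   (1u << 5)
#define SOFT_RESET_RB   (1u << 6)
#define SOFT_RESET_ALL  (SOFT_RESET_CP | SOFT_RESET_HI | SOFT_RESET_SE | \
                         SOFT_RESET_RE | SOFT_RESET_PP | SOFT_RESET_E2 | \
                         SOFT_RESET_RB)

/* Pulse the reset bits: set them, read back, clear them, read back.
 * Returns the final register value. */
uint32_t rbbm_soft_reset_pulse(void)
{
    uint32_t saved = inreg(RBBM_SOFT_RESET);

    outreg(RBBM_SOFT_RESET, saved | SOFT_RESET_ALL);
    (void)inreg(RBBM_SOFT_RESET);   /* read back, presumably to flush the write */
    outreg(RBBM_SOFT_RESET, saved & ~SOFT_RESET_ALL);
    return inreg(RBBM_SOFT_RESET);
}
```

The point of doing this first in the reset function is ordering: nothing
else is touched until the pulse has completed, so a register access that
would otherwise hang a wedged chip only happens after the reset.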
>
> Now since I don't have any Radeon-related documentation at all, I have no
> idea whether this patch will work on any other chip. For all I know,
> it might totally break the driver on R100/R200. I'm especially confused by
> the fact that the bottom half of EngineReset() treats RBBM_SOFT_RESET
> differently for the R300. Can anybody explain why?
> Maybe it would even be safest/cleanest to move the entire RBBM_SOFT_RESET
> block to the top of the function?
>
> I can now launch glxgears several times in a row. It will be killed a few
> seconds later (during this time the GUI will hang), and as far as I can
> tell, everything continues to work normally.
>
> Of course, for all I know the 3D part of the chip might still be wedged
> internally, which would make this patch (partially) useless for working on
> the driver. I guess I'll find out soon enough.
>
> Important: You'll need my watchdog patch for the DRM from that other thread.
> Otherwise, the reset code in the X server will never be called, and this
> patch will have no effect.
>
> I would also like to point out that the modified xf86 driver that was posted
> on this list (see http://volodya-project.sourceforge.net/R300.php) does
> not check the version of the DRM. I know, this is really a silly, minor
> point to make at this time, but I've attached a small patch to fix this
> anyway.
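
The attached version-check patch is not reproduced in this archive, so the
following is only a guess at the general shape of such a check, not its
contents. It uses a local struct in place of the version information
libdrm reports through drmGetVersion(), so that it compiles on its own.
The names and the comparison rule (major must match exactly, minor and
patchlevel must be at least the required values) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the version triple a DRM kernel module reports. */
typedef struct {
    int major;
    int minor;
    int patchlevel;
} drm_version_triple;

/* Accept the kernel module only if its major number matches exactly and
 * its minor/patchlevel are at least what the X driver requires. */
bool drm_version_ok(const drm_version_triple *have,
                    const drm_version_triple *need)
{
    assert(have != NULL && need != NULL);

    if (have->major != need->major)
        return false;                      /* incompatible interface */
    if (have->minor != need->minor)
        return have->minor > need->minor;  /* newer minor is acceptable */
    return have->patchlevel >= need->patchlevel;
}
```

A driver would run a check like this once at DRI initialisation and fall
back to running without DRI when it fails, rather than talking to a
kernel module whose interface it does not understand.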
>
> cu,
> Nicolai
