offtopic: r200 lockups and context switches

2006-01-26 Thread Michael Bautin
Hello All,

Sorry for the possibly off-topic post, but I have a question about Radeon 9250 card lockups. I am doing an experimental research project on graphics engine resource management based on the r200 driver. I have modified the drm implementation so that all commands sent to the ring on behalf of a process are queued in the kernel (by redefining the xxx_RING macros), and when a user-level process emits state, it marks it in the command buffer so that the DRM side can distinguish it from other commands. I then have a kernel thread that takes commands from the client queues and dispatches them to the GPU. When it detects a radeon_cp_cmdbuf coming from a different client than before, it emits the state that client relies upon (just as r200_dri.so does, but in the kernel).

I tried to optimize context switches by 'remembering', in the scheduler thread, what was last sent to the GPU for every hardware state 'atom', and sending only the atoms that actually differ on a context switch. However, this resulted in quite frequent lockups (4 windows with an app drawing 1 spinning triangles lock it up after ten seconds), whereas the version that emits full state every time is considerably more robust. Which 'atoms' must be emitted every time, even if exactly the same command sequence was emitted for that atom by the previous client?
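The atom-diffing scheme described above can be sketched roughly as follows. This is a minimal user-space illustration, not the actual DRM code: `struct atom`, `struct hw_shadow`, `emit_atom`, `switch_context`, `shadow_invalidate` and the two size constants are all hypothetical names, and the ring write is replaced by a counter. Atoms are always walked in one fixed index order, and a `force_all` flag gives the robust "emit full state" behaviour for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ATOMS      8   /* hypothetical: number of state atoms tracked */
#define MAX_ATOM_WORDS 16  /* hypothetical: max command dwords per atom */

/* One state atom: the command dwords a client wants in effect. */
struct atom {
    uint32_t cmd[MAX_ATOM_WORDS];
    size_t   len;                    /* dwords used in cmd[] */
};

/* Scheduler-side shadow of what was last sent to the GPU. */
struct hw_shadow {
    struct atom last[MAX_ATOMS];
    int         valid[MAX_ATOMS];    /* 0 until the atom has been emitted once */
};

/* Stand-in for writing to the ring: only counts emitted dwords. */
static size_t emitted_words;

static void emit_atom(const struct atom *a)
{
    emitted_words += a->len;
}

/* Forget everything, e.g. after a reset that loses hardware state. */
static void shadow_invalidate(struct hw_shadow *hw)
{
    memset(hw->valid, 0, sizeof hw->valid);
}

/*
 * Bring the hardware to the state in want[] (MAX_ATOMS entries).
 * Atoms are visited in fixed index order; an atom is skipped only if
 * the shadow holds a byte-identical copy and force_all is zero.
 * Returns the number of atoms emitted.
 */
static int switch_context(struct hw_shadow *hw, const struct atom *want,
                          int force_all)
{
    int emitted = 0;

    for (int i = 0; i < MAX_ATOMS; i++) {
        const struct atom *w = &want[i];

        assert(w->len <= MAX_ATOM_WORDS);

        int same = hw->valid[i] &&
                   hw->last[i].len == w->len &&
                   memcmp(hw->last[i].cmd, w->cmd,
                          w->len * sizeof(uint32_t)) == 0;
        if (same && !force_all)
            continue;

        emit_atom(w);
        hw->last[i] = *w;
        hw->valid[i] = 1;
        emitted++;
    }
    return emitted;
}
```

With a shadow like this, a per-atom "always emit" mask would be one way to bisect which atoms cannot safely be skipped, by starting from `force_all` and exempting atoms one at a time.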

The lockups I am experiencing are real hardware lockups: I checked the ring head and tail positions, and they do not change. By the way, is it possible to detect a hardware lockup and reset the hardware automatically? I've read that Longhorn display drivers for existing hardware are capable of something like that.

Once again, sorry for the off-topic post.

Thank you.
Mikhail Bautin


Re: offtopic: r200 lockups and context switches

2006-01-26 Thread Alex Deucher
On 1/26/06, Michael Bautin [EMAIL PROTECTED] wrote:
> Hello All,
>
> Sorry for possible offtopic, but I have a question related to Radeon 9250
> card lockups. I am doing an experimental research project on graphics engine
> resource management based on the r200 driver. I have modified the drm
> implementation so that all the commands sent to the ring on behalf of a
> process are being queued in the kernel (xxx_RING macros redefinition), and
> when a user-level process emits state it marks it in the command buffer so
> that DRM side can distinguish it from other commands. Then, I have a kernel
> thread which takes commands from client queues and dispatches them to the
> GPU. When it detects a radeon_cp_cmdbuf coming from a different client than
> before, it emits the necessary state the corresponding client relies upon
> (just like in r200_dri.so does, but in kernel). I tried to optimize context
> switches by 'remembering' what has been sent last to the GPU for every
> hardware state 'atom' in the scheduler thread, and when it comes to context
> switch, sending only the atoms that actually differ. However, it resulted in
> quite frequent lockups (4 windows with an app drawing 1 spinning
> triangles lock it up after ten seconds) compared to the version which emits
> full state every time which is quite more robust. Which 'atoms' are
> necessary to emit every time even if exactly the same command sequence was
> emitted for this atom by the previous client?


This sounds like the state atom ordering bug. Apparently Radeon
hardware is particular about the order in which atoms are emitted in
some cases. Unfortunately, there doesn't seem to be a general rule for
what that order is.

Alex

> The lockups I am experiencing are real hardware lockups, because I debugged
> ring head tail position and it does not change. Is it possible to detect
> hardware lockup and reset hardware automatically, by the way? I've read that
> Longhorn display drivers for existing hardware are capable of something like
> that.
>
> Once again, sorry for offtopic.
>
> Thank you.
> Mikhail Bautin




___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: offtopic: r200 lockups and context switches

2006-01-26 Thread Roland Scheidegger

Michael Bautin wrote:
> The lockups I am experiencing are real hardware lockups, because I
> debugged ring head tail position and it does not change. Is it possible
> to detect hardware lockup and reset hardware automatically, by the way?
> I've read that Longhorn display drivers for existing hardware are
> capable of something like that.

Yes, this is possible. The Catalyst driver has done that for some time.
As you've noted, if the ring doesn't advance, the chip has locked up.
You can then reset it, though you will lose pretty much all state (?).
This would actually be a nice thing to have. There was once a similar
patch floating around, though it wasn't actually aimed at that problem;
it was intended to kill apps that never release the lock (and it just
happens that the lock is often held when the chip has locked up and the
app is waiting for buffers).
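
The "ring doesn't advance" test could look something like the sketch below. It is only an illustration of the detection logic under stated assumptions: `struct ring_watchdog`, `watchdog_init`, `watchdog_sample` and the `threshold` parameter are hypothetical names, the head and tail values are passed in rather than read from hardware registers, and the reset itself (with the state loss mentioned above) is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one watchdog sample. */
enum ring_status {
    RING_IDLE,       /* head == tail: nothing pending */
    RING_PROGRESS,   /* head moved since the last sample */
    RING_STALLED,    /* work pending, head unchanged, below threshold */
    RING_LOCKED_UP   /* head unchanged for `threshold` samples in a row */
};

struct ring_watchdog {
    uint32_t last_head;        /* head position seen at the last sample */
    unsigned stalled_samples;  /* consecutive samples without progress */
    unsigned threshold;        /* hypothetical tunable: samples before lockup */
};

static void watchdog_init(struct ring_watchdog *wd, uint32_t head,
                          unsigned threshold)
{
    wd->last_head = head;
    wd->stalled_samples = 0;
    wd->threshold = threshold;
}

/*
 * Call periodically (e.g. from a timer) with the current ring head
 * and tail. A lockup is reported only when commands are pending and
 * the head has not moved for `threshold` consecutive samples.
 */
static enum ring_status watchdog_sample(struct ring_watchdog *wd,
                                        uint32_t head, uint32_t tail)
{
    if (head == tail) {
        wd->last_head = head;
        wd->stalled_samples = 0;
        return RING_IDLE;
    }
    if (head != wd->last_head) {
        wd->last_head = head;
        wd->stalled_samples = 0;
        return RING_PROGRESS;
    }
    wd->stalled_samples++;
    return wd->stalled_samples >= wd->threshold ? RING_LOCKED_UP
                                                : RING_STALLED;
}
```

On `RING_LOCKED_UP` the caller would reset the engine and then treat all cached hardware state as unknown, so every client's full state has to be re-emitted afterwards.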


Roland

