offtopic: r200 lockups and context switches
Hello All,

Sorry for the possibly off-topic post, but I have a question about Radeon 9250 card lockups.

I am doing an experimental research project on graphics engine resource management based on the r200 driver. I have modified the DRM implementation so that all commands sent to the ring on behalf of a process are queued in the kernel (by redefining the xxx_RING macros). When a user-level process emits state, it marks it in the command buffer so that the DRM side can distinguish it from other commands. A kernel thread then takes commands from the client queues and dispatches them to the GPU. When it detects a radeon_cp_cmdbuf coming from a different client than before, it emits the state that client relies upon (just as r200_dri.so does, but in the kernel).

I tried to optimize context switches by remembering, in the scheduler thread, what was last sent to the GPU for every hardware state 'atom', and sending only the atoms that actually differ on a context switch. However, this resulted in quite frequent lockups (4 windows of an app drawing 1 spinning triangle lock it up after ten seconds), whereas the version that emits the full state every time is much more robust. Which 'atoms' must be emitted every time, even if exactly the same command sequence was emitted for that atom by the previous client?

The lockups I am experiencing are real hardware lockups: I checked the ring head and tail positions while debugging, and they do not change. By the way, is it possible to detect a hardware lockup and reset the hardware automatically? I've read that Longhorn display drivers for existing hardware are capable of something like that.

Once again, sorry for the off-topic post.

Thank you.
Mikhail Bautin
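For readers following the thread, here is a minimal sketch of the diff-based atom emission described above. It is not the poster's code and not the r200 driver's: the atom count, the structure layouts, and the names `hw_shadow`, `switch_context` and `emit_words` are all assumptions made for illustration, and `emit_words` only counts words instead of writing to a real ring. The `force_all` flag models the more robust variant that re-emits the full state on every switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, not taken from the r200 driver. */
#define NUM_ATOMS      4
#define MAX_ATOM_WORDS 8

/* One hardware state atom: the command words that program it. */
struct state_atom {
    uint32_t cmd[MAX_ATOM_WORDS];
    size_t   len;                 /* number of valid words in cmd[] */
};

/* The full state one client relies upon. */
struct client_state {
    struct state_atom atom[NUM_ATOMS];
};

/* The scheduler's record of what was last sent to the GPU. */
struct hw_shadow {
    struct state_atom atom[NUM_ATOMS];
    bool              valid[NUM_ATOMS];  /* false until emitted once */
};

/* Stand-in for writing words into the ring; it only counts them. */
static size_t words_emitted;

static void emit_words(const uint32_t *cmd, size_t len)
{
    (void)cmd;
    words_emitted += len;
}

static bool atom_equal(const struct state_atom *a, const struct state_atom *b)
{
    return a->len == b->len &&
           memcmp(a->cmd, b->cmd, a->len * sizeof(uint32_t)) == 0;
}

/*
 * Emit the incoming client's state in atom index order.
 * With force_all set, every atom is sent (the full-state variant);
 * otherwise only atoms that differ from the shadow copy are sent.
 * Returns the number of atoms emitted.
 */
static int switch_context(struct hw_shadow *hw,
                          const struct client_state *next,
                          bool force_all)
{
    int emitted = 0;

    for (int i = 0; i < NUM_ATOMS; i++) {
        const struct state_atom *a = &next->atom[i];

        if (!force_all && hw->valid[i] && atom_equal(&hw->atom[i], a))
            continue;             /* GPU already holds this atom */

        emit_words(a->cmd, a->len);
        hw->atom[i]  = *a;        /* remember what the GPU now holds */
        hw->valid[i] = true;
        emitted++;
    }
    return emitted;
}
```

Skipping an atom changes which atoms end up adjacent in the ring, so a scheme like this cannot by itself guarantee any particular emission sequence; the `force_all` path is the only one here that always reproduces the same order.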
Re: offtopic: r200 lockups and context switches
On 1/26/06, Michael Bautin [EMAIL PROTECTED] wrote:
> Hello All,
>
> Sorry for the possibly off-topic post, but I have a question about Radeon 9250 card lockups.
>
> I am doing an experimental research project on graphics engine resource management based on the r200 driver. I have modified the DRM implementation so that all commands sent to the ring on behalf of a process are queued in the kernel (by redefining the xxx_RING macros). When a user-level process emits state, it marks it in the command buffer so that the DRM side can distinguish it from other commands. A kernel thread then takes commands from the client queues and dispatches them to the GPU. When it detects a radeon_cp_cmdbuf coming from a different client than before, it emits the state that client relies upon (just as r200_dri.so does, but in the kernel).
>
> I tried to optimize context switches by remembering, in the scheduler thread, what was last sent to the GPU for every hardware state 'atom', and sending only the atoms that actually differ on a context switch. However, this resulted in quite frequent lockups (4 windows of an app drawing 1 spinning triangle lock it up after ten seconds), whereas the version that emits the full state every time is much more robust. Which 'atoms' must be emitted every time, even if exactly the same command sequence was emitted for that atom by the previous client?

This sounds like the state atom ordering bug. Apparently Radeon hardware is particular about the ordering of atoms in some cases. Unfortunately, there doesn't seem to be a general rule for what that ordering is.

Alex

> The lockups I am experiencing are real hardware lockups: I checked the ring head and tail positions while debugging, and they do not change. By the way, is it possible to detect a hardware lockup and reset the hardware automatically? I've read that Longhorn display drivers for existing hardware are capable of something like that.
>
> Once again, sorry for the off-topic post.
>
> Thank you.
> Mikhail Bautin

--
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: offtopic: r200 lockups and context switches
Michael Bautin wrote:
> The lockups I am experiencing are real hardware lockups, because I debugged ring head tail position and it does not change. Is it possible to detect hardware lockup and reset hardware automatically, by the way? I've read that Longhorn display drivers for existing hardware are capable of something like that.

Yes, this is possible. The Catalyst driver has done that for some time. As you've noted, if the ring doesn't advance, the chip has locked up. You can then reset it, though you will lose pretty much all state (?). This would actually be a nice thing to have. There was once a similar patch floating around, though it wasn't actually meant for this problem; it was intended to kill apps that never release the lock (and it just so happens that the lock is often held when the chip has locked up and the app is waiting for buffers).

Roland
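The detection rule mentioned in this reply (work is pending but the ring does not advance) could be sketched roughly as below. This is an illustration only, not code from the radeon DRM or from Catalyst: `ring_watchdog`, `watchdog_sample` and the `STALL_LIMIT` threshold are hypothetical names and values, the caller is assumed to read the head and tail positions itself at a fixed interval, and the reset step is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: consecutive stalled samples before
 * the ring is declared locked up. */
#define STALL_LIMIT 3

struct ring_watchdog {
    uint32_t last_head;  /* read position seen at the previous sample */
    int      stalled;    /* consecutive samples without progress */
};

/*
 * Feed one periodic sample of the ring positions.
 * head: where the GPU is reading; tail: where the driver last wrote.
 * Returns true once the ring has had pending work but made no
 * progress for STALL_LIMIT consecutive samples.
 */
static bool watchdog_sample(struct ring_watchdog *wd,
                            uint32_t head, uint32_t tail)
{
    if (head == tail || head != wd->last_head) {
        /* Ring is empty, or the GPU consumed something: progress. */
        wd->last_head = head;
        wd->stalled = 0;
        return false;
    }

    /* Work is pending and the read position has not moved. */
    wd->stalled++;
    return wd->stalled >= STALL_LIMIT;
}
```

A single long-running command would also look like a stall to this check, so the sampling interval and threshold would need to be chosen generously. Since a reset loses pretty much all state, as noted above, whatever handles a `true` result would presumably have to re-emit the full client state afterwards.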