Re: DRI2 and lock-less operation

Keith Whitwell Mon, 26 Nov 2007 09:38:18 -0800

Kristian Høgsberg wrote:
> On Nov 22, 2007 4:23 AM, Keith Whitwell <[EMAIL PROTECTED]> wrote:
> ...
>>> My guess for one way is to store a buffer object with the current state
>>> emission in it, and submit it with the superioctl maybe, and if we have
>>> lost context emit it before the batchbuffer..
>> The way drivers actually work at the moment is to emit a full state as a
>> preamble to each batchbuffer.  Depending on the hardware, this can be
>> pretty low overhead, and it seems that the trend in hardware is to make
>> this operation cheaper and cheaper.  This works fine without the lock.
>>
>> There is another complementary trend to support, one way or another,
>> multiple hardware contexts (obviously nvidia have done this for years),
>> meaning that the hardware effectively does the context
>> switches.  This is probably how most cards will end up working in the
>> future, if not already.
>>
>> Neither of these needs a lock for detecting context switches.
> 
> Sure enough, but the problem is that without the lock userspace can't
> say "oops, I lost the context, let me prepend this state emission
> preamble to the batchbuffer." in a race-free way.  If we want
> conditional state emission, we need to make that decision in the
> kernel.


The cases I describe above don't try to do this, but if you really 
wanted to, the way to do it would be to have userspace always emit the 
preamble but pass two offsets to the kernel, one at the start of the 
preamble, the other after it.  Then the kernel can choose.
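
A minimal sketch of that two-offset idea, assuming the kernel tracks which
context last ran on the hardware.  Every name here (struct batch_submit,
pick_start_offset, the handle fields) is hypothetical and not taken from any
real DRM interface:

```c
#include <stdint.h>

/* Hypothetical submission record; not a real DRM structure. */
struct batch_submit {
    uint32_t ctx_handle;      /* context that built this batch */
    uint32_t preamble_offset; /* start of the full state preamble */
    uint32_t body_offset;     /* first command after the preamble */
};

/*
 * Kernel-side choice of where execution should begin.
 * last_ctx is the context whose state is assumed to be on the hardware.
 */
static uint32_t pick_start_offset(const struct batch_submit *s,
                                  uint32_t last_ctx)
{
    /* Same context ran last: state is still valid, skip the preamble. */
    if (s->ctx_handle == last_ctx)
        return s->body_offset;

    /* Another context ran in between: replay the full state first. */
    return s->preamble_offset;
}
```

Userspace always writes the preamble, so the only per-submission decision is
which of the two offsets the kernel starts from.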

I don't think there's a great deal to be gained from this optimization, 
though.


> 
> For example, the super ioctl could send the state emission code as a
> separate buffer and also include the expected context handle.  This
> lets the kernel compare the context handle supplied in the super ioctl
> with the most recently active context handle, and if they differ, the
> kernel queues the state emission buffer first and then the rendering
> buffer.  If the context handles match, the kernel just queues the
> rendering batch buffer.
> 
> However, this means that user space must prepare the state emission
> code for each submission, whether or not it will actually be used.
> I'm not sure if this is too much overhead or if it's negligible?
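
The separate-buffer variant quoted above could be sketched roughly as
follows.  This is an illustration only: struct submit_args, queue_submission
and the buffer handles are made-up names, and a real super ioctl would carry
far more than this:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical super ioctl arguments; not a real DRM structure. */
struct submit_args {
    uint32_t ctx_handle; /* context the caller expects to be current */
    uint32_t state_buf;  /* buffer holding the state emission code */
    uint32_t render_buf; /* buffer holding the rendering commands */
};

/*
 * Fill queue[] with buffer handles in execution order and return how many
 * were queued.  *last_ctx is the most recently active context handle and
 * is updated to the submitting context.
 */
static size_t queue_submission(const struct submit_args *a,
                               uint32_t *last_ctx, uint32_t queue[2])
{
    size_t n = 0;

    /* Handles differ: the context was lost, so emit state first. */
    if (a->ctx_handle != *last_ctx)
        queue[n++] = a->state_buf;

    /* The rendering batch is always queued. */
    queue[n++] = a->render_buf;

    *last_ctx = a->ctx_handle;
    return n;
}
```

Because the comparison happens inside the kernel at submission time,
userspace never has to guess whether it still owns the context.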

I think both preparing it on CPU and executing it on GPU are likely to 
be pretty negligible, but some experimentation on a system with just a 
single app running should show this quickly one way or another.

Keith

