Re: Swapbuffers [was: Re: DRI2 and lock-less operation]

Keith Whitwell Wed, 28 Nov 2007 04:30:59 -0800

Stephane Marchesin wrote:
> On 11/28/07, *Keith Whitwell* <[EMAIL PROTECTED] 
> <mailto:[EMAIL PROTECTED]>> wrote:
> 
> 
>     In my ideal world, the entity which knows and cares about cliprects
>     should be the one that does the swapbuffers, or at least is in control
>     of the process.  That entity is the X server.
> 
>     Instead of tying ourselves into knots trying to figure out how to get
>     some other entity a sufficiently up-to-date set of cliprects to make
>     this work (which is what was wrong with DRI 1.0), maybe we should try
>     and figure out how to get the X server to efficiently orchestrate
>     swapbuffers.
> 
>     In particular it seems like we have:
> 
>     1) The X server knows about cliprects.
>     2) The kernel knows about IRQ reception.
>     3) The kernel knows how to submit rendering commands to hardware.
>     4) Userspace is where we want to craft rendering commands.
> 
>     Given the above, what do we think about swapbuffers:
> 
>             a) Swapbuffers is a rendering command
>             b) which depends on cliprect information
>             c) that needs to be fired as soon as possible after an IRQ
>                receipt.
> 
>     So:
>             swapbuffers should be crafted from userspace (a, 4)
>             ... by the X server (b, 1)
>             ... and should be actually fired by the kernel (c, 2, 3)
> 
> 
> Well, on nvidia hw, you don't even need to fire from the kernel (thanks 
> to a special fifo command that waits for vsync).
> So I'd love it if going through the kernel for swapbuffers was 
> abstracted by the interface.


As I suggested elsewhere, I think that you're probably going to need 
this even on nvidia hardware.

>     I propose something like:
> 
>     0) 3D client submits rendering to the kernel and receives back a fence.
> 
>     1) 3D client wants to do swapbuffers.  It sends a message to the X
>     server asking it "please do me a swapbuffers after this fence has
>     completed".
> 
>     2) X server crafts (somehow) commands implementing swapbuffers for this
>     drawable under the current set of cliprects and passes it to the kernel
>     along with the fence.
> 
>     3) The kernel keeps that batchbuffer to the side until
>             a) the commands associated with the fence have been
>                submitted to hardware.
>             b) the next vblank IRQ arrives.
> 
>     when both of these are true, the kernel simply submits the prepared
>     swapbuffer commands through the lowest latency path to hardware.
> 
>     But what happens if the cliprects change?  The 100% perfect solution
>     looks like:
> 
>     The X server knows all about cliprect changes, and can use fences or
>     other mechanisms to keep track of which swapbuffers are outstanding.
>     At the time of a cliprect change, it must create new swapbuffer
>     commandsets for all pending swapbuffers and re-submit those to the
>     kernel.
> 
>     These new sets of commands must be tied to the progress of the X
>     server's own rendering command stream so that the kernel fires the
>     appropriate one to land the swapbuffers to the correct destination as
>     the X server's own rendering flies by.
> 
> 
> Yes that was the basis for my thinking as well. By inserting the 
> swapbuffers into the normal flow of X commands, we remove the need for 
> syncing with the X server at swapbuffer time.

The very simplest way would be just to have the X server queue it up 
like normal blits and not even involve the kernel.  I'm not proposing 
this.  I believe such an approach will fail for the sync-to-vblank case 
due to latency issues - even (I suspect) with hardware-wait-for-vblank.

Rather, I'm describing a mechanism that allows a pre-prepared swapbuffer 
command to be injected into the X command stream (one way or another) 
with the guarantee that it will encode the correct cliprects, but which 
will avoid stalling the command queue in the meantime.
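
To make the firing condition from step 3 of the quoted proposal concrete, here is a minimal Python sketch of a kernel-side queue that holds a prepared swap until (a) the fenced commands have been submitted and (b) a vblank arrives. It is illustrative only: `PendingSwap`, `KernelSwapQueue`, and the use of plain integer sequence numbers as fences are assumptions made for the example, not existing DRM interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class PendingSwap:
    drawable: int   # hypothetical drawable id
    fence: int      # sequence number the client's rendering must reach
    commands: list  # swapbuffer commands prepared by the X server


@dataclass
class KernelSwapQueue:
    submitted_seq: int = 0                       # highest fence submitted to hardware
    pending: list = field(default_factory=list)  # swaps waiting to fire
    fired: list = field(default_factory=list)    # swaps handed to hardware

    def queue_swap(self, swap):
        """X server hands over a prepared swap together with its fence."""
        self.pending.append(swap)

    def on_submit(self, seq):
        """Client rendering up to sequence number `seq` reached the hardware."""
        self.submitted_seq = max(self.submitted_seq, seq)

    def on_vblank(self):
        """Vblank IRQ: fire every swap whose fence has been submitted."""
        still_waiting = []
        for swap in self.pending:
            if swap.fence <= self.submitted_seq:
                self.fired.append(swap)
            else:
                still_waiting.append(swap)
        self.pending = still_waiting
```

Neither event alone fires the swap in this sketch: a vblank before the fence leaves it queued, and a fence completion alone only updates `submitted_seq`, so the command queue is never stalled waiting on either.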


>     In the simplest case, where the kernel puts commands onto the one true
>     ring as it receives them, the kernel can simply discard the old
>     swapbuffer command.  Indeed this is true also if the kernel has a
>     ring-per-context and uses one of those rings to serialize the X server
>     rendering and swapbuffers commands. 
> 
> 
> Come on, admit that's a hack to get 100'000 fps in glxgears :)

I'm not talking about discarding the whole swap operation, just the 
version of the swap command buffer that pertained to the old cliprects.
Every swap is still being performed.
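
The distinction can be sketched as follows, assuming the simple case where the kernel only ever needs the newest version of a pending swap. The `SwapQueue` class and its method names are hypothetical, and cliprects are modelled as plain tuples for illustration.

```python
class SwapQueue:
    """Pending swaps keyed by a swap id; only the command buffer is replaceable."""

    def __init__(self):
        self._pending = {}  # swap_id -> (cliprects, commands)

    def submit(self, swap_id, cliprects, commands):
        """X server queues a swap prepared for the current cliprects."""
        self._pending[swap_id] = (tuple(cliprects), list(commands))

    def update(self, swap_id, cliprects, commands):
        """Cliprects changed: replace the stale commands, keep the swap itself.

        Returns False if the swap has already fired and nothing was replaced.
        """
        if swap_id not in self._pending:
            return False
        self._pending[swap_id] = (tuple(cliprects), list(commands))
        return True

    def pending_count(self):
        return len(self._pending)

    def fire(self, swap_id):
        """Remove the swap and return the version that goes to hardware."""
        return self._pending.pop(swap_id)
```

An update never changes `pending_count()`, so the number of swaps performed is unaffected; only the stale command buffer is dropped.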

You do raise a good point though -- we currently throttle the 3d driver 
based on swapbuffer fences.  There would need to be some equivalent 
mechanism to achieve this.

> 
>     Note that condition 3a) above is always true in the current i915.o
>     one-true-ring/single-fifo approach to hardware serialization.
> 
> 
> Yes, I think those details of how to wait should be left 
> driver-dependent and abstracted in user space. So that we have the 
> choice of calling the kernel, doing it from user space only, relying on 
> a single fifo or whatever.
> 
> 
>     I think the above can work and seems more straight-forward than many of
>     the proposed alternatives.
> 
> 
> This is what I want to do too. Especially since in the nvidia case we 
> don't have the issue of routing vblank interrupts to user space for that.

Hmm -- I'm not suggesting routing interrupts to userspace??

> So, the only issue I'm worried about is the latency induced by this 
> approach. When the DRM does the swaps you can ensure it'll get executed 
> pretty fast. If X has been stuffing piles of commands into its command 
> buffer, it might not be so fast. What this means is that 3D might be 
> slowed down by 2D rendering (think especially of the case of EXA 
> fallbacks which will sync your fifo). In that case, ensuring a 
> no-fallback EXA would become relevant in achieving smooth 3D 
> performance.

One point of the proposal is that it potentially gives you a way to jump 
swapbuffers ahead of commands queued up by X.  In fact you get a 
continuum of possibilities, all behind the same interface:

Level 1:
        - X server just emits the blits to its normal command queue.
        - Probably works quite well for non-vblank cases.
        - Can also have the X server decide whether to page flip, in which
          case it can synchronously update all its 2d state, which is nice.

Level 2:
        - X server prepares a swapbuffer command buffer and hands it to the
          kernel.
        - Kernel appends it to X command queue after next IRQ.
        - On cliprect changes, X server sends updated swapbuffer to kernel,
          kernel discards old version.

Level 3:
        - As above, but:
        - On IRQ receipt, kernel fires pre-prepared swapbuffer commands to a
          low-latency path that preempts X server rendering.
        - On cliprect changes, X server sends updated swapbuffer to kernel.
          Kernel keeps old versions until it knows they are no longer needed.
          After swapbuffers completes, kernel discards all variants.
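
One way the Level 3 bookkeeping could look, as a rough sketch: each variant of a swap is tagged with the point in the X server's command stream from which its cliprects are valid, and at fire time the kernel picks the variant matching how far that stream has progressed. The `SwapVariants` class, the integer stream positions, and the choice of `bisect` for the lookup are all assumptions for illustration.

```python
import bisect


class SwapVariants:
    """All versions of one pending swap, ordered by X command stream position."""

    def __init__(self, commands):
        self._valid_from = [0]       # stream position from which a variant applies
        self._commands = [commands]  # variant prepared for those cliprects

    def add_variant(self, valid_from, commands):
        """Cliprects change at stream position `valid_from`; keep older variants."""
        if valid_from <= self._valid_from[-1]:
            raise ValueError("variants must be added in increasing stream order")
        self._valid_from.append(valid_from)
        self._commands.append(commands)

    def select(self, x_progress):
        """Pick the variant whose cliprects are current at `x_progress` (>= 0)."""
        index = bisect.bisect_right(self._valid_from, x_progress) - 1
        return self._commands[index]

    def complete(self):
        """Swap has been performed: every variant can be discarded."""
        self._valid_from.clear()
        self._commands.clear()
```

If the vblank arrives before the X server's rendering has reached the cliprect change, `select` returns the older variant, which is why it cannot simply be thrown away as in Level 2.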


This is all per-hardware.  If it is simple to create swapbuffers in 
kernel space (i.e. just blits), then the X server can just submit a 
codified version of the blit if that works out easier.

Level 3 is starting to look pretty complex, but it is at least an 
evolution of current thinking -- the kernel as dma scheduler.

It may be that by the time this is fully fleshed out, it will be just as 
complex as any other proposal -- it's heading that way above.

Ultimately I'm fairly agnostic about this - I see it as an operating 
system detail that I can isolate in a fairly small section of the code I 
care about.  However if I do stop and think about it, it seems like the 
direction above is the one I would pursue first.


> But at least it solves the issue of sluggish OpenGL window
> moves and resizes (/me looks at the nvidia binary driver behaviour).

I think there's still a problem with resizes in that we want to prevent 
the flickering that occurs when the X server notices a resize and does 
*something* to the window contents.  Playing with window gravity and 
other state doesn't seem to prevent this in the general case.

Keith


_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
