Clip Lists

2007-11-27 Thread Soeren Sandmann
"Stephane Marchesin" <[EMAIL PROTECTED]> writes:

> I fail to see how this works with a lockless design. How do you ensure the X
> server doesn't change cliprects between the time it has written those in the
> shared ring buffer and the time the DRI client picks them up and has the
> command fired and actually executed ? Do you lock out the server during that
> time ?

The scheme I have been advocating is this:

- A new extension is added to the X server, with a
  PixmapFromBufferObject request.

- Clients render into a private back buffer object, for which they
  used the new extension to generate a pixmap.

- When a client wishes to copy something to the frontbuffer (for
  whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
  plain old XCopyArea() with the generated pixmap. The X server is
  then responsible for any clipping necessary.

This scheme puts all clip list management in the X server. No
cliprects in shared memory or in the kernel would be required. And no
locking is required since the X server is already processing requests
in sequence.

To synchronize with vblank, a new SYNC counter is introduced that
records the number of vblanks since some time in the past. The clients
can then issue SyncAwait requests before any copy they want
synchronized with vblank. This allows the client to do useful
processing while it waits, which I don't believe is the case now.

As an additional benefit, the PixmapFromBufferObject request would
also be useful as the basis of a better shared-memory feature that
allows the memory to be used as target for accelerated rendering, and
to be DMA'ed from or mapped into GTT tables.


Soren

-
SF.Net email is sponsored by: The Future of Linux Business White Paper
from Novell.  From the desktop to the data center, Linux is going
mainstream.  Let it simplify your IT future.
http://altfarm.mediaplex.com/ad/ck/8857-50307-18918-4
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Clip Lists

2007-11-27 Thread Keith Packard

On Wed, 2007-11-28 at 06:19 +0100, Soeren Sandmann wrote:

> - When a client wishes to copy something to the frontbuffer (for
>   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
>   plain old XCopyArea() with the generated pixmap. The X server is
>   then responsible for any clipping necessary.

Using a plain old XCopyArea will make scheduling this in the kernel
quite a bit harder; if the kernel knows it's doing a swap buffer, then
it can interrupt ongoing rendering and do the copy at higher priority,
precisely when the vblank interrupt lands. Plus, you've just added the
latency of a pair of context switches to the frame update interval.

Also, placing any user-mode code in the middle of the interrupt->blt
logic will occasionally cause tearing on the screen; having the kernel
push the blt just-in-time means we'd have reliable swaps (well, down to
the context switch time in the graphics hardware).

I like the simplicity, and we'll certainly be wanting the
pixmap-from-object API for lots of other fun stuff, but buffer swaps may
still need more magic than we can manage at this point.

I also wonder what the effects of a compositing manager are in this
environment -- ideally, your 'buffer swap' would be just a renaming of
two buffers, and not involve any data copying at all. Keeping all of
that hidden behind an abstract API will let us move from copying the
data to swizzling pointers without breaking existing apps.

-- 
[EMAIL PROTECTED]




Re: Clip Lists

2007-11-28 Thread Stephane Marchesin
On 28 Nov 2007 06:19:39 +0100, Soeren Sandmann <[EMAIL PROTECTED]> wrote:
>
> "Stephane Marchesin" <[EMAIL PROTECTED]> writes:
>
> > I fail to see how this works with a lockless design. How do you ensure
> > the X server doesn't change cliprects between the time it has written
> > those in the shared ring buffer and the time the DRI client picks them
> > up and has the command fired and actually executed ? Do you lock out
> > the server during that time ?
>
> The scheme I have been advocating is this:
>
> - A new extension is added to the X server, with a
>   PixmapFromBufferObject request.
>
> - Clients render into a private back buffer object, for which they
>   used the new extension to generate a pixmap.
>
> - When a client wishes to copy something to the frontbuffer (for
>   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
>   plain old XCopyArea() with the generated pixmap. The X server is
>   then responsible for any clipping necessary.
>
> This scheme puts all clip list management in the X server. No
> cliprects in shared memory or in the kernel would be required. And no
> locking is required since the X server is already processing requests
> in sequence.


Yes, that is the idea I want to do for nvidia hardware.
Although I'm not sure if we can/want to implement it in terms of X primitives
or a new X extension.


> To synchronize with vblank, a new SYNC counter is introduced that
> records the number of vblanks since some time in the past. The clients
> can then issue SyncAwait requests before any copy they want
> synchronized with vblank. This allows the client to do useful
> processing while it waits, which I don't believe is the case now.


Since we can put a "wait until vblank on crtc #X" command to a fifo on
nvidia hardware, the vblank issue is non-existent for us. We get precise
vblank without CPU intervention.

Stephane


Re: Clip Lists

2007-11-28 Thread Keith Whitwell
Stephane Marchesin wrote:
> 
> 
> On 28 Nov 2007 06:19:39 +0100, *Soeren Sandmann* <[EMAIL PROTECTED]> wrote:
> 
>     "Stephane Marchesin" <[EMAIL PROTECTED]> writes:
> 
>      > I fail to see how this works with a lockless design. How do you
>      > ensure the X server doesn't change cliprects between the time it
>      > has written those in the shared ring buffer and the time the DRI
>      > client picks them up and has the command fired and actually
>      > executed ? Do you lock out the server during that time ?
> 
> The scheme I have been advocating is this:
> 
> - A new extension is added to the X server, with a
>   PixmapFromBufferObject request.
> 
> - Clients render into a private back buffer object, for which they
>   used the new extension to generate a pixmap.
> 
> - When a client wishes to copy something to the frontbuffer (for
>   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
>   plain old XCopyArea() with the generated pixmap. The X server is
>   then responsible for any clipping necessary.
> 
> This scheme puts all clip list management in the X server. No
> cliprects in shared memory or in the kernel would be required. And no
> locking is required since the X server is already processing requests
> in sequence. 
> 
> 
> Yes, that is the idea I want to do for nvidia hardware.
> Although I'm not sure if we can/want to implement it in term of X 
> primitives or a new X extension.
>  
> 
> To synchronize with vblank, a new SYNC counter is introduced that
> records the number of vblanks since some time in the past. The clients
> can then issue SyncAwait requests before any copy they want
> synchronized with vblank. This allows the client to do useful
> processing while it waits, which I don't believe is the case now.
> 
> 
> Since we can put a "wait until vblank on crtc #X" command to a fifo on 
> nvidia hardware, the vblank issue is non-existent for us. We get precise 
> vblank without CPU intervention.

You still have some issues...

The choice is: do you put the wait-until-vblank command in the same fifo 
as the X server rendering or not?

If yes -- you end up with nasty latency for X as its rendering is 
blocked by swapbuffers.

If no -- you face the question of what to do when cliprects change.

The only way to make 'no' work is to effectively block the X server from 
changing cliprects while such a command is outstanding -- which leads 
you back to latency issues - probably juddery window moves when 3d is 
active.

I don't think hardware gives you a way out of jail for swapbuffers in 
the presence of changing cliprects.

Keith



Re: Clip Lists

2007-11-28 Thread Stephane Marchesin
On 11/28/07, Keith Whitwell <[EMAIL PROTECTED]> wrote:
>
> Stephane Marchesin wrote:
> >
> >
> > On 28 Nov 2007 06:19:39 +0100, *Soeren Sandmann* <[EMAIL PROTECTED]> wrote:
> >
> >     "Stephane Marchesin" <[EMAIL PROTECTED]> writes:
> >
> >      > I fail to see how this works with a lockless design. How do you
> >      > ensure the X server doesn't change cliprects between the time it
> >      > has written those in the shared ring buffer and the time the DRI
> >      > client picks them up and has the command fired and actually
> >      > executed ? Do you lock out the server during that time ?
> >
> > The scheme I have been advocating is this:
> >
> > - A new extension is added to the X server, with a
> >   PixmapFromBufferObject request.
> >
> > - Clients render into a private back buffer object, for which they
> >   used the new extension to generate a pixmap.
> >
> > - When a client wishes to copy something to the frontbuffer (for
> >   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
> >   plain old XCopyArea() with the generated pixmap. The X server is
> >   then responsible for any clipping necessary.
> >
> > This scheme puts all clip list management in the X server. No
> > cliprects in shared memory or in the kernel would be required. And no
> > locking is required since the X server is already processing requests
> > in sequence.
> >
> >
> > Yes, that is the idea I want to do for nvidia hardware.
> > Although I'm not sure if we can/want to implement it in term of X
> > primitives or a new X extension.
> >
> >
> > To synchronize with vblank, a new SYNC counter is introduced that
> > records the number of vblanks since some time in the past. The clients
> > can then issue SyncAwait requests before any copy they want
> > synchronized with vblank. This allows the client to do useful
> > processing while it waits, which I don't believe is the case now.
> >
> >
> > Since we can put a "wait until vblank on crtc #X" command to a fifo on
> > nvidia hardware, the vblank issue is non-existent for us. We get precise
> > vblank without CPU intervention.
>
> You still have some issues...
>
> The choice is: do you put the wait-until-vblank command in the same fifo
> as the X server rendering or not?
>
> If yes -- you end up with nasty latency for X as its rendering is
> blocked by swapbuffers.


Yes, I want to go for that simpler approach first and see if the blocking
gets bad (I can't really say until I've tried).

Stephane


Re: Clip Lists

2007-11-28 Thread Keith Whitwell
Stephane Marchesin wrote:
> 
> 
> On 11/28/07, *Keith Whitwell* <[EMAIL PROTECTED]> wrote:
> 
> Stephane Marchesin wrote:
>  >
>  >
>  > On 28 Nov 2007 06:19:39 +0100, *Soeren Sandmann* <[EMAIL PROTECTED]> wrote:
>  >
>  >     "Stephane Marchesin" <[EMAIL PROTECTED]> writes:
>  >
>  >      > I fail to see how this works with a lockless design. How do
>  >      > you ensure the X server doesn't change cliprects between the
>  >      > time it has written those in the shared ring buffer and the
>  >      > time the DRI client picks them up and has the command fired
>  >      > and actually executed ? Do you lock out the server during
>  >      > that time ?
>  >
>  > The scheme I have been advocating is this:
>  >
>  > - A new extension is added to the X server, with a
>  >   PixmapFromBufferObject request.
>  >
>  > - Clients render into a private back buffer object, for which they
>  >   used the new extension to generate a pixmap.
>  >
>  > - When a client wishes to copy something to the frontbuffer (for
>  >   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
>  >   plain old XCopyArea() with the generated pixmap. The X server is
>  >   then responsible for any clipping necessary.
>  >
>  > This scheme puts all clip list management in the X server. No
>  > cliprects in shared memory or in the kernel would be required. And no
>  > locking is required since the X server is already processing requests
>  > in sequence.
>  >
>  >
>  > Yes, that is the idea I want to do for nvidia hardware.
>  > Although I'm not sure if we can/want to implement it in term of X
>  > primitives or a new X extension.
>  >
>  >
>  > To synchronize with vblank, a new SYNC counter is introduced that
>  > records the number of vblanks since some time in the past. The clients
>  > can then issue SyncAwait requests before any copy they want
>  > synchronized with vblank. This allows the client to do useful
>  > processing while it waits, which I don't believe is the case now.
>  >
>  >
>  > Since we can put a "wait until vblank on crtc #X" command to a fifo on
>  > nvidia hardware, the vblank issue is non-existent for us. We get precise
>  > vblank without CPU intervention.
> 
> You still have some issues...
> 
> The choice is: do you put the wait-until-vblank command in the same fifo
> as the X server rendering or not?
> 
> If yes -- you end up with nasty latency for X as its rendering is
> blocked by swapbuffers.
> 
> 
> Yes, I want to go for that simpler approach first and see if the 
> blocking gets bad (I can't really say until I've tried).

I'm all for experiments such as this.

Although I have a strong belief about how it will turn out, nothing is
better at changing these sorts of beliefs than actual results.

Keith



Re: Clip Lists

2007-11-28 Thread Soeren Sandmann
Keith Packard <[EMAIL PROTECTED]> writes:

> On Wed, 2007-11-28 at 06:19 +0100, Soeren Sandmann wrote:
> 
> > - When a client wishes to copy something to the frontbuffer (for
> >   whatever reason - glXSwapBuffers(), glCopyPixels(), etc), it uses
> >   plain old XCopyArea() with the generated pixmap. The X server is
> >   then responsible for any clipping necessary.
> 
> Using a plain old XCopyArea will make scheduling this in the kernel
> quite a bit harder; if the kernel knows it's doing a swap buffer, then
> it can interrupt ongoing rendering and do the copy at higher priority,
> precisely when the vblank interrupt lands. 

Waiting for vblank is only really interesting for full-screen,
non-clipped clients. In the non-fullscreen case, XCopyArea() type
blitting will actually look better than out-of-band blitting because it
avoids the strange effect where the contents of a window seem to lag
behind the frame when you drag it around. And the clipped case is
slowly going away as compositing becomes the norm.

So in the interesting case the X server would see a fullscreen
CopyArea that was preceded by some sort of wait-for-vblank request
(using SYNC or some other mechanism). The X server then calls a

copy_contents (buf_obj src,
               buf_obj dest,
               bool wait_blank);

interface in the kernel which will schedule the blit at next vblank
and send a notification back to the server when the blit has finished
and the source buffer can be reused.

Page flipping could be done similarly. A new request

SwapArea (Drawable src, Drawable dest,
          int src_x, int src_y,
          int width, int height,
          int dest_x, int dest_y)

would be added to the server, and a corresponding

swap_contents (buf_obj obj1,
               buf_obj obj2,
               bool wait_blank);

to the kernel for the full-screen case. The kernel interface would use
pointer swizzling and page-flips if possible.

If clip rectangles appear while these commands are outstanding, the X
server would just wait for them to complete. Going from "no clip" to
"clip" will be much, much less common than going from "clip" to
"different clip".

So basically, using XCopyArea() doesn't rule out having the
implementation in the kernel.

> Plus, you've just added the latency of a pair of context switches to
> the frame update interval.

I don't think it is realistic to completely avoid switching to the
server in the course of a frame update for these reasons:

- If the client is a compositing manager, then it wouldn't be drawing
  at all unless it was getting damage events from the server.

- Clients often get input events at a frequency on the order of 100Hz.

- GL pixel copying routines such as glXSwapBuffers() need to generate
  damage events. With XCopyArea() we get them almost for free.



Soren
