Linus Torvalds wrote:
> Keith,
>  I've got a silly question for you..
> 
> Why do you need a kernel driver at all for the R200?

I go into your mail below, but the only good answer I have is:
        1) To allow us to mmap the framebuffer, agp and mmio regions (or to
           handle mmio for us without us mapping it)
        2) Backwards compatibility.  The ddx module is shared with the radeon &
           wants to talk to a kernel module.  This can be worked around.
        
> There are a few things that the kernel can do for you:
> 
>  - Locking.
> 
>       However, there are better (and faster) locks available in user
>       space, namely the "futex" interface. They take some getting used
>       to, but you can have some _truly_ low-cost locking using them.
> 
>       Example library can be found at:
>               http://www.kernel.org/pub/linux/kernel/people/rusty/futex-2.0.tar.gz

I'm not sure how these are so much better in concept than our existing lock.
Both seem to have a userspace fast path (with a locked cycle) and a
syscall/ioctl slow path on contention.

The implementation of our lock has various workstation-leftovers like 
infrastructure for real virtualization of the hardware (kernel does context 
switching on lock contention), which aren't really used.
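
To make the comparison concrete, here is a minimal sketch of the shape both
locks seem to share: a single atomic compare-and-swap in user space when the
lock is free, and a blocking call only on contention. All names here
(hw_lock_t, LOCK_HELD, wait_fn) are hypothetical and are not taken from the
DRM or futex sources; the wait callback merely stands in for whatever ioctl or
futex syscall the real slow path would make.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_HELD 0x80000000u   /* hypothetical "held" bit in the lock word */

typedef struct {
    _Atomic uint32_t word;      /* 0 when free, LOCK_HELD | ctx when held */
} hw_lock_t;

/* Hypothetical slow-path hook: blocks until the lock may have been freed. */
typedef void (*wait_fn)(void *arg);

/* Fast path: one locked cycle, no kernel entry. */
static inline bool lock_try_fast(hw_lock_t *l, uint32_t ctx)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(&l->word, &expected, LOCK_HELD | ctx);
}

/* Returns 1 if the slow path was needed (the lock was contended), else 0. */
static inline int lock_acquire(hw_lock_t *l, uint32_t ctx, wait_fn wait, void *arg)
{
    int contended = 0;
    while (!lock_try_fast(l, ctx)) {
        contended = 1;
        wait(arg);              /* slow path: sleep in the kernel, then retry */
    }
    return contended;
}

/* Release: only the current holder may clear the lock word. */
static inline void lock_release(hw_lock_t *l, uint32_t ctx)
{
    uint32_t expected = LOCK_HELD | ctx;
    bool was_holder = atomic_compare_exchange_strong(&l->word, &expected, 0);
    assert(was_holder);
    (void)was_holder;
}
```

The return value of lock_acquire() is the interesting part of this sketch: it
tells the caller whether anybody else could have touched the hardware in the
meantime, which is when cached state would need to be re-validated.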

>  - Interrupts
> 
>       You don't use these right now, and as far as I can tell the main
>       reason for using them would be to just synchronize page flipping
>       with the framerate. No?

Correct.

>  - IOIO and IOMEM access
> 
>       iopl() gives access to IOIO
>       mmap() and AGP driver gives access to IOMEM/AGP
> 
>       IOIO is actually slightly slower in CPL3 than in CPL0, but it's
>       slower in CPU cycles, not in IO cycles. And since IO cycles
>       definitely dominate in IOIO (by orders of magnitude), this isn't
>       likely to be an issue.
> 
>       And IOMEM is the same speed, since the only overhead for
>       user space is the TLB, and AGP mappings use the TLB even in kernel
>       space (vmalloc).

I'm not sure how this works.  Does the agp module have a facility to allow the
client to mmap the card mmio region & the framebuffer?  I wasn't aware of this.
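
For reference, the kind of user-space mapping being discussed might look
roughly like the sketch below. It assumes the region is reachable through some
mappable file (for example /dev/mem at the card's physical base address, which
needs root) and that the offset is page-aligned. The helper names map_region()
and unmap_region() are made up for illustration, and nothing here says whether
the agp module itself offers such a facility.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes starting at page-aligned `offset` of the file at `path`
 * for shared read/write access. Returns NULL on failure.
 * For real hardware, `path` and `offset` would have to come from the PCI
 * configuration of the card; they are left to the caller here.
 */
static inline volatile void *map_region(const char *path, off_t offset, size_t len)
{
    assert(len > 0);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    close(fd);                  /* the mapping stays valid after close() */

    return p == MAP_FAILED ? NULL : p;
}

/* Undo a mapping made by map_region(). */
static inline void unmap_region(volatile void *p, size_t len)
{
    munmap((void *)p, len);
}
```

The pointer is volatile so that register reads and writes through it are not
optimized away or merged by the compiler.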

>  - Global datastructures
> 
>       I think you do the aging right now globally or something.
> 
>       What else? Right now you cache some stuff globally (the ring tail
>       ptr etc), but that isn't necessary: you can re-create the
>       information on demand after a lock acquisition (since it is only
>       needed when contention happens).

Contention gives us a hint to check if the cliprects have changed.  There's a 
fairly ugly mechanism for retrieving the new cliprects (drop hw lock, get a 
spin-type lock, send a request, get a reply, drop the spin-lock, re-acquire the 
hw lock).  However, the check to see if this is necessary is cheap and the 
cliprects aren't required that often anyway.
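
The sequence above can be sketched as a small single-process simulation. The
stamp comparison used as the "cheap check" is an assumption about how such a
check could work, and drawable_t and its fields are hypothetical names; the
request/reply round trip to the X server is reduced to a single assignment.

```c
#include <assert.h>

/* Hypothetical per-drawable state; field names are illustrative only. */
typedef struct {
    unsigned server_stamp;  /* bumped whenever the window's clipping changes */
    unsigned client_stamp;  /* stamp of the cliprects the client last fetched */
    int hw_locked;          /* 1 while the client holds the hardware lock */
    int spin_locked;        /* 1 while the client holds the spin-type lock */
    unsigned fetches;       /* number of round trips made so far */
} drawable_t;

/* The cheap check: no locks dropped, no messages sent. */
static inline int cliprects_stale(const drawable_t *d)
{
    return d->client_stamp != d->server_stamp;
}

/* Must be called with the hardware lock held; returns with it held again. */
static inline void refresh_cliprects(drawable_t *d)
{
    assert(d->hw_locked);

    /* Loop because clipping may change again while the hw lock is dropped. */
    while (cliprects_stale(d)) {
        d->hw_locked = 0;                   /* drop hw lock */
        d->spin_locked = 1;                 /* get the spin-type lock */
        d->client_stamp = d->server_stamp;  /* send request, get reply (simulated) */
        d->fetches++;
        d->spin_locked = 0;                 /* drop the spin-lock */
        d->hw_locked = 1;                   /* re-acquire the hw lock */
    }
}
```

In the common case the stamps match and refresh_cliprects() returns without
touching either lock, which is why the expensive path rarely matters.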


> So from what I can tell, a trusted entity doesn't strictly _need_ any
> kernel support.
> 
> Yes, kernel support (or indirect rendering) is needed for untrusted
> applications, but it might actually be interesting to see what a
> direct-rendering all-user-land implementation looks like. It has some
> debugging advantages, and it may actually make sense to start from a
> totally trusted app that goes as fast as humanly possible, and then when
> that has been optimized to death look at just where the interfaces make
> the most sense..

This is closer & closer to the Utah direct rendering model (not that I'm 
complaining...)  In that model synchronization was achieved by having the X 
server be the only entity to touch the mmio region, but the client had direct 
access to a (large) dma buffer which it could ask the X server (via extended 
X11 protocol) to dispatch for it.  The X server would take care of cliprect 
issues.

This actually worked pretty well, but was limited to a single direct client 
(second & subsequent clients would go indirect, maybe sw-indirect, I forget).
A little bit of work could extend that fairly easily to multiple clients.

It also required that the direct client be run as root in order to mmap the 
framebuffer & dma region.

I think it's probably time to start considering a rewrite/redesign of the 3d 
infrastructure based around a minimalist approach.  There's just so much 
leftover code hanging around I have to ask what can be salvaged.

Keith

