On Mon, 2009-06-29 at 17:57 +0200, Jerome Glisse wrote: 
> On Mon, 2009-06-29 at 12:21 +0200, Michel Dänzer wrote:
> > On Wed, 2009-06-24 at 20:17 +0200, Jerome Glisse wrote: 
> > > On Wed, 2009-06-24 at 13:25 +0200, Michel Dänzer wrote:
> > > > On Wed, 2009-06-24 at 10:21 +1000, Dave Airlie wrote:
> > > > > From: Dave Airlie <airl...@redhat.com>
> > > > > 
> > > > > This adds color tiling support for buffers in VRAM, it enables
> > > > > a color tiled fbcon and a color tiled X frontbuffer.
> > > > > 
> > > > > It changes the API:
> > > > > adds two new parameters to the object creation API (is this
> > > > > better than a set/get tiling?) we probably still need a get but
> > > > > not sure for what yet.
> > > > > relocs are required for 2D DST_PITCH_OFFSET and SRC_PITCH_OFFSET
> > > > > type-0, and 3D COLORPITCH registers.
> > > > > 
> > > > > TTM:
> > > > > adds a new check_tiling call to TTM, gets called at fault and around
> > > > > bo moves.
> > > > > 
> > > > > Issues:
> > > > > Can we integrate endian swapping in with this?
> > > > 
> > > > Not sure about that in general; unless I'm missing something, it would
> > > > require moving BOs from TT to VRAM for CPU mappings, and I don't think
> > > > that's a good idea.
> > > > 
> > > > It might be useful for scanout buffers though. Maybe another object
> > > > creation parameter which specifies the CPU byte swapping vs. the GPU
> > > > byte order (little endian) could work, then we could use surface
> > > > registers for VRAM mappings or GPU byte swapping bits for GPU access
> > > > to/from TT.
> > > 
> > > I think we should let userspace ask at gem map ioctl time whether it
> > > wants a surface-backed mapping or not, and gem map will reply with
> > > success or failure. So if the object is in vram and there is a surface
> > > reg available it will succeed; if the object is in system ram it will
> > > report to userspace that there is no automatic untiling and that
> > > userspace is on its own to untile the buffer.
> > > 
> > > The X server maps the front buffer first and never unmaps it, so it
> > > should get a surface (assuming no other process has already stolen all
> > > the surfaces). For pixmaps i think we are better off not using tiling
> > > for the time being (or macro tiling only, see benchmark below).
> > > 
> > > Mesa maps/unmaps things and should be able to untile on its own for
> > > front/zbuffer (we need to add textures but i am not sure it's worth
> > > it, see benchmark below).
> > > 
> > > Using the gem map ioctl makes me wonder if we want to specify tiling at
> > > bo creation time. If we have several users, each one with its own X
> > > server, we might exhaust the surface registers.
> > 
> > Over the weekend I discovered that unfortunately the byte swapping bits
> > of the 3D engine in my PowerBook's RV350 seem to be ineffective. So I
> > thought some more about how byte order could be handled, and I've come
> > up with the following:
> > 
> >       * Specify byte-swapping between CPU and GPU access at BO creation
> >         time.
> >       * Swap bytes accordingly when moving BOs between TT (CPU byte
> >         order) and VRAM (GPU byte order).
> >       * If byte swapping is required, use surface registers for CPU
> >         mappings while BOs reside in VRAM. (Re-)Allocate surface
> >         registers on page faults.
> >       * If byte swapping is required, only allow GPU access in VRAM (can
> >         be handled in userspace I think).
> > 
> > That is assuming we can at least sensibly swap bytes when moving BOs -
> > hopefully the byte-swapping bits of DMA transfers work... otherwise, I
> > don't see any other solution but to leave byte order completely up to BO
> > users, which would probably preclude 16 bpp in X.
> > 
> > 
> > Could a scheme like this be useful for tiling as well?
> 
> The more i think about surfaces the more i think we should forget them,
> it really sounds like a lot of pain to allocate surfaces on fault and i
> fear we will have enough concurrent maps to run out of surface regs,
> leading to contention over surface reg allocation which will kill
> performance badly.

FWIW, it might be possible to mitigate that somewhat using something
like the surface register allocator in radeon_state.c, and possibly even
moving BOs with compatible surface attributes next to each other in
VRAM.
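
To make the contention concern concrete, here is a minimal sketch of
allocating surface registers on fault with least-recently-used eviction.
It is not the radeon_state.c allocator and not proposed driver code: the
pool size of 8, the integer BO handles and all of the names
(surface_pool, surface_get) are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SURFACE_REGS 8  /* assumption: size of the hardware pool */
#define NO_OWNER 0          /* hypothetical "no BO" handle */

struct surface_reg {
    uint32_t owner;     /* hypothetical BO handle, NO_OWNER if free */
    uint64_t last_use;  /* tick of the last fault served; 0 if never used */
};

struct surface_pool {
    struct surface_reg regs[NUM_SURFACE_REGS];
    uint64_t tick;      /* incremented on every request */
};

/*
 * Return the surface register index to use for 'bo'. If the BO has no
 * register yet, take a free one, or steal the least recently used one.
 * '*evicted' receives the handle of the BO that lost its register, or
 * NO_OWNER if nothing was evicted. The pool must start zero-initialized.
 */
static int surface_get(struct surface_pool *pool, uint32_t bo,
                       uint32_t *evicted)
{
    int victim = 0;

    assert(bo != NO_OWNER);
    *evicted = NO_OWNER;
    pool->tick++;

    for (int i = 0; i < NUM_SURFACE_REGS; i++) {
        if (pool->regs[i].owner == bo) {
            /* Already backed by a register: just refresh its age. */
            pool->regs[i].last_use = pool->tick;
            return i;
        }
        /* Free registers have last_use == 0, so they are picked first. */
        if (pool->regs[i].last_use < pool->regs[victim].last_use)
            victim = i;
    }

    *evicted = pool->regs[victim].owner;
    pool->regs[victim].owner = bo;
    pool->regs[victim].last_use = pool->tick;
    return victim;
}
```

Whenever '*evicted' comes back non-zero, the CPU mappings of that BO
would have to be torn down so that its next access faults again. With
more concurrently mapped BOs than registers, that happens on nearly
every fault, which is the thrashing Jerome is worried about.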

> Of course we don't have a good infrastructure in X to deal with tiling
> or swapping (assuming wfb is too slow and not an option).

I'm not even sure it would be possible to handle these things with wfb,
as the wrapped memory access callbacks don't get any explicit
information about which drawable or region of it they're dealing with.
Even assuming it would be possible, it would no doubt be very slow (e.g.
for blits there would be one callback for each *byte*) and would incur
overhead even in cases that wouldn't require any special treatment at
all. So I really don't see wfb as a feasible solution for this.

> Still, i think we had better teach userspace to deal with that rather than
> hoping that some hw will do the magic for us.

I've been working on a userspace only solution for byte order. It's
basically working, but it's exposing bad assumptions in various X client
code. :} Also, it requires the upcoming pixman and xserver releases, and
as noted above it can't handle 16 bpp visuals in X.
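
For readers unfamiliar with what byte order handling means at the pixel
level, the sketch below converts 32 bpp pixels between CPU and GPU byte
order. It only illustrates the conversion; it is not the userspace
solution mentioned above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of one 32-bit pixel. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) |
           (v << 24);
}

/*
 * Copy 'count' 32 bpp pixels from src to dst, swapping the byte order of
 * each pixel. dst and src may be the same buffer (in-place swap).
 */
static void swap_pixels32(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = swap32(src[i]);
}
```

Applying the swap twice returns the original data, so the same routine
serves both directions.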


-- 
Earthling Michel Dänzer           |                http://www.vmware.com
Libre software enthusiast         |          Debian, X and DRI developer
