What follows are the collected requirements for the new DRI memory
manager. This list is the product of several discussions among Brian,
Keith, Allen, and me several months ago. After the list, I have
included some of my thoughts on the big picture that I see emerging from
these requirements.
1. Single-copy textures
Right now each texture exists in two or three places. There is a copy
in on-card or AGP memory, in system memory (managed by the driver), and
in application memory. Any solution should be able to eliminate one or
two of those copies.
If the driver-tracked copy in system memory is eliminated, care must be
taken when the texture needs to be removed from on-card / AGP memory.
Additionally, changes to the texture image made via glCopyTexImage must
not be lost.
It may be possible to eliminate one copy of the texture using
APPLE_client_storage. A portion of this could be done purely in Mesa.
If the user-supplied image matches the internal format of the texture,
then the driver can use the application's copy of the texture in place
of the driver's copy.
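A minimal sketch of that Mesa-side check, using hypothetical names throughout (`tex_layout`, `struct tex_image`, and `try_client_storage()` are not real Mesa or DRI identifiers). When the application's image already has the texture's internal layout and tightly packed rows, the driver keeps a pointer to the application's pixels instead of making its own copy. A real implementation would also have to honor the pixel-unpack state, and it depends on the APPLE_client_storage promise that the application keeps that memory valid while the texture uses it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware texel layouts; not real Mesa or DRI enums. */
typedef enum { LAYOUT_RGBA8888, LAYOUT_RGB565, LAYOUT_ARGB4444 } tex_layout;

/* Hypothetical per-image bookkeeping kept by the driver. */
struct tex_image {
    tex_layout internal;   /* layout the hardware expects */
    unsigned width, height;
    const void *pixels;    /* where uploads (or the hardware) read from */
    bool client_owned;     /* true: pixels point into application memory */
};

static size_t bytes_per_texel(tex_layout l)
{
    return l == LAYOUT_RGBA8888 ? 4 : 2;
}

/* If the application's image already matches the internal layout and has
 * tightly packed rows, reference it directly instead of copying it.
 * Returns false when the caller must convert into a driver-owned copy. */
static bool try_client_storage(struct tex_image *img, tex_layout client_layout,
                               size_t client_row_bytes, const void *client_pixels)
{
    size_t packed_row = (size_t)img->width * bytes_per_texel(img->internal);

    if (client_layout != img->internal || client_row_bytes != packed_row)
        return false;

    img->pixels = client_pixels;   /* no driver-side system-memory copy */
    img->client_owned = true;
    return true;
}
```

When the function returns false, the driver would fall back to converting the image into its own copy, as it does today.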
Modulo implementation difficulties, it may even be possible to use the
pages that hold the texture as backing store for a portion of the AGP
aperture. This is the only way to truly achieve single-copy textures.
The implementation may prove too difficult on existing x86 systems to be
worth the effort. This functionality is available in MacOS 10.1, so the
same difficulties may not exist on Linux PPC.
2. Share texture memory among multiple OpenGL contexts
Texture memory is currently shared by all OpenGL contexts. That is,
when an OpenGL context switch happens it is not necessary to reload all
textures. The texture manager needs to continue to use a paged memory
model (as opposed to a segmented memory model).
3. Accommodate other OpenGL buffers
The allocator should also be used for allocating vertex buffers, render
targets (pbuffers, back-buffers, depth-buffers, etc.), and other
buffers. This can be useful beyond supporting SGIX_pbuffer,
ARB_vertex_array_objects, and optimized display lists. Dynamically
allocating per-context depth and back-buffers will allow multiple Z
depths to be used at the same time (e.g., a 16-bit depth-buffer for one
window and a 24-bit depth-buffer for another) and will enable
super-sampling FSAA.
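For a rough sense of scale, the helper below computes the memory one such buffer needs. It assumes a 24-bit depth-buffer is stored in a 32-bit word (many parts pair it with 8 bits of stencil) and that super-sampling multiplies storage by the sample count; `render_buffer_bytes()` is a hypothetical name.

```c
#include <stddef.h>

/* Bytes needed for one per-window render buffer. 'bytes_per_pixel' is the
 * storage size (assumed 2 for 16-bit depth, 4 for 24-bit depth kept in a
 * 32-bit word), and 'samples' is the super-sampling factor (1 = no FSAA). */
static size_t render_buffer_bytes(unsigned width, unsigned height,
                                  unsigned bytes_per_pixel, unsigned samples)
{
    return (size_t)width * height * bytes_per_pixel * samples;
}
```

At 1024x768, a 16-bit depth-buffer needs 1,572,864 bytes, a 24-bit one 3,145,728 bytes, and the same 24-bit buffer with 4x super-sampling 12,582,912 bytes, so a single statically chosen size either wastes memory or rules out the larger configurations.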
4. Support texture pseudo-render targets
Accelerating some OpenGL functions, such as glCopyTexImage,
SGIS_generate_mipmap, and ARB_render_texture, may require special
support and consideration.
5. Additional AGP related issues
There may be cases where textures need to be moved back and forth
between AGP and on-card memory. For example, a texture might reside in
AGP memory, and an operation may be requested that requires that the
texture be in on-card memory.
6. Additional texture formats and layouts
Compressed, 1D, 3D, cube map, and non-power-of-two textures need to be
supported in addition to "traditional" 2D power-of-two textures.
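One way these layouts differ for the allocator is in how much memory a full mipmap chain occupies. The sketch below, using a hypothetical `mip_chain_bytes()` helper and `tex_target` enum, sums the levels of an uncompressed texture: each level halves every dimension, rounding down and stopping at 1 (the rule used for non-power-of-two mipmaps), and a cube map stores six faces. Compressed formats would need a per-block size instead of a per-texel size.

```c
#include <stddef.h>

/* Hypothetical texture targets matching the layouts listed above. */
typedef enum { TEX_1D, TEX_2D, TEX_3D, TEX_CUBE } tex_target;

/* Bytes for a full, uncompressed mipmap chain. Each level halves every
 * dimension, rounding down and never going below 1, until all are 1.
 * Cube maps store six faces of the same size. */
static size_t mip_chain_bytes(tex_target target, unsigned w, unsigned h,
                              unsigned d, unsigned bytes_per_texel)
{
    size_t total = 0;

    if (target == TEX_1D)
        h = d = 1;
    if (target == TEX_2D || target == TEX_CUBE)
        d = 1;
    if (w == 0 || h == 0 || d == 0)
        return 0;   /* empty image; also avoids looping forever */

    for (;;) {
        total += (size_t)w * h * d * bytes_per_texel;
        if (w == 1 && h == 1 && d == 1)
            break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
        if (d > 1) d /= 2;
    }
    return target == TEX_CUBE ? total * 6 : total;
}
```

For example, a 256x256 RGBA8 2D texture with all nine levels takes 349,524 bytes, and a cube map with faces of that size takes 2,097,144 bytes.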
7. Allen Akin's pinned-texture proposal
If we ever expose memory management to the user (beyond texture
priorities) we want to be sure our allocator is designed with this in mind.
8. Device independence
As much as possible, the source code for the memory manager should live
somewhere device independent. This is both for the benefit of newly
developed drivers and for maintaining existing drivers.
* My Thoughts *
There are really only two radical departures from the existing memory
manager. The first is using the memory manager for non-texture memory
objects. The second, which is partially a result of the first, is the
need to "pin" objects. It would not do to have one context kick another
context's depth-buffer out of memory!
My initial thought on how to accomplish this was to move the allocator
into the kernel. There would be a low-level allocator that could be
used for non-texture buffers and a way to create textures (from data).
In the texture case, the kernel would only allocate memory when a
texture was used. Instead of using the actual texture address in
drawing command streams, the user-level driver would insert texture IDs.
The kernel would use these IDs to map to real texture addresses.
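A sketch of that ID-to-address translation, assuming a hypothetical relocation list (`struct reloc`) that the user-level driver would submit alongside the command buffer, and a hypothetical kernel-side table (`struct tex_table`); neither exists in DRI. The kernel walks the list and overwrites each texture ID with the texture's current address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: the user-level driver wrote a texture ID
 * into the command stream at 'dword_index'. */
struct reloc {
    size_t dword_index;
};

/* Hypothetical kernel-side table mapping texture IDs to card addresses.
 * In the proposal the kernel would first make each texture resident
 * (allocating on first use); here residency is a non-zero address. */
struct tex_table {
    const uint32_t *addr;   /* addr[id] = card address, 0 = not resident */
    size_t count;
};

/* Replace each texture ID in the command buffer with its real address.
 * Returns the number of relocations patched, or -1 if an index or ID is
 * invalid, in which case the (partially patched) buffer must not be
 * submitted to the hardware. */
static long apply_relocs(uint32_t *cmds, size_t ncmds,
                         const struct reloc *relocs, size_t nrelocs,
                         const struct tex_table *tbl)
{
    for (size_t i = 0; i < nrelocs; i++) {
        size_t at = relocs[i].dword_index;
        if (at >= ncmds)
            return -1;
        uint32_t id = cmds[at];
        if (id >= tbl->count || tbl->addr[id] == 0)
            return -1;
        cmds[at] = tbl->addr[id];
    }
    return (long)nrelocs;
}
```

Because the user-level driver never sees real addresses in this scheme, the kernel stays free to move or evict textures between submissions.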
The benefit is that all memory management would be handled by a single
omniscient execution context (the kernel). The downside is that it
would move a LOT of code into the kernel. It would be almost entirely
OS and device independent, but there would likely be a lot of it.
After talking with Jeff Hartmann in IRC on 1/13, I started thinking
about all of this again. Jeff had some serious reservations about
moving that volume of code into the kernel, and he believed that all of
the requirements could be met by a purely user-space implementation.
After thinking about things some more, I'm starting to agree.
What follows is a fairly random series of thoughts on how a user-space
memory manager could be made to work.
I believe that everything could be done by breaking each memory space
down into blocks (as is currently done) and tracking two values, either
implicitly or explicitly, with each block. The first value is some sort
of swap-out priority. This is currently implicitly tracked by the list
ordering in the SAREA. The other value is basically a semaphore, but it
could be implemented as a simple can-swap bit.
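A minimal sketch of eviction under this scheme, with an explicit priority field standing in for the list ordering the SAREA uses today (`struct mem_block` and `pick_victim()` are hypothetical). The manager evicts the lowest-priority occupied block whose can-swap bit is set and reports failure when every occupied block is pinned.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block record as it might appear in the SAREA. */
struct mem_block {
    unsigned priority;   /* lower value = swapped out sooner (e.g. LRU age) */
    bool can_swap;       /* cleared while the block must stay resident */
    bool in_use;         /* block currently holds data */
};

/* Choose the block to evict: the lowest-priority block that is occupied
 * and swappable. Returns its index, or -1 if every occupied block is
 * pinned. */
static long pick_victim(const struct mem_block *blocks, size_t n)
{
    long best = -1;

    for (size_t i = 0; i < n; i++) {
        if (!blocks[i].in_use || !blocks[i].can_swap)
            continue;
        if (best < 0 || blocks[i].priority < blocks[best].priority)
            best = (long)i;
    }
    return best;
}
```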
Blocks that hold an active depth-buffer would never have can-swap set.
Blocks that hold "normal" textures, back-buffers, render-target textures,
and pbuffers would have their can-swap bit conditionally set. Each of
these types of blocks would have the can-swap bit cleared in the
following situations:
- Normal textures - While a rendering operation is queued that
will use the texture.
- SGIS_generate_mipmap textures - While the blits are in progress
to create the filtered mipmaps.
- glCopyTexImage textures - While the blit to copy image data to
the texture is in progress and while the data in the texture has
not been copied to some sort of backing store.
- pbuffers - While rendering operations to the pbuffer are in
progress. pbuffers have a mechanism to tell an application when
the contents of the pbuffer have been "lost." This could be
exploited by the memory manager. One caveat is when a pbuffer
is bound to a texture (ARB_render_texture). While the pbuffer
is bound to a texture, its contents cannot be lost. Can the
contents be "swapped out" to some sort of backing store, like
with glCopyTexImage targets?
- Back-buffers - In unextended GLX, back-buffers can never be
swapped out. However, if OML_sync_control is available, a "double
buffered" visual may want to have many virtual back-buffers.
Each time glXSwapBuffersMscOML (essentially an asynchronous
glXSwapBuffers call) is made, a new back-buffer is allocated as
the rendering target. Once a back-buffer is copied to the
front-buffer (i.e., the queued buffer-swap completes), the
back-buffer can be swapped out.
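The rules above could be collected into a single predicate per object. This is a sketch under assumptions: the object kinds and state flags are hypothetical, queued rendering is assumed to pin every kind of object (not just normal textures), and a pbuffer bound as a texture is treated as pinned, since whether it could instead be copied to backing store is still an open question.

```c
#include <stdbool.h>

/* Hypothetical object kinds, following the list above. */
typedef enum {
    OBJ_DEPTH_BUFFER, OBJ_TEXTURE, OBJ_MIPMAP_TEXTURE, OBJ_COPY_TEXTURE,
    OBJ_PBUFFER, OBJ_BACK_BUFFER
} obj_kind;

/* Hypothetical per-object state tracked by the owning driver. */
struct mem_object {
    obj_kind kind;
    bool rendering_queued;   /* queued drawing reads from or writes to it */
    bool blit_pending;       /* mipmap-generation or copy blit in flight */
    bool dirty;              /* copied-in data not yet in backing store */
    bool bound_as_texture;   /* pbuffer bound via ARB_render_texture */
    bool oml_virtual;        /* extra back-buffer from an async OML swap */
    bool swap_done;          /* its queued buffer-swap has completed */
};

/* Returns true if the object may be swapped out right now. */
static bool object_can_swap(const struct mem_object *o)
{
    switch (o->kind) {
    case OBJ_DEPTH_BUFFER:
        return false;
    case OBJ_TEXTURE:
        return !o->rendering_queued;
    case OBJ_MIPMAP_TEXTURE:
        return !o->rendering_queued && !o->blit_pending;
    case OBJ_COPY_TEXTURE:
        return !o->rendering_queued && !o->blit_pending && !o->dirty;
    case OBJ_PBUFFER:
        return !o->rendering_queued && !o->bound_as_texture;
    case OBJ_BACK_BUFFER:
        /* Only OML virtual back-buffers whose swap has completed. */
        return o->oml_virtual && o->swap_done;
    }
    return false;
}
```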
There may be other situations where can-swap is cleared, but that's all
I could think of. Similar rules would exist for vertex buffers (for
ARB_vertex_array_object, EXT_compiled_vertex_array, optimized display
lists, etc.).
Only a single bit per block is needed in the SAREA. That bit combines
the bits of every object that is part of that block: a block can be
swapped out only if all of its objects can be. This combined value must
be calculated by the user-space driver. This presents a possible problem
of user-space clients failing to update the can-swap bits for some
reason (a process hung on a blocking IO call?). The current
implementation avoids this problem by forcing all blocks to be swappable
at all times.
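Because a block can be swapped out only when every object in it can be, the per-block bit amounts to a logical AND of the object bits (equivalently, an OR of "pinned" bits). A sketch with hypothetical names (`struct obj_extent`, `block_can_swap()`), treating any byte overlap between an object and a block as membership:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extent of one object within a heap, plus the swap state
 * computed by the driver that owns it. */
struct obj_extent {
    size_t offset, size;   /* bytes */
    bool can_swap;
};

/* A block is swappable only if every object overlapping it is swappable:
 * the block bit is the AND of the overlapping objects' bits. */
static bool block_can_swap(size_t blk_off, size_t blk_size,
                           const struct obj_extent *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool overlaps = objs[i].offset < blk_off + blk_size &&
                        blk_off < objs[i].offset + objs[i].size;
        if (overlaps && !objs[i].can_swap)
            return false;
    }
    return true;
}
```

An object that straddles a block boundary contributes to both blocks, which is the spanning case raised in question 4 below.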
At this point I'm left with a few questions.
1. In a scheme like this, how could processes be forced to update the
can-swap bits on blocks that they own?
2. What is the best way for processes to be notified of events that
could cause can-swap bits to change (e.g., rendering completion,
asynchronous buffer-swap completion, etc.)? Signals from the kernel?
Polling "age" variables?
3. If some sort of signal based notification is used, could it be used
to implement NV_fence and / or APPLE_fence?
4. How could the memory manager handle objects that span multiple
blocks? In other words, could the memory manager be made to prefer
to swap out blocks that wholly contain all of the objects that
overlap the block? Are there other useful metrics? Prefer to
swap out blocks that are half full over blocks that are completely
full?
5. What other things have I missed that might prevent this system
from working? :)
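One possible shape for the "age" polling option in question 2, as a sketch: the driver tags each queued operation with a 32-bit sequence number, and the hardware or the kernel's interrupt handler writes the number of the last completed operation to a shared word. `struct age_state`, `emit_age()`, and `age_retired()` are hypothetical names. An object's can-swap bit could be set again once the age of its last use has retired, and an NV_fence-style fence could plausibly be little more than such a stored age.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state: 'completed' is written by the hardware or the
 * kernel; 'next' is the next sequence number the driver will emit. */
struct age_state {
    volatile uint32_t *completed;
    uint32_t next;
};

/* Tag a newly queued operation; the counter wraps at 2^32. */
static uint32_t emit_age(struct age_state *s)
{
    return s->next++;
}

/* True if the operation tagged 'age' has completed. Taking the difference
 * as a signed value keeps the test correct across wraparound, provided
 * fewer than 2^31 operations are outstanding (and relying on the usual
 * two's-complement conversion to int32_t). */
static bool age_retired(const struct age_state *s, uint32_t age)
{
    return (int32_t)(*s->completed - age) >= 0;
}
```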
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel