What follows is the collected list of requirements for the new DRI memory manager. The list is the product of several discussions among Brian, Keith, Allen, and me a few months ago. After the list, I have included some of my thoughts on the big picture that I see emerging from these requirements.

1. Single-copy textures

Right now each texture exists in two or three places. There is a copy in on-card or AGP memory, in system memory (managed by the driver), and in application memory. Any solution should be able to eliminate one or two of those copies.

If the driver-tracked copy in system memory is eliminated, care must be taken when the texture needs to be removed from on-card / AGP memory. Additionally, changes to the texture image made via glCopyTexImage must not be lost.

It may be possible to eliminate one copy of the texture using APPLE_client_storage. A portion of this could be done purely in Mesa. If the user-supplied image matches the internal format of the texture, then the driver can use the application's copy of the texture in place of the driver's copy.
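
As a rough illustration of the kind of check involved (all names and the exact format tests here are hypothetical, not existing Mesa or driver entry points), the decision might look something like this:

    /* Hypothetical sketch: decide whether the application's pixel data can
     * be used directly (APPLE_client_storage style) instead of making a
     * driver-owned copy.  The format/type/packing checks are illustrative
     * only. */

    #include <stdbool.h>
    #include <GL/gl.h>

    struct tex_image_desc {
        GLenum internal_format;   /* e.g. GL_RGBA8 */
        GLenum user_format;       /* format passed to glTexImage2D */
        GLenum user_type;         /* type passed to glTexImage2D */
        GLint  row_alignment;     /* current GL_UNPACK_ALIGNMENT */
    };

    static bool
    can_use_client_storage(const struct tex_image_desc *d)
    {
        /* The user data must already be laid out exactly the way the
         * hardware expects it; otherwise a conversion (and therefore a
         * copy) is needed anyway. */
        if (d->internal_format == GL_RGBA8 &&
            d->user_format == GL_RGBA &&
            d->user_type == GL_UNSIGNED_BYTE &&
            d->row_alignment == 4)
            return true;

        return false;
    }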

Modulo implementation difficulties, it may even be possible to use the pages that hold the texture as backing store for a portion of the AGP aperture. This is the only way to truly achieve single-copy textures. The implementation may prove too difficult on existing x86 systems to be worth the effort. This functionality is available in MacOS 10.1, so the same difficulties may not exist on Linux PPC.

2. Share texture memory among multiple OpenGL contexts

Texture memory is currently shared by all OpenGL contexts. That is, when an OpenGL context switch happens it is not necessary to reload all textures. The texture manager needs to continue to use a paged memory model (as opposed to a segmented memory model).

3. Accommodate other OpenGL buffers

The allocator should also be used for allocating vertex buffers, render targets (pbuffers, back-buffers, depth-buffers, etc.), and other buffers. This is useful beyond supporting SGIX_pbuffer, ARB_vertex_array_objects, and optimized display lists. Dynamically allocating per-context depth-buffers and back-buffers will allow multiple Z depths to be used at a time (i.e., a 16-bit depth-buffer for one window and a 24-bit depth-buffer for another) and will also enable super-sampling FSAA.

4. Support texture pseudo-render targets

Accelerating some OpenGL functions, such as glCopyTexImage, SGIS_generate_mipmaps, and ARB_render_texture, may require special support and consideration.

5. Additional AGP related issues

There may be cases where textures need to be moved back-and-forth between AGP and on-card memory. For example, a texture might reside in AGP memory, and an operation may be requested that requires that the texture be in on-card memory.

6. Additional texture formats and layouts

Compressed, 1D, 3D, cube map, and non-power-of-two textures need to be supported in addition to "traditional" 2D power-of-two textures.

7. Allen Akin's pinned-texture proposal

If we ever expose memory management to the user (beyond texture priorities) we want to be sure our allocator is designed with this in mind.

8. Device independence

As much as possible, the source code for the memory manager should live somewhere device independent. This is both for the benefit of newly developed drivers and for maintaining existing drivers.

* My Thoughts *

There are really only two radical departures from the existing memory manager. The first is using the memory manager for non-texture memory objects. The second, which is partially a result of the first, is the need to "pin" objects. It would not do to have one context kick another context's depth-buffer out of memory!

My initial thought on how to accomplish this was to move the allocator into the kernel. There would be a low-level allocator that could be used for non-texture buffers and a way to create textures (from data). In the texture case, the kernel would only allocate memory when a texture was actually used. Instead of using the actual texture address in drawing command streams, the user-level driver would insert texture IDs. The kernel would use these IDs to map to the real texture addresses.
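
Just to make the idea concrete, here is a very rough sketch of how the kernel might patch texture IDs into real addresses while validating a command buffer. All structure and function names are made up for illustration; none of them are existing DRM interfaces.

    /* Hypothetical sketch of kernel-side texture ID relocation. */

    #include <stdint.h>
    #include <stddef.h>

    #define TEX_ID_TOKEN 0x80000000u   /* marks a texture-ID slot in the stream */

    struct kern_texture {
        uint32_t id;
        uint32_t card_offset;   /* offset in on-card / AGP memory */
    };

    /* Look up a texture by ID, allocating card/AGP memory on first use.
     * (Lookup and allocation details omitted.) */
    extern struct kern_texture *lookup_and_bind_texture(uint32_t id);

    /* Walk a submitted command buffer and replace texture-ID tokens with
     * real addresses before handing the buffer to the hardware. */
    static int
    relocate_texture_ids(uint32_t *cmds, size_t count)
    {
        for (size_t i = 0; i < count; i++) {
            if (cmds[i] & TEX_ID_TOKEN) {
                struct kern_texture *t =
                    lookup_and_bind_texture(cmds[i] & ~TEX_ID_TOKEN);
                if (t == NULL)
                    return -1;              /* unknown ID: reject the buffer */
                cmds[i] = t->card_offset;   /* patch in the real address */
            }
        }
        return 0;
    }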

The benefit is that all memory management would be handled by a single omniscient execution context (the kernel). The downside is that it would move a LOT of code into the kernel. It would be almost entirely OS and device independent, but there would likely be a lot of it.

After talking with Jeff Hartmann on IRC on 1/13, I started thinking about all of this again. Jeff had some serious reservations about moving that volume of code into the kernel, and he believed that all of the requirements could be met by a purely user-space implementation. After thinking about things some more, I'm starting to agree.

What follows is a fairly random series of thoughts on how a user-space memory manager could be made to work.

I believe that everything could be done by breaking each memory space down into blocks (as is currently done) and tracking two values, either implicitly or explicitly, with each block. The first value is some sort of swap-out priority. This is currently implicitly tracked by the list ordering in the SAREA. The other value is basically a semaphore, but it could be implemented as a simple can-swap bit.
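
A minimal sketch of what the per-block bookkeeping might look like, assuming an explicit priority value and a can-swap bit (the names and layout are invented, not the current SAREA layout):

    /* Hypothetical per-block state for a user-space memory manager. */

    #include <stdint.h>

    struct mem_block {
        uint32_t offset;        /* start of the block in the memory space */
        uint32_t size;
        uint32_t swap_priority; /* swap-out priority; lower = evict first */
        uint32_t can_swap;      /* 0 = pinned, 1 = may be swapped out */
    };

    /* Pick an eviction candidate: the swappable block with the lowest
     * swap-out priority.  Returns -1 if every block is pinned. */
    static int
    pick_victim(const struct mem_block *blocks, int n)
    {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (!blocks[i].can_swap)
                continue;
            if (best < 0 ||
                blocks[i].swap_priority < blocks[best].swap_priority)
                best = i;
        }
        return best;
    }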

Blocks that hold an active depth-buffer would never have can-swap set. Blocks that hold "normal" textures, back-buffers, render-target textures, and pbuffers would have their can-swap bit conditionally set. Each of these types of blocks would have the can-swap bit cleared in the following situations:

- Normal textures - While a rendering operation is queued that
will use the texture.
- SGIS_generate_mipmaps textures - While the blits are in progress
to create the filtered mipmaps.
- glCopyTexImage textures - While the blit to copy image data to
the texture is in progress and while the data in the texture has
not been copied to some sort of backing store.
- pbuffers - While rendering operations to the pbuffer are in
progress. pbuffers have a mechanism to tell an application when
the contents of the pbuffer have been "lost." This could be
exploited by the memory manager. One caveat is when a pbuffer
is bound to a texture (ARB_render_texture). While the pbuffer
is bound to a texture, its contents cannot be lost. Can the
contents be "swapped out" to some sort of backing store, like
with glCopyTexImage targets?
- Back-buffers - In unextended GLX, back-buffers can never be
swapped. However, if OML_sync_control is available, a "double
buffered" visual may want to have many virtual back-buffers.
Each time glXSwapBuffersMscOML (essentially an asynchronous
glXSwapBuffers call) is made, a new back-buffer is allocated as
the rendering target. Once a back-buffer is copied to the
front-buffer (i.e., the queued buffer-swap completes), the
back-buffer can be swapped-out.

There may be other situations where can-swap is cleared, but that's all I could think of. Similar rules would exist for vertex buffers (for ARB_vertex_array_object, EXT_compiled_vertex_array, optimized display lists, etc.).
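
To make those rules a little more concrete, a driver could keep a small pin count per object and report the object as non-swappable whenever the count is non-zero. Again, this is only a sketch with invented names:

    /* Hypothetical per-object pinning.  An object is non-swappable while
     * any queued operation (rendering from a texture, a mipmap blit, a
     * pending glCopyTexImage, an outstanding buffer swap, ...) still needs
     * its contents to stay put. */

    #include <stdint.h>
    #include <stdbool.h>

    struct mem_object {
        uint32_t block;         /* index of the block the object lives in */
        uint32_t pin_count;     /* number of queued operations that need it */
    };

    static void
    object_pin(struct mem_object *o)    /* a command using it is queued */
    {
        o->pin_count++;
    }

    static void
    object_unpin(struct mem_object *o)  /* that command has retired */
    {
        if (o->pin_count > 0)
            o->pin_count--;
    }

    static bool
    object_can_swap(const struct mem_object *o)
    {
        return o->pin_count == 0;
    }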

Only a single bit per block is needed in the SAREA. That bit is derived from the bits of every object that is part of the block: the block can be swapped only if all of those objects can be swapped. This combined bit must be calculated by the user-space driver. This presents a possible problem of user-space clients failing to update the can-swap bits for some reason (process hung on a blocking IO call?). The current implementation avoids this problem by forcing all blocks to be swappable at all times.
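
Computing that per-block bit is then just a pass over the objects. A self-contained sketch, using plain arrays rather than any real SAREA structures:

    /* Hypothetical SAREA update: block_can_swap[i] is set only if every
     * object that is part of block i is itself swappable. */

    #include <stdbool.h>

    static void
    update_sarea_can_swap(bool *block_can_swap, int n_blocks,
                          const int *object_block,
                          const bool *object_can_swap, int n_objects)
    {
        for (int b = 0; b < n_blocks; b++)
            block_can_swap[b] = true;                 /* assume swappable... */

        for (int o = 0; o < n_objects; o++)
            if (!object_can_swap[o])
                block_can_swap[object_block[o]] = false;  /* ...unless pinned */
    }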

At this point I'm left with a few questions.

1. In a scheme like this, how could processes be forced to update the
can-swap bits on blocks that they own?
2. What is the best way for processes to be notified of events that
could cause can-swap bits to change (i.e., rendering completion,
asynchronous buffer-swap completion, etc.)? Signals from the kernel?
Polling "age" variables?
3. If some sort of signal based notification is used, could it be used
to implement NV_fence and / or APPLE_fence?
4. How could the memory manager handle objects that span multiple
blocks? In other words, could the memory manager be made to prefer
to swap-out blocks that wholly contain all of the objects that
overlap the block? Are there other useful metrics? Prefer to
swap-out blocks that are half full over blocks that are completely
full?
5. What other things have I missed that might prevent this system
from working? :)


