I've been thinking about memory management again, and decided to try a
fresh start.  This is a distillation of my thoughts based on two
premises:

1) Use of an ARB_vbo derived API and semantics.
2) An initial implementation focus on an AGP-only manager.

In this email I try to outline the API and its consequences, which
should be fairly hardware-independent and apply to both AGP and Video
RAM managers.  In a follow-up email I'll outline an AGP implementation
which deviates sharply from the current usage of AGP in the DRI, and
also from what is possible with Video RAM.

API: ARB_vbo and NV_fence
-------------------------

So, as far as an API goes, I propose that we adopt terminology and
concepts from the ARB_vbo buffer objects as that design pretty well
encapsulates the constraints we're facing.

These buffers should be defined in such a way that they can be shared
between clients for PBuffer implementations, and reference-counted for
automatic freeing if clients exit uncleanly.

The calls would look something like:

    CreateBuffer(GLuint size)
    BufferData(void *ptr)
    BufferSubData(...)
    MapBuffer()
    UnmapBuffer()
    DestroyBuffer()

Semantics are more or less what we expect from the ARB buffer objects.
I'm only doing things at a handwaving level here, but I would expect
many of the subtleties of ARB_vertex_buffer_object to have a place
here too - so for instance calling BufferData with a NULL pointer
would be a shorthand for discarding the current buffer contents.

I'd like to keep sizes constrained a bit at this point, hence the
specification of size in CreateBuffer().  But that may change to more
closely match ARB_vbo semantics.
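To make the discard shorthand concrete, here is a minimal software
sketch of how a driver-side BufferData might behave.  All names and
fields here are hypothetical, not actual DRI code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer object; field names are illustrative only. */
struct buffer {
    size_t size;
    void  *backing;     /* host copy of the data               */
    int    dirty;       /* needs re-upload to AGP/video memory */
};

struct buffer *CreateBuffer(size_t size)
{
    struct buffer *buf = calloc(1, sizeof(*buf));
    buf->size = size;
    buf->backing = malloc(size);
    return buf;
}

/* A NULL ptr discards the old contents without copying anything:
 * the buffer keeps its size but its previous data is undefined,
 * so the manager is free to drop any AGP/video copy immediately. */
void BufferData(struct buffer *buf, const void *ptr)
{
    if (ptr)
        memcpy(buf->backing, ptr, buf->size);
    buf->dirty = 1;
}
```

The point of the NULL case is that the manager never has to preserve
or back-copy stale contents for a buffer the client has disowned.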

In addition, the following call is needed for the driver to validate
that the buffer contents are usable, and to get the current offset:

    GLuint ValidateBuffer( struct buffer * )

This returns the AGP offset of the buffer, uploading it if necessary
and, as required, ejecting existing video/AGP memory contents,
defragmenting the memory space, or performing any other operation
needed to make room.

Finally, there is a need to synchronize memory management with the
progress of the hardware through its command queues.  In the spirit of
co-opting hardware-inspired GL extensions, I propose we look closely
at NV_fence for this functionality:

   SetFence()
   TestFence()
   FinishFence()

There would need to be some plumbing behind the scenes to associate
fences with in-use buffer objects, and tricks to improve the
performance of fences beyond the obvious implementation of an IRQ per
SetFence() call.
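One possible shape for that plumbing, sketched with hypothetical
names: SetFence() hands out a monotonically increasing sequence
number, each buffer is stamped with the fence covering its last use,
and the manager may only evict or move a buffer once the hardware has
retired that fence.  This is a software model, not real driver code:

```c
#include <assert.h>

/* Illustrative model of fence sequence numbers; a real driver would
 * emit a breadcrumb to the ring and read back a hardware location. */
static unsigned emitted_fence;   /* last fence written to the ring */
static unsigned retired_fence;   /* last fence the hardware passed */

struct buffer {
    unsigned last_fence;   /* fence covering the last use of this buffer */
};

unsigned SetFence(void)
{
    return ++emitted_fence;
}

/* Stamp a buffer as referenced by commands up to the given fence. */
void buffer_reference(struct buffer *buf, unsigned fence)
{
    buf->last_fence = fence;
}

int TestFence(unsigned fence)
{
    return retired_fence >= fence;
}

/* The manager may only evict or move a buffer whose fence has retired. */
int buffer_idle(const struct buffer *buf)
{
    return TestFence(buf->last_fence);
}
```

Batching many buffers behind one fence is one of the obvious tricks
for avoiding an IRQ per SetFence() call.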

So far, things look fairly uncontroversial.  And just as complex as
ever, with all the unanswered questions about eviction, etc.  However,
a couple of constraints help simplify the general problem
considerably:

    1) ValidateBuffer requires the lock to be held.

    2) To improve performance, the offset returned by ValidateBuffer
       remains valid (even after unlocking) until one of two events
       occurs:

        a) a subsequent call to ValidateBuffer() returns an error code
        indicating that memory has been re-shuffled.

        b) a subsequent contended Lock() indicates by a flag in the
        SAREA that memory has been re-shuffled.

      Thus steady-state rendering will not require calls to
      ValidateBuffer or its associated ioctl.

    3) There is no facility for pinning graphics memory outside locked
       regions.  While the pointer returned by MapBuffer is valid
       outside locked regions, the assumption should be that this is
       because the buffer is not resident in graphics memory but has
       been created or back-copied to host memory.  Life is a lot
       easier for the memory manager if there is no way for
       unprivileged clients to pin graphics memory.  (See below for
       more.)

To clarify, a naive (unoptimized) usage pattern would be:

    LOCK_HARDWARE()
    bboffset = ValidateBuffer( i915->backbuffer )
    vboffset = ValidateBuffer( i915->vertex_buffer )
    tex0offset = ValidateBuffer( i915->tex_buffer[0] )

    if (!bboffset || !vboffset || !tex0offset)
        FALLBACK_PATH()
    else {
        emit_state_packets();
        emit_3d_prim();
        SetFence()
    }
    UNLOCK_HARDWARE()


In the event of defragmentation of the address space, buffers
validated previously within a Locked region may become invalid and
validation of buffers will have to restart.  I'll gloss over that for
now.
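To make constraint (2) concrete, here is a software model of the
optimized steady-state path: the offset from a previous
ValidateBuffer() is cached and reused, and the ioctl is only issued
again when the SAREA reshuffle flag has been raised.  All names here
are made up for illustration:

```c
#include <assert.h>

/* Hypothetical SAREA flag set by a contended lock after reshuffling. */
struct sarea { int memory_reshuffled; };

static struct sarea sarea;
static unsigned cached_bboffset;   /* 0 means "not yet validated" */

/* Stand-in for the real ValidateBuffer ioctl: returns the (possibly
 * new) offset and clears the reshuffle indication for this client. */
unsigned ValidateBuffer_model(unsigned agp_offset)
{
    sarea.memory_reshuffled = 0;
    return agp_offset;
}

/* Returns the offset to use this frame, revalidating only when the
 * cache is cold or memory has been reshuffled under us. */
unsigned get_bboffset(unsigned current_agp_offset)
{
    if (!cached_bboffset || sarea.memory_reshuffled)
        cached_bboffset = ValidateBuffer_model(current_agp_offset);
    return cached_bboffset;   /* steady state: no ioctl at all */
}
```

In the common case the branch is not taken, so steady-state rendering
touches no kernel interface for buffer placement.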


Vertex Buffers
--------------

Note that there is quite a bit of handwaving going on regarding the
vertex buffer.

Specifically: Is it mapped into the GART?  How do vertices for the
next lot of primitives get into it?

While the lock is not held, we cannot guarantee that buffers won't be
shuffled in memory by other contexts, so we cannot guarantee you can
write to the AGP memory when the lock isn't held.

If necessary, I propose pinning a fixed size region and running the
old DRM DMA buffer mechanism out of that region, rather than allowing
clients to create small pinned regions scattered across and
fragmenting the managed memory space.

As an alternative to pinning or DMA buffers, a circular ring of VBOs
could be used, in a Map(buffer[n]), write vertices to buffer, Unmap,
Lock, Validate, Draw, Unlock, Map(buffer[n+1]) fashion. This should
work OK in the AGP implementation I propose later.
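In outline, the ring might look like the following model.  Everything
here is hypothetical; a real driver would FinishFence() on a buffer
before remapping it, to be sure the hardware has finished reading the
vertices written into it some frames ago:

```c
#include <assert.h>

#define NUM_VB 4

/* Software model of a ring of vertex buffer objects. */
struct vbo { int mapped; };

static struct vbo ring[NUM_VB];
static int current;

/* Map the next buffer in the ring for CPU vertex writes.  A real
 * driver would wait on this buffer's fence here before reuse. */
struct vbo *map_next(void)
{
    current = (current + 1) % NUM_VB;
    ring[current].mapped = 1;
    return &ring[current];
}

void unmap(struct vbo *vb)
{
    vb->mapped = 0;
}
```

The ring depth trades memory for the amount of latency the hardware
can hide before the client stalls waiting for a buffer to come free.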

At this stage, it is still an open question how best to handle this.

This is mainly a problem for cards without hardware vertex shaders or
TNL, like the i915.  For cards with full TNL we would expect vertices
to be treated as VBOs and even immediate-mode vertices to be presented
to the driver as VBOs.

Pinning - Not For Unprivileged Use
----------------------------------

There is no intention at this stage to support a pinning operation for
general buffers in video or AGP memory.  There are well-defined
exceptions: buffers accessed by the hardware of its own volition, such
as:
        
         - The front buffer
         - Video overlays
         - The hardware cursor
         - The i810 status page
         - Any other writeback areas.

And, if necessary, as described above, to allow clients access to some
video or AGP memory outside locked regions:

         - DRI DMA buffer region

Note that most of the first group may change size or position over
time in response to rotation events, but the DMA buffer region will be
effectively inviolate.

Back Buffers
------------

Note in particular that the backbuffer is just another buffer object
and must be validated like textures, etc.  The intention is that when
back or depth buffers aren't needed, memory pressure will eventually
force them out of the GART table, freeing up space for more textures
or pixmap cache.

Pixmap Cache
------------

The pixmap cache should also be just another memory buffer, one that
the X server will have to validate before accessing, assuming XAA can
be convinced to play along with this.

Clients have access to full hardware memory
-------------------------------------------

Note finally that the requirement to take the lock before validating,
and the absence of a pinning operation for general-purpose buffers,
mean that it is possible for a single client to access and use the
whole memory pool - there is no situation where another client can
prevent (e.g. by fragmentation) a client that needs access to a lot of
video memory from achieving that, up to the hardware limits (minus the
few pinned buffers mentioned above).


Hardware Synchronization
------------------------

This is fairly undeveloped above, but I feel the intent is clear and
the details will become obvious during implementation.

One question is whether a fence needs to be issued quite as often as
in the example above, or whether the driver could issue them
periodically, plus have the infrastructure issue them in certain
special circumstances - memory defragmentation, buffer flips and
contended locks, for instance.
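A periodic scheme might look like the following sketch (names and the
interval are made up): a fence is emitted only every Nth batch, or
immediately when forced by one of the special circumstances above, so
IRQ traffic drops at the cost of coarser eviction granularity:

```c
#include <assert.h>

#define FENCE_INTERVAL 8   /* arbitrary choice for illustration */

static unsigned batches_since_fence;
static unsigned fences_emitted;

/* Call after every batch; actually emits a fence only periodically,
 * or immediately when 'force' is set (defragmentation, flip, etc). */
int maybe_set_fence(int force)
{
    if (force || ++batches_since_fence >= FENCE_INTERVAL) {
        batches_since_fence = 0;
        fences_emitted++;   /* real code would call SetFence() here */
        return 1;
    }
    return 0;
}
```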





