Re: Memory management - another proposal.
On Sun, 2006-01-08 at 18:15 +, Keith Whitwell wrote:
> Back Buffers
>
> Note in particular that the backbuffer is just another buffer object
> and must be validated like textures, etc. The intention is that when
> back or depth buffers aren't needed, memory pressure will eventually
> force them out of the GART table, freeing up space for more textures
> or pixmap cache.

With the Composite extension in mind, per-context renderbuffers should be supported if not the default.

> Pixmap Cache
>
> Also pixmap cache should be just another memory buffer, that the X
> server will have to validate before accessing, assuming XAA can be
> convinced to play along with this.

I doubt it can, but EXA should.

FWIW, I like the direction of your proposals, although I share Roland's doubts about treating AGP separately from video RAM, even in the beginning.
-- 
Earthling Michel Dänzer | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
Re: Memory management - another proposal.
Keith Whitwell wrote:
> To clarify, a naive (unoptimized) usage pattern would be:
>
>    LOCK_HARDWARE()
>
>    bboffset = ValidateBuffer( i915->backbuffer )
>    vboffset = ValidateBuffer( i915->vertex_buffer )
>    tex0offset = ValidateBuffer( i915->tex_buffer[0] )
>
>    if (!bboffset || !vboffset || !tex0offset)
>       FALLBACK_PATH()
>    else {
>       emit_state_packets();
>       emit_3d_prim();
>       SetFence()
>    }
>
>    UNLOCK_HARDWARE()

There is a problem with this design. You really need to validate a set of buffers rather than one buffer at a time. Imagine the case where those 3 buffers will exactly fill all of memory. Placing the first buffer incorrectly will make it impossible to place the other buffers without invalidating the first.

Other than that, this is pretty similar to what anholt, keithp, and I came up with a few weeks ago. I just haven't started banging away at it yet. I found it very easy to be distracted from coding projects over the holiday. :) I believe that we used the name CommitBuffers instead of ValidateBuffer, but the idea is the same.

I've also come around to just putting everything in the kernel. I've been convinced (finally) that the perceived benefits of having the actual allocator in user mode (avoiding some trips into the kernel, having the allocation policy in user mode) don't outweigh the problems that it creates.

> In the event of defragmentation of the address space, buffers
> validated previously within a Locked region may become invalid and
> validation of buffers will have to restart. I'll gloss over that for
> now.

I'm not 100% sure what you mean by this. Are you saying that you'd have to loop on the ValidateBuffer calls until some steady state is achieved? In the worst case, that loop would never terminate. :(

If you know at CommitBuffers which buffers are needed, you can try to trivially commit the buffers (i.e., don't shuffle buffers or kick anything out).
If that fails, you shuffle buffers and kick buffers out of memory until there is a free block large enough to hold all of the required buffers. Then repeat the trivial commit (it *must* succeed at this point).
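Ian's two-phase CommitBuffers scheme can be sketched as follows. This is a toy Python model, not driver code: the names, the linear "aperture", and the compaction policy are all assumptions standing in for the real GART manager.

```python
# Toy model of the two-phase commit: first try a "trivial" commit (no
# shuffling), and only on failure evict/defragment, after which the
# retried trivial commit must succeed (assuming the set fits at all).

APERTURE_SIZE = 64

class Manager:
    def __init__(self, size=APERTURE_SIZE):
        self.size = size
        self.resident = {}          # name -> (offset, length)

    def _free_gaps(self):
        """Yield (offset, length) holes between resident buffers."""
        cursor = 0
        for off, length in sorted(self.resident.values()):
            if off > cursor:
                yield (cursor, off - cursor)
            cursor = off + length
        if cursor < self.size:
            yield (cursor, self.size - cursor)

    def _trivial_commit(self, request):
        """Place every missing buffer into existing holes, or fail."""
        for name, length in request.items():
            if name in self.resident:
                continue            # already placed; offset unchanged
            for off, gap in self._free_gaps():
                if gap >= length:
                    self.resident[name] = (off, length)
                    break
            else:
                return False
        return True

    def commit_buffers(self, request):
        if self._trivial_commit(request):
            return {n: self.resident[n][0] for n in request}
        # Phase two: kick out everything not in the working set and
        # compact what remains (a crude stand-in for defragmentation).
        self.resident = {n: v for n, v in self.resident.items()
                         if n in request}
        cursor = 0
        for name in sorted(self.resident):
            _, length = self.resident[name]
            self.resident[name] = (cursor, length)
            cursor += length
        assert self._trivial_commit(request)   # must succeed now
        return {n: self.resident[n][0] for n in request}

mgr = Manager()
mgr.commit_buffers({"pixmap_cache": 40})    # unrelated resident data
offsets = mgr.commit_buffers({"backbuffer": 32, "tex0": 16})
print(offsets)      # pixmap_cache was evicted to make the set fit
```

The key property is the one Ian states: once the eviction/compaction pass has opened up a large-enough free block, the retried trivial commit cannot fail, so there is no unbounded loop.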
Re: Memory management - another proposal.
Ian Romanick wrote:
> Keith Whitwell wrote:
>> To clarify, a naive (unoptimized) usage pattern would be:
>>
>>    LOCK_HARDWARE()
>>
>>    bboffset = ValidateBuffer( i915->backbuffer )
>>    vboffset = ValidateBuffer( i915->vertex_buffer )
>>    tex0offset = ValidateBuffer( i915->tex_buffer[0] )
>>
>>    if (!bboffset || !vboffset || !tex0offset)
>>       FALLBACK_PATH()
>>    else {
>>       emit_state_packets();
>>       emit_3d_prim();
>>       SetFence()
>>    }
>>
>>    UNLOCK_HARDWARE()
>
> There is a problem with this design. You really need to validate a set
> of buffers rather than one buffer at a time. Imagine the case where
> those 3 buffers will exactly fill all of memory. Placing the first
> buffer incorrectly will make it impossible to place the other buffers
> without invalidating the first.

Yes, I gloss over this a little. The intention is that ValidateBuffer can fail once and the process is required to start again - giving the manager a chance to start from scratch -- it's a little kludgey. Alternately you could of course wrap that kludginess up in a single function to which the driver passes a list of buffers.

> Other than that, this is pretty similar to what anholt, keithp, and I
> came up with a few weeks ago. I just haven't started banging away at
> it yet. I found it very easy to be distracted from coding projects
> over the holiday. :) I believe that we used the name CommitBuffers
> instead of ValidateBuffer, but the idea is the same.

Oh... Sorry I missed out on that conversation... Should I have been listening somewhere?

> I've also come around to just putting everything in the kernel. I've
> been convinced (finally) that the perceived benefits of having the
> actual allocator in user mode (avoiding some trips into the kernel,
> having the allocation policy in user mode) don't outweigh the problems
> that it creates.

I do a little bit to avoid kernel trips with the conditions under which an offset from ValidateBuffer can be reused. In steady-state rendering, there should be no need to call into the kernel at all.
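Keith's "wrap the kludginess up in a single function" suggestion amounts to a bounded retry loop around the per-buffer calls. A Python sketch of what that wrapper might look like (all names hypothetical; `validate_buffer` stands in for the per-buffer call, returning `None` when memory was reshuffled and the caller must start over):

```python
# Hide the fail-once/restart dance behind one call that takes the
# whole list.  With the fail-at-most-once rule, two passes suffice;
# anything beyond that falls back to software.

MAX_RESTARTS = 2   # fail once, restart once; then give up

def validate_buffers(validate_buffer, buffers):
    for _ in range(MAX_RESTARTS):
        offsets = []
        for buf in buffers:
            off = validate_buffer(buf)
            if off is None:          # reshuffled: restart from scratch
                break
            offsets.append(off)
        else:
            return offsets           # every buffer placed
    return None                      # caller takes FALLBACK_PATH()

# Fake allocator that reshuffles once, then succeeds.
state = {"shuffled": False}
def fake_validate(buf):
    if not state["shuffled"] and buf == "tex0":
        state["shuffled"] = True     # first pass forces a restart
        return None
    return {"backbuffer": 0, "vertex_buffer": 64, "tex0": 96}[buf]

result = validate_buffers(fake_validate,
                          ["backbuffer", "vertex_buffer", "tex0"])
print(result)
```

Note this only terminates because of the "can fail once" rule; without that guarantee the loop Ian worries about below has no bound.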
>> In the event of defragmentation of the address space, buffers
>> validated previously within a Locked region may become invalid and
>> validation of buffers will have to restart. I'll gloss over that for
>> now.
>
> I'm not 100% sure what you mean by this. Are you saying that you'd
> have to loop on the ValidateBuffer calls until some steady state is
> achieved? In the worst case, that loop would never terminate. :(

This is all about the single buffer parameter vs. a list. I think you've convinced me that a list is a better API.

> If you know at CommitBuffers which buffers are needed, you can try to
> trivially commit the buffers (i.e., don't shuffle buffers or kick
> anything out). If that fails, you shuffle buffers and kick buffers out
> of memory until there is a free block large enough to hold all of the
> required buffers. Then repeat the trivial commit (it *must* succeed at
> this point).

That's exactly it.

Keith
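The single-buffer vs. list distinction can be made concrete with a toy aperture. This is a hypothetical illustration (made-up sizes, first-fit placement, largest-first as one simple set-aware heuristic; a real manager could also shuffle or evict): placing buffers one at a time wedges, while placing the set succeeds.

```python
# One pinned-in-place resident buffer fragments the aperture into a
# 12-unit hole and a 6-unit hole.  Requesting a 6 then a 12 one at a
# time fails (the 6 lands in the big hole); requesting them as a set,
# largest first, succeeds.

APERTURE = 26
resident = [(12, 8)]            # (offset, length) we are not moving

def free_gaps(placed):
    holes, cursor = [], 0
    for off, length in sorted(placed):
        if off > cursor:
            holes.append((cursor, off - cursor))
        cursor = off + length
    if cursor < APERTURE:
        holes.append((cursor, APERTURE - cursor))
    return holes

def first_fit(placed, length):
    for off, gap in free_gaps(placed):
        if gap >= length:
            return off
    return None

def validate_one_at_a_time(sizes):
    placed, offsets = list(resident), []
    for length in sizes:
        off = first_fit(placed, length)
        if off is None:
            return None                      # wedged: must restart
        placed.append((off, length))
        offsets.append(off)
    return offsets

def validate_set(sizes):
    # Set-aware pass: consider the whole request, largest buffers first.
    order = sorted(range(len(sizes)), key=lambda i: -sizes[i])
    placed, offsets = list(resident), [None] * len(sizes)
    for i in order:
        off = first_fit(placed, sizes[i])
        if off is None:
            return None
        placed.append((off, sizes[i]))
        offsets[i] = off
    return offsets

print(validate_one_at_a_time([6, 12]))   # the 12 no longer fits
print(validate_set([6, 12]))             # 12 at 0, 6 at 20
```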
Memory management - another proposal.
I've been thinking about memory management again, and decided to try a fresh start. This is a distillation of my thoughts based on two premises:

1) Use of an ARB_vbo-derived API and semantics.

2) An initial implementation focus on an AGP-only manager.

In this email I try to outline the API and its consequences, which should be fairly hardware-independent and apply to both AGP and video RAM managers. In a followup email I'll outline an AGP implementation which deviates sharply from the current usage of AGP in the DRI, and also from what is possible with video RAM.

API: ARB_vbo and NV_fence
-------------------------

As far as an API goes, I propose that we adopt terminology and concepts from ARB_vbo buffer objects, as that design pretty well encapsulates the constraints we're facing. These buffers should be defined in such a way that they can be shared between clients for PBuffer implementation, and reference-counted for automatic freeing if clients exit uncleanly.

The calls would look something like:

   CreateBuffer(GLuint size)
   BufferData(void *ptr)
   BufferSubData(...)
   MapBuffer()
   UnmapBuffer()
   DestroyBuffer()

Semantics are more or less what we expect from the ARB buffer objects. I'm only doing things at a handwaving level here, but I would expect many of the subtleties of ARB_vertex_buffer_object to have a place here too - so for instance calling BufferData with a NULL pointer would be shorthand for discarding the current buffer contents. I'd like to keep sizes constrained a bit at this point, hence the specification of size in CreateBuffer(). But that may change to more closely match ARB_vbo semantics.

In addition, the following call is needed for the driver to validate that the buffer contents are usable, and to get the current offset:

   GLuint ValidateBuffer( struct buffer * )

This returns the AGP offset of the buffer, uploading it if necessary, and ejecting existing video/AGP memory contents, defragmenting memory space or performing any other operation as necessary.
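In outline, the client-visible semantics above might behave like this Python mock. This is not driver code: the class shapes, the host-side backing store, and the trivial bump allocator are all assumptions; only the call names and the BufferData(NULL)-discards rule come from the proposal.

```python
# Mock of the proposed ARB_vbo-style buffer-object API.  Contents
# live in a host copy (a bytearray); "validate" uploads on demand by
# assigning a fake GART offset from a bump allocator.

class BufferMgr:
    def __init__(self):
        self.next_offset = 0

    def alloc(self, size):
        off = self.next_offset
        self.next_offset += size
        return off

class Buffer:
    def __init__(self, mgr, size):          # CreateBuffer(size)
        self.mgr, self.size = mgr, size
        self.data = bytearray(size)         # host copy
        self.offset = None                  # not resident yet

    def buffer_data(self, ptr):             # BufferData(ptr)
        # NULL pointer == discard current contents (ARB_vbo shorthand).
        self.data = bytearray(self.size) if ptr is None else bytearray(ptr)
        self.offset = None                  # contents changed: re-upload

    def validate(self):                     # ValidateBuffer()
        # Upload if necessary and return the current offset.
        if self.offset is None:
            self.offset = self.mgr.alloc(self.size)
        return self.offset

mgr = BufferMgr()
vbo = Buffer(mgr, 16)
vbo.buffer_data(b"0123456789abcdef")
print(vbo.validate())       # first validate uploads
print(vbo.validate())       # steady state: same offset, no new work
```

The second `validate()` returning the cached offset is the steady-state behaviour the proposal relies on to keep kernel trips out of the hot path (the invalidation rules for that cached offset follow below).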
Finally, there is a need to synchronize memory management with the progress of the hardware through its command queues. In the spirit of co-opting hardware-inspired GL extensions, I propose we look closely at NV_fence for this functionality:

   SetFence()
   TestFence()
   FinishFence()

There would need to be some plumbing behind the scenes to associate fences with in-use buffer objects, and tricks to improve the performance of fences beyond the obvious implementation of an IRQ per SetFence() call.

So far, things look fairly uncontroversial. And just as complex as ever, with all the unanswered questions about eviction, etc. However, a couple of constraints help simplify the general problem considerably:

1) ValidateBuffer requires the lock to be held.

2) To improve performance, the offset returned by ValidateBuffer remains valid (even after unlocking) until one of two events occurs:

   a) a subsequent call to ValidateBuffer() returns an error code
      indicating that memory has been re-shuffled.

   b) a subsequent contended Lock() indicates by a flag in the SAREA
      that memory has been re-shuffled.

   Thus steady-state rendering will not require calls to ValidateBuffer or its associated ioctl.

3) There is no facility for pinning graphics memory outside locked regions. While the pointer returned by MapBuffer is valid outside locked regions, the assumption should be that this is because the buffer is not resident in graphics memory but has been created or back-copied to host memory. Life is a lot easier for the memory manager if there is no way for unprivileged clients to pin graphics memory. (See below for more.)
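The behind-the-scenes plumbing that associates fences with in-use buffers could be modelled like this. A hypothetical Python sketch: an integer breadcrumb stands in for the hardware's progress, and everything else (names, the pending list) is an assumption about one possible implementation.

```python
# Toy NV_fence-style bookkeeping: each SetFence() snapshots the
# buffers referenced since the previous fence; a buffer is safe to
# evict only once every fence covering it has been passed by the
# "hardware" (an integer breadcrumb here).

class FenceMgr:
    def __init__(self):
        self.hw_breadcrumb = 0       # last fence the fake GPU passed
        self.next_fence = 0
        self.pending = []            # buffers referenced since last fence
        self.fences = {}             # fence id -> buffers it covers

    def reference(self, buf):
        self.pending.append(buf)     # buffer used by queued commands

    def set_fence(self):
        self.next_fence += 1
        self.fences[self.next_fence] = list(self.pending)
        self.pending = []
        return self.next_fence

    def test_fence(self, fence):
        return self.hw_breadcrumb >= fence     # non-blocking query

    def finish_fence(self, fence):
        # Blocking wait; here we just advance the fake hardware.
        self.hw_breadcrumb = max(self.hw_breadcrumb, fence)

    def evictable(self, buf):
        return not any(buf in bufs and f > self.hw_breadcrumb
                       for f, bufs in self.fences.items())

fm = FenceMgr()
fm.reference("tex0")
f = fm.set_fence()
print(fm.test_fence(f), fm.evictable("tex0"))   # still in flight
fm.finish_fence(f)
print(fm.test_fence(f), fm.evictable("tex0"))   # safe to evict now
```

Batching many SetFence() calls onto one breadcrumb write is where the "tricks" mentioned above would go, so that not every fence costs an IRQ.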
To clarify, a naive (unoptimized) usage pattern would be:

   LOCK_HARDWARE()

   bboffset = ValidateBuffer( i915->backbuffer )
   vboffset = ValidateBuffer( i915->vertex_buffer )
   tex0offset = ValidateBuffer( i915->tex_buffer[0] )

   if (!bboffset || !vboffset || !tex0offset)
      FALLBACK_PATH()
   else {
      emit_state_packets();
      emit_3d_prim();
      SetFence()
   }

   UNLOCK_HARDWARE()

In the event of defragmentation of the address space, buffers validated previously within a Locked region may become invalid and validation of buffers will have to restart. I'll gloss over that for now.

Vertex Buffers
--------------

Note that there is quite a bit of handwaving going on regarding the vertex buffer. Specifically: Is it mapped into the GART? How do vertices for the next lot of primitives get into it? While the lock is not held, we cannot guarantee that buffers won't be shuffled in memory by other contexts, so we cannot guarantee you can write to the AGP memory when the lock isn't held. If necessary, I propose pinning a fixed-size region and running the old DRM DMA buffer mechanism out of that region, rather than allowing clients to create small pinned