Re: Memory management - another proposal.

2006-01-10 Thread Michel Dänzer
On Sun, 2006-01-08 at 18:15 +, Keith Whitwell wrote:
> 
> Back Buffers
> ------------
> 
> Note in particular that the backbuffer is just another buffer object
> and must be validated like textures, etc.  The intention is that when
> back or depth buffers aren't needed, memory pressure will eventually
> force them out of the GART table, freeing up space for more textures
> or pixmap cache.

With the Composite extension in mind, per-context renderbuffers should
be supported if not the default.


> Pixmap Cache
> ------------
> 
> Also pixmap cache should be just another memory buffer, that the X
> server will have to validate before accessing, assuming XAA can be
> convinced to play along with this.

I doubt it can, but EXA should.


FWIW, I like the direction of your proposals, although I share Roland's
doubts about treating AGP separately from video RAM, even in the
beginning.


-- 
Earthling Michel Dänzer  | Debian (powerpc), X and DRI developer
Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer


Re: Memory management - another proposal.

2006-01-09 Thread Ian Romanick

Keith Whitwell wrote:

> To clarify, a naive (unoptimized) usage pattern would be:
> 
>    LOCK_HARDWARE();
>    bboffset   = ValidateBuffer( i915->backbuffer );
>    vboffset   = ValidateBuffer( i915->vertex_buffer );
>    tex0offset = ValidateBuffer( i915->tex_buffer[0] );
> 
>    if (!bboffset || !vboffset || !tex0offset)
>       FALLBACK_PATH();
>    else {
>       emit_state_packets();
>       emit_3d_prim();
>       SetFence();
>    }
>    UNLOCK_HARDWARE();

There is a problem with this design.  You really need to validate a set
of buffers rather than one buffer at a time.  Imagine the case where
those 3 buffers will exactly fill all of memory.  Placing the first
buffer incorrectly will make it impossible to place the other buffers
without invalidating the first.

Other than that, this is pretty similar to what anholt, keithp, and I
came up with a few weeks ago.  I just haven't started banging away at it
yet.  I found it very easy to be distracted from coding projects over
the holiday. :)  I believe that we used the name CommitBuffers
instead of ValidateBuffer, but the idea is the same.

I've also come around to just putting everything in the kernel.  I've
been convinced (finally) that the perceived benefits of having the
actual allocator in user mode (avoiding some trips into the kernel,
having the allocation policy in user mode) don't outweigh the problems
that it creates.

> In the event of defragmentation of the address space, buffers
> validated previously within a Locked region may become invalid and
> validation of buffers will have to restart.  I'll gloss over that for
> now.

I'm not 100% sure what you mean by this.  Are you saying that you'd have
to loop on the ValidateBuffer calls until some steady state is achieved?
 In the worst case, that loop would never terminate. :(

If you know at CommitBuffers which buffers are needed, you can try to
trivially commit the buffers (i.e., don't shuffle buffers or kick
anything out).  If that fails, you shuffle buffers and kick buffers out
of memory until there is a free block large enough to hold all of the
required buffers.  Then repeat the trivial commit (it *must* succeed at
this point).
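The two-phase commit described above can be sketched as follows. This is a minimal toy, not a proposed implementation. It assumes a 64-page aperture with first-fit placement, with page 0 reserved so that an offset of 0 means "not resident". It also assumes a `busy` flag marking buffers the hardware may still be using. Only the name CommitBuffers comes from the thread; `struct gbuf`, `trivial_commit()`, `largest_hole()` and the limits are hypothetical. The sketch only kicks buffers out and does not shuffle resident buffers to coalesce holes. Once a single free block is at least as large as everything still missing, first-fit placement in list order cannot fail, which is why the second trivial commit is asserted.

```c
#include <assert.h>

#define APERTURE 64   /* toy aperture size in pages; page 0 is reserved */
#define MAXRES   16   /* toy limit on resident buffers */

struct gbuf {
   unsigned size;     /* in pages */
   unsigned offset;   /* 0 = not resident */
   int busy;          /* hardware may still be using it: never evict */
   int required;      /* member of the set being committed */
};

static struct gbuf *res[MAXRES];  /* currently resident buffers */
static int nres;

/* Lowest offset at which `size` pages overlap no resident buffer, or 0. */
static unsigned find_hole(unsigned size)
{
   unsigned start = 1;
   int moved = 1;
   while (moved) {
      moved = 0;
      for (int i = 0; i < nres; i++)
         if (start < res[i]->offset + res[i]->size &&
             res[i]->offset < start + size) {
            start = res[i]->offset + res[i]->size;
            moved = 1;
         }
   }
   return start + size <= APERTURE ? start : 0;
}

/* Size of the largest run of free pages. */
static unsigned largest_hole(void)
{
   unsigned best = 0, run = 0;
   for (unsigned p = 1; p < APERTURE; p++) {
      int used = 0;
      for (int i = 0; i < nres; i++)
         if (p >= res[i]->offset && p < res[i]->offset + res[i]->size)
            used = 1;
      run = used ? 0 : run + 1;
      if (run > best)
         best = run;
   }
   return best;
}

/* Place every missing buffer without evicting anything.  On failure,
   undo this call's placements so nothing is left half-committed. */
static int trivial_commit(struct gbuf **bufs, int n)
{
   int placed = 0;
   for (int i = 0; i < n; i++) {
      if (bufs[i]->offset)
         continue;                       /* already resident */
      unsigned off = nres < MAXRES ? find_hole(bufs[i]->size) : 0;
      if (!off) {
         while (placed-- > 0)            /* our placements were appended last */
            res[--nres]->offset = 0;
         return 0;
      }
      bufs[i]->offset = off;
      res[nres++] = bufs[i];
      placed++;
   }
   return 1;
}

/* Returns 1 if every buffer in bufs[] is resident, 0 if the caller must
   fall back because too much memory is busy.  Lock must be held. */
int CommitBuffers(struct gbuf **bufs, int n)
{
   unsigned needed = 0;
   int i = 0, ok;

   if (trivial_commit(bufs, n))
      return 1;

   for (int k = 0; k < n; k++) {
      bufs[k]->required = 1;
      if (!bufs[k]->offset)
         needed += bufs[k]->size;
   }
   /* Kick out idle buffers outside the set until one free block can
      hold everything still missing (and the table has room). */
   while ((largest_hole() < needed || nres + n > MAXRES) && i < nres) {
      if (!res[i]->busy && !res[i]->required) {
         res[i]->offset = 0;             /* would be back-copied to host */
         res[i] = res[--nres];
      } else {
         i++;
      }
   }
   for (int k = 0; k < n; k++)
      bufs[k]->required = 0;
   if (largest_hole() < needed || nres + n > MAXRES)
      return 0;

   ok = trivial_commit(bufs, n);         /* must succeed at this point */
   assert(ok);
   return ok;
}
```

In this toy, three 21-page buffers fill the 63 usable pages exactly. With an idle 20-page buffer already resident, the first trivial commit fails and is rolled back. The idle buffer is then evicted, and the retry places the set at offsets 1, 22 and 43.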


---
This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://ads.osdn.com/?ad_id=7637alloc_id=16865op=click
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Memory management - another proposal.

2006-01-09 Thread Keith Whitwell

Ian Romanick wrote:


> Keith Whitwell wrote:
> 
>> To clarify, a naive (unoptimized) usage pattern would be:
>> 
>>    LOCK_HARDWARE();
>>    bboffset   = ValidateBuffer( i915->backbuffer );
>>    vboffset   = ValidateBuffer( i915->vertex_buffer );
>>    tex0offset = ValidateBuffer( i915->tex_buffer[0] );
>> 
>>    if (!bboffset || !vboffset || !tex0offset)
>>       FALLBACK_PATH();
>>    else {
>>       emit_state_packets();
>>       emit_3d_prim();
>>       SetFence();
>>    }
>>    UNLOCK_HARDWARE();



> There is a problem with this design.  You really need to validate a set
> of buffers rather than one buffer at a time.  Imagine the case where
> those 3 buffers will exactly fill all of memory.  Placing the first
> buffer incorrectly will make it impossible to place the other buffers
> without invalidating the first.


Yes, I gloss over this a little.  The intention is that ValidateBuffer can
fail once and the process is then required to start again, giving the
manager a chance to start from scratch.  It's a little kludgey.
Alternatively, you could of course wrap that kludginess up in a single
function to which the driver passes a list of buffers.



> Other than that, this is pretty similar to what anholt, keithp, and I
> came up with a few weeks ago.  I just haven't started banging away at it
> yet.  I found it very easy to be distracted from coding projects over
> the holiday. :)  I believe that we used the name CommitBuffers
> instead of ValidateBuffer, but the idea is the same.


Oh...  Sorry I missed out on that conversation.  Should I have been
listening somewhere?



> I've also come around to just putting everything in the kernel.  I've
> been convinced (finally) that the perceived benefits of having the
> actual allocator in user mode (avoiding some trips into the kernel,
> having the allocation policy in user mode) don't outweigh the problems
> that it creates.


I do a little to avoid kernel trips, via the conditions under which an
offset from ValidateBuffer can be reused.  In steady-state rendering,
there should be no need to call into the kernel at all.

>> In the event of defragmentation of the address space, buffers
>> validated previously within a Locked region may become invalid and
>> validation of buffers will have to restart.  I'll gloss over that for
>> now.
> 
> I'm not 100% sure what you mean by this.  Are you saying that you'd have
> to loop on the ValidateBuffer calls until some steady state is achieved?
> In the worst case, that loop would never terminate. :(


This is all about the single-buffer parameter vs. a list.  I think
you've convinced me that a list is a better API.



> If you know at CommitBuffers which buffers are needed, you can try to
> trivially commit the buffers (i.e., don't shuffle buffers or kick
> anything out).  If that fails, you shuffle buffers and kick buffers out
> of memory until there is a free block large enough to hold all of the
> required buffers.  Then repeat the trivial commit (it *must* succeed at
> this point).


That's exactly it.

Keith




Memory management - another proposal.

2006-01-08 Thread Keith Whitwell

I've been thinking about memory management again, and decided to try a
fresh start.  This is a distillation of my thoughts based on two
premises:

1) Use of an ARB_vbo derived API and semantics.
2) An initial implementation focus on an AGP-only manager.

In this email I try to outline the API and its consequences, which
should be fairly hardware-independent and apply to both AGP and Video
RAM managers.  In a follow-up email I'll outline an AGP implementation
which deviates sharply from the current usage of AGP in the DRI and
also from what is possible with Video RAM.

API: ARB_vbo and NV_fence
-------------------------

So, as far as an API goes, I propose that we adopt terminology and
concepts from the ARB_vbo buffer objects as that design pretty well
encapsulates the constraints we're facing.

These buffers should be defined in such a way that they can be shared
between clients for the PBuffer implementation, and reference counted
so that they are freed automatically if clients exit uncleanly.

The calls would look something like:

CreateBuffer(GLuint size)
BufferData(void *ptr)
BufferSubData(...)
MapBuffer()
UnmapBuffer()
DestroyBuffer()

Semantics are more or less what we expect from the ARB buffer objects.
I'm only doing things at a handwaving level here, but I would expect
many of the subtleties of ARB_vertex_buffer_object to have a place
here too - so for instance calling BufferData with a NULL pointer
would be a shorthand for discarding the current buffer contents.

I'd like to keep sizes constrained a bit at this point, hence the
specification of size in CreateBuffer().  But that may change to more
closely match ARB_vbo semantics.
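To make the proposed semantics concrete, here is a minimal host-memory sketch of the calls listed above. The proposal names the calls but not their arguments, so the following are all assumptions for illustration:

- the `struct buffer` type and the explicit buffer argument;
- the -1/NULL error returns;
- the `ReferenceBuffer()` helper, standing in for the sharing and reference counting mentioned earlier.

The sketch models only the ARB_vbo-style behaviour, including BufferData(NULL) discarding the contents. Nothing here touches AGP or video RAM.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-memory model of the proposed buffer object calls.
   Only the call names come from the proposal; signatures are assumed. */
struct buffer {
   size_t size;            /* fixed at creation, as proposed */
   unsigned char *data;    /* NULL = contents undefined/discarded */
   int refcount;           /* shared buffers, e.g. for pbuffers */
   int mapped;
};

struct buffer *CreateBuffer(size_t size)
{
   struct buffer *buf = calloc(1, sizeof(*buf));
   if (buf) {
      buf->size = size;
      buf->refcount = 1;
   }
   return buf;
}

/* Allocate backing storage on first use; zero-filled here, although
   ARB_vbo leaves initial contents undefined. */
static int ensure_storage(struct buffer *buf)
{
   if (!buf->data)
      buf->data = calloc(1, buf->size ? buf->size : 1);
   return buf->data ? 0 : -1;
}

/* Replace the whole contents; ptr == NULL discards them (ARB_vbo style). */
int BufferData(struct buffer *buf, const void *ptr)
{
   if (buf->mapped)
      return -1;
   if (!ptr) {
      free(buf->data);
      buf->data = NULL;
      return 0;
   }
   if (ensure_storage(buf))
      return -1;
   memcpy(buf->data, ptr, buf->size);
   return 0;
}

int BufferSubData(struct buffer *buf, size_t offset, size_t len,
                  const void *ptr)
{
   if (buf->mapped || offset > buf->size || len > buf->size - offset)
      return -1;
   if (ensure_storage(buf))
      return -1;
   memcpy(buf->data + offset, ptr, len);
   return 0;
}

void *MapBuffer(struct buffer *buf)
{
   if (buf->mapped || ensure_storage(buf))
      return NULL;
   buf->mapped = 1;
   return buf->data;
}

void UnmapBuffer(struct buffer *buf)
{
   buf->mapped = 0;
}

/* Hypothetical: take another reference for a second client. */
struct buffer *ReferenceBuffer(struct buffer *buf)
{
   buf->refcount++;
   return buf;
}

/* Drop one reference; storage is freed when the last client lets go. */
void DestroyBuffer(struct buffer *buf)
{
   if (--buf->refcount == 0) {
      free(buf->data);
      free(buf);
   }
}
```

Refusing BufferData() while the buffer is mapped mirrors ARB_vbo, where a mapped buffer may not be respecified.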

In addition, the following call is needed for the driver to validate
that the buffer contents are usable, and to get the current offset:

GLuint ValidateBuffer( struct buffer * )

This returns the AGP offset of the buffer, uploading it if necessary.
To make room it may evict existing video/AGP memory contents,
defragment the memory space, or perform any other operation required.
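A toy model of this contract follows. It assumes:

- a 64-page aperture with first-fit placement;
- a `busy` flag meaning "still referenced by commands the hardware hasn't finished";
- page 0 reserved, so an offset of 0 can signal failure, matching the `!bboffset` test in the usage pattern further down.

Several simplifications apply. The victim is the first idle buffer rather than the least recently used one, and there are no real uploads and no defragmentation. Everything except the name ValidateBuffer is hypothetical. Note that evicting a buffer validated earlier in the same locked region silently invalidates its offset, and this sketch does not detect that.

```c
#include <assert.h>

#define APERTURE 64   /* toy aperture in pages; page 0 reserved */
#define MAXRES   16   /* toy limit on resident buffers */

struct gbuf {
   unsigned size;     /* pages */
   unsigned offset;   /* 0 = not resident */
   int busy;          /* referenced by commands the hardware hasn't finished */
};

static struct gbuf *res[MAXRES];  /* currently resident buffers */
static int nres;

/* Lowest offset at which `size` pages overlap no resident buffer, or 0. */
static unsigned find_hole(unsigned size)
{
   unsigned start = 1;
   int moved = 1;
   while (moved) {
      moved = 0;
      for (int i = 0; i < nres; i++)
         if (start < res[i]->offset + res[i]->size &&
             res[i]->offset < start + size) {
            start = res[i]->offset + res[i]->size;
            moved = 1;
         }
   }
   return start + size <= APERTURE ? start : 0;
}

/* Returns the buffer's aperture offset, or 0 if it cannot be made
   resident because everything in the way is busy.  Lock must be held. */
unsigned ValidateBuffer(struct gbuf *buf)
{
   unsigned off = 0;

   if (buf->offset)
      return buf->offset;              /* already resident: nothing to do */

   while (nres == MAXRES || !(off = find_hole(buf->size))) {
      int victim = -1;
      for (int i = 0; i < nres && victim < 0; i++)
         if (!res[i]->busy)
            victim = i;                /* first idle buffer, not LRU */
      if (victim < 0)
         return 0;                     /* caller takes the fallback path */
      res[victim]->offset = 0;         /* contents would be copied back here */
      res[victim] = res[--nres];
   }
   buf->offset = off;                  /* contents would be uploaded here */
   res[nres++] = buf;
   return off;
}
```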

Finally, there is a need to synchronize memory management with the
progress of the hardware through its command queues.  In the spirit of
co-opting hardware-inspired GL extensions, I propose we look closely
at NV_fence for this functionality:

   SetFence()
   TestFence()
   FinishFence()

There would need to be some plumbing behind the scenes to associate
fences with in-use buffer objects, and tricks to improve the
performance of fences beyond the obvious implementation of an IRQ per
SetFence() call.
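One common way to avoid an IRQ per SetFence() is a monotonically increasing sequence number that the hardware writes back as it retires commands. The sketch below assumes that scheme:

- `hw_completed` stands in for the hardware-written value;
- `fake_gpu_retire_one()` simulates progress, so the code runs without a GPU;
- the per-buffer `last_use` field shows one possible way to associate fences with in-use buffer objects.

All of these names are hypothetical. NV_fence itself operates on named fence objects, whereas here SetFence() simply returns the sequence number.

```c
#include <assert.h>

/* Hypothetical sequence-number fences in the spirit of NV_fence. */
typedef unsigned fence_t;

static fence_t emitted;       /* last sequence number queued */
static fence_t hw_completed;  /* last sequence number the "GPU" finished */

fence_t SetFence(void)
{
   /* A real driver would also emit a command that writes this value
      back when the hardware reaches this point in the ring. */
   return ++emitted;
}

int TestFence(fence_t f)
{
   /* Signed difference tolerates counter wrap-around
      (assumes two's-complement conversion). */
   return (int)(hw_completed - f) >= 0;
}

static void fake_gpu_retire_one(void)
{
   if (hw_completed != emitted)
      hw_completed++;
}

/* f must have come from SetFence(); a real driver would sleep on an
   IRQ here instead of spinning. */
void FinishFence(fence_t f)
{
   while (!TestFence(f))
      fake_gpu_retire_one();
}

/* Each buffer remembers the last fence that referenced it; the manager
   may only evict or overwrite it once that fence has passed. */
struct buffer {
   fence_t last_use;
};

int BufferIdle(const struct buffer *buf)
{
   return TestFence(buf->last_use);
}
```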

So far, things look fairly uncontroversial.  And just as complex as
ever, with all the unanswered questions about eviction, etc.  However,
a couple of constraints help simplify the general problem
considerably:

1) ValidateBuffer requires the lock to be held.

2) To improve performance, the offset returned by ValidateBuffer
   remains valid (even after unlocking) until one of two events
   occurs:

   a) a subsequent call to ValidateBuffer() returns an error code
      indicating that memory has been re-shuffled.

   b) a subsequent contended Lock() indicates, by a flag in the
      SAREA, that memory has been re-shuffled.

   Thus steady-state rendering will not require calls to
   ValidateBuffer or its associated ioctl.

3) There is no facility for pinning graphics memory outside locked
   regions.  While the pointer returned by MapBuffer is valid
   outside locked regions, the assumption should be that this is
   because the buffer is not resident in graphics memory but has
   been created in, or back-copied to, host memory.  Life is a lot
   easier for the memory manager if there is no way for
   unprivileged clients to pin graphics memory.  (See below for
   more.)
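Constraint 2 can be sketched as a per-buffer cache checked against the SAREA. This sketch assumes the SAREA "flag" is modelled as a counter (`mm_stamp`) that the manager bumps on every reshuffle, so each driver can compare it with the value it saw when it last validated. `validate_ioctl()` is a stand-in that just hands out fresh offsets and counts kernel trips. A real ioctl that reshuffled would also bump `mm_stamp`, invalidating other buffers' cached offsets (event a). All names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical SAREA fragment: bumped by the manager on every reshuffle. */
struct sarea {
   unsigned mm_stamp;
};

struct buffer {
   unsigned offset;        /* cached offset, 0 = none yet */
   unsigned stamp;         /* mm_stamp at the time offset was obtained */
};

static int kernel_calls;           /* trips into the (fake) ioctl */
static unsigned fake_next = 4096;

/* Stand-in for the ValidateBuffer ioctl: hands out a fresh offset. */
static unsigned validate_ioctl(struct buffer *buf)
{
   (void)buf;
   kernel_calls++;
   fake_next += 4096;
   return fake_next;
}

/* Call with the lock held.  In steady state this never enters the kernel. */
unsigned ValidateBufferCached(const struct sarea *sa, struct buffer *buf)
{
   if (buf->offset && buf->stamp == sa->mm_stamp)
      return buf->offset;          /* cached offset still valid */
   buf->offset = validate_ioctl(buf);
   buf->stamp = sa->mm_stamp;
   return buf->offset;
}
```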

To clarify, a naive (unoptimized) usage pattern would be:

   LOCK_HARDWARE();
   bboffset   = ValidateBuffer( i915->backbuffer );
   vboffset   = ValidateBuffer( i915->vertex_buffer );
   tex0offset = ValidateBuffer( i915->tex_buffer[0] );

   if (!bboffset || !vboffset || !tex0offset)
      FALLBACK_PATH();
   else {
      emit_state_packets();
      emit_3d_prim();
      SetFence();
   }
   UNLOCK_HARDWARE();


In the event of defragmentation of the address space, buffers
validated previously within a Locked region may become invalid and
validation of buffers will have to restart.  I'll gloss over that for
now.
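One way to bound that restart is to assume ValidateBuffer can report that it reshuffled memory, via a hypothetical `V_RESHUFFLED` result. If a reshuffle happens after some buffers were already validated in this locked region, validation starts over. After a fixed number of passes it gives up and the caller takes the fallback path, so the loop always terminates. `validate_all()`, the result codes, `MAX_PASSES` and the test doubles are illustrative only.

```c
#include <assert.h>

/* Hypothetical result codes for a per-buffer validate call. */
enum vresult { V_OK, V_RESHUFFLED, V_NOMEM };

struct buffer { unsigned offset; };

typedef enum vresult (*validate_fn)(struct buffer *buf);

#define MAX_PASSES 3

/* Validate a set of buffers with the lock held.  A reshuffle after some
   buffers were already validated may have moved them, so start over; a
   bounded number of passes guarantees termination.  Returns 1 on
   success, 0 if the caller should take FALLBACK_PATH(). */
int validate_all(struct buffer **bufs, int n, validate_fn validate)
{
   for (int pass = 0; pass < MAX_PASSES; pass++) {
      int restart = 0;
      for (int i = 0; i < n && !restart; i++) {
         switch (validate(bufs[i])) {
         case V_OK:
            break;
         case V_RESHUFFLED:
            restart = (i > 0);   /* only earlier offsets can be stale */
            break;
         case V_NOMEM:
            return 0;
         }
      }
      if (!restart)
         return 1;
   }
   return 0;
}

/* Test double: pretends the second call reshuffled memory. */
static int calls;
static enum vresult reshuffle_on_second_call(struct buffer *buf)
{
   buf->offset = 4096u * (unsigned)++calls;
   return calls == 2 ? V_RESHUFFLED : V_OK;
}

/* Test double: every call reshuffles, so the loop must give up. */
static enum vresult always_reshuffle(struct buffer *buf)
{
   buf->offset = 4096u;
   return V_RESHUFFLED;
}
```

Passing the whole list to one call, as discussed later in the thread, lets the manager avoid this retry loop entirely.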


Vertex Buffers
--------------

Note that there is quite a bit of handwaving going on regarding the
vertex buffer.

Specifically: Is it mapped into the GART?  How do vertices for the
next lot of primitives get into it?

While the lock is not held, we cannot guarantee that buffers won't be
shuffled in memory by other contexts, so we cannot guarantee you can
write to the AGP memory when the lock isn't held.

If necessary, I propose pinning a fixed size region and running the
old DRM DMA buffer mechanism out of that region, rather than allowing
clients to create small pinned