> The potential problem is there are some things that can't be tracked
> by a simple "age."  The one thing I can think of is back-buffers.  An
> application might have several buffer-swap operations that are blocked
> waiting for a certain vertical blank number.  There could be other
> rendering operations that are sent after the buffer-swap that will
> complete BEFORE the blit for the buffer-swap is queued.  I can't see a
> reasonable way to assign an age to those back-buffers.
>
> Since this is the only case I can think of, there may be a different
> way to handle it.
Well then, it looks like storing the type of memory is important and
something we need to look at, I guess.

> > 2. We consider the block or group of blocks as an entire "unit";
> > everything is done on units, not individual pieces of the blocks.
> > That prevents people swapping out the first page of a group of
> > textures and someone having to wait for just that block to come
> > back.
>
> I believe that the block should be the unit used.  If each block has
> a group ID (the IDs that you talk about below) and a sequence number,
> we can do some very nice optimizations.  Imagine a case where we have
> two textures that use 51% of the available texture space.
> Performance would DIE if we had to bring in the entire texture every
> single time.  We can do a little optimization and only bring in 2% of
> texture memory each time instead of 102%.

Just a slight comment here: if the memory has actually made it out of
the AGP aperture, then no matter how big the page is, the cost of
getting it back is the same.  Of course, I really like the idea of
storing sequence numbers; it gives room for lots of flexibility.  And
for situations where we haven't fully made it out of the AGP aperture,
the cost of the blit is much smaller.

> As we get a bit farther along we'll need to decide exactly what
> information we want to store with each block to help make swap-out
> decisions.
>
> We could let each process make that decision.  With each block it
> stores three values.  The first value is the cost of restoring a
> block.  The second is the normalized probability that the block will
> be needed during the current frame, and the third is the probability
> that it will be needed in the next frame.  Values of zero for the
> probability mean "I don't know."  The cost value could probably be
> inferred from the status bits and the fullness value.
>
> With these values it becomes pretty simple for the kernel to select
> candidate blocks to reclaim.
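To make the three-values-per-block idea concrete, here is a minimal
sketch of how those hints could drive candidate selection, wherever
that code ends up living.  All names are hypothetical, and the scoring
rule (restore cost weighted by the two probabilities) is just one
plausible interpretation, not anything specified in this thread:

```c
#include <assert.h>

/* Hypothetical per-block swap-out hints, as described above. */
struct swap_hints {
    int restore_cost;   /* relative cost of bringing the block back */
    int p_this_frame;   /* 0 = "I don't know", else 1..100 (%) */
    int p_next_frame;   /* same encoding, for the next frame */
};

/* Lower score = better candidate to reclaim. */
static int reclaim_score(const struct swap_hints *h)
{
    /* Unknown probabilities get a neutral middle value. */
    int p1 = h->p_this_frame ? h->p_this_frame : 50;
    int p2 = h->p_next_frame ? h->p_next_frame : 50;
    /* Weight the current frame more heavily than the next one. */
    return h->restore_cost * (2 * p1 + p2);
}

/* Pick the cheapest-to-lose block out of n candidates. */
static int pick_reclaim(const struct swap_hints *blocks, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (reclaim_score(&blocks[i]) < reclaim_score(&blocks[best]))
            best = i;
    return best;
}
```

Note that a block that is expensive to restore but unlikely to be
needed soon still scores better than a cheap block that is almost
certainly needed this frame.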
Just a small clarification here: the decision should be made in user
space; you tell the kernel to swap such and such a block out.  The
kernel has no brains when it comes to memory-management decisions.  The
reasons for this are twofold: kernel code tends not to change as easily
as user space code, and the whole initial reason was not wanting to do
too much complex stuff in the kernel.

> I don't think having a clean-up thread will be very helpful.  When
> there is activity happening, it will likely be happening at a very
> high rate.  I think that the thread would either sleep all the time
> (with nothing to do) or be continuously woken up on-demand.

After thinking about this a little more, I think you're right; that
was a bogus idea.

> On their own, (a) and (b) are too simple.  That won't give us enough
> functionality to do all the things we want.  The memory bit-map has
> the advantage that we can pick the largest free block and try to
> reclaim memory around that free block until it is large enough to
> satisfy the request.  After having tried to do that with the existing
> linked list approach, I can honestly say that a linked list is a very
> poor data structure for that purpose.
>
> On the flip side, the bit-map doesn't store much useful information
> about the allocated blocks.  That makes it difficult to select which
> memory to reclaim when memory is very full.
>
> The problem is that BOTH of these situations are important
> performance cases!

Okay, I have done some thinking, and I think I have a pretty good
solution.  Under normal circumstances we try to get a completely
unused portion of memory by using the bit vector freelist.  If one is
unavailable and we can't grow AGP memory for some reason, we fall back
onto another data structure: a priority heap (priority queue or binary
heap, depending on the data structures book you look at).  That should
have good performance and, as far as size is concerned, makes more
sense than a linked list, I think.
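The bit vector freelist lookup described above might look something
like this sketch (names and sizes are hypothetical; a set bit means
the block is in use, a clear bit means it is free):

```c
#include <assert.h>

#define NUM_BLOCKS 256

static unsigned char used[NUM_BLOCKS / 8]; /* bit vector: 1 = in use */

static int block_in_use(int i)
{
    return (used[i / 8] >> (i % 8)) & 1;
}

static void mark_used(int i)
{
    used[i / 8] |= 1 << (i % 8);
}

/* Scan for a run of `count` contiguous free blocks.  Returns the
 * first block index of the run, or -1 if no such run exists, in which
 * case the caller would fall back to the reclaim heap. */
static int find_free_run(int count)
{
    int run = 0;
    for (int i = 0; i < NUM_BLOCKS; i++) {
        run = block_in_use(i) ? 0 : run + 1;
        if (run == count)
            return i - count + 1;
    }
    return -1;
}
```

A real implementation would presumably scan word-at-a-time rather than
bit-at-a-time, but the structure is the same.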
We could implement the heap with just an array of index values into
the pagelist.  We might get some performance issues from just about
every operation on the heap being O(log n); however, our heap should
be of small enough height that this won't make too much of a
difference over a linked list.

We should make this a min-style heap, where age is the value used for
the key.  That way the top of the heap is always the texture block or
group with the minimum age (longest since it was last used by the
card).  We want to avoid ever doing a search of the heap, so we store
the index into the heap with the pagelist.  When we rearrange
something in the heap, we make sure we update these indices.

Perhaps we want to use a different selection than longest since last
use, but I think that one might give us reasonable performance.  I
suppose MRU would be trivial if it's a max heap, so that sort of thing
would be pretty easy to implement with this data structure as well.  I
know you get different answers depending on who you ask, but what's
the replacement strategy you would recommend?

By the way, in case there is any doubt, we are storing only used
blocks in the bit vector.  Also, since we store the index of where a
block sits in the heap, going from the freelist bit vector to the
position in the heap is a very simple operation.

> > 3. We want an easy method to grow the memory backing an agp pool,
> > but also some sort of per client restriction; perhaps just the
> > system wide restrictions will do?  This should be solved by the agp
> > extension proposal I made earlier in the week.
>
> What are the "system wide restrictions"?  Available memory?

Well, there are only so many pages of memory that can be physically
allocated.  We need to limit how much a client will request so as not
to cause performance of the entire system to go in the toilet.
agpgart already has some limitations on what it will allow to be
allocated, and that limit might be enough.
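Backing up to the heap idea above: here is a minimal sketch of the
min-heap keyed on block age, with the back-indices stored alongside
the pagelist so a block can be located in the heap without searching.
All names are hypothetical, not from any real DRM code:

```c
#include <assert.h>

#define MAX_BLOCKS 64

struct block {
    unsigned int age;  /* last age the card used this block */
    int heap_pos;      /* back-index: where this block sits in the heap */
};

static struct block pagelist[MAX_BLOCKS];
static int heap[MAX_BLOCKS];  /* heap of indices into pagelist */
static int heap_size;

/* Every placement keeps the pagelist back-index current. */
static void heap_place(int pos, int blk)
{
    heap[pos] = blk;
    pagelist[blk].heap_pos = pos;
}

static void sift_up(int pos)
{
    int blk = heap[pos];
    while (pos > 0) {
        int parent = (pos - 1) / 2;
        if (pagelist[heap[parent]].age <= pagelist[blk].age)
            break;
        heap_place(pos, heap[parent]);
        pos = parent;
    }
    heap_place(pos, blk);
}

static void sift_down(int pos)
{
    int blk = heap[pos];
    for (;;) {
        int child = 2 * pos + 1;
        if (child >= heap_size)
            break;
        if (child + 1 < heap_size &&
            pagelist[heap[child + 1]].age < pagelist[heap[child]].age)
            child++;
        if (pagelist[blk].age <= pagelist[heap[child]].age)
            break;
        heap_place(pos, heap[child]);
        pos = child;
    }
    heap_place(pos, blk);
}

static void heap_insert(int blk)
{
    heap_place(heap_size++, blk);
    sift_up(heap_size - 1);
}

/* Pop the block with the minimum age (the LRU reclaim candidate). */
static int heap_pop_min(void)
{
    int min = heap[0];
    heap_place(0, heap[--heap_size]);
    if (heap_size > 0)
        sift_down(0);
    return min;
}
```

Switching the comparisons makes it a max heap, which is how an MRU
policy would fall out of the same structure.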
We might want to allow each client to add only up to 8MB to the
aperture, or something like that, as well; but those sorts of
restrictions need some thinking about.

> > 4. We want a simple way to determine if an age allows us to do
> > something to a texture which has the BLOCK_CAN_BE_CLOBBERED bit
> > set; storing the last age used on a block is all that should be
> > required, I think.
>
> For textures and vertex buffers this should be true.
>
> [snip]
>
> >> Okay.  There's a few details of this that I'm not seeing.  I'm
> >> sure they're there, I'm just not seeing them.
> >>
> >> Process A needs to allocate some blocks (or even just a single
> >> block) for a texture.  It scans the list of blocks and finds that
> >> not enough free blocks are available.  It performs some
> >> hokus-pokus and determines that a block "owned" by process B needs
> >> to be freed.  That block has the BLOCK_CAN_SWAP bit set, but the
> >> BLOCK_CAN_BE_CLOBBERED bit is cleared.
> >>
> >> Process A asks the kernel to page the block out.  Then what?  How
> >> does process B find out that its block was stolen and page it back
> >> in?
> >
> > Okay, here is how I think things could happen:
> >
> > I want to page the block out, so I request that the kernel return
> > when the list of pages I give it has been swapped out and is
> > available.  If the kernel can immediately process this request, it
> > does so and returns; if it has to do some DMA, it puts the client
> > on a wait queue to be woken up when that happens.
> >
> > The kernel goes ahead and updates the blocks in the SAREA, saying
> > that they aren't there (marking their IDs as zero, perhaps).
> >
> > Process B comes along and sees its textures aren't resident and
> > needs them, so it asks the kernel to make them resident somewhere;
> > it doesn't care where.  It passes some IDs to the kernel and asks
> > the kernel to make them resident.  The kernel puts the process on a
> > waitqueue or returns, in a similar fashion to the first request.
> Up to this point, I follow you.  Here is my problem.  Say we have
> 16KB blocks.  Say process B had a single block with 4 vertex buffers
> in it.  The first 3 are 1KB and the last one is 2KB (and hangs over
> into the next block).  This first block is the one that process A
> selected to swap out.
>
> Is the ID number assigned to the first block (the one that was
> swapped out) and the second block (that wasn't swapped out) the same?
> It seems like it should be (and that would enable the kernel to
> shuffle things around and keep blocks contiguous), but it seems like
> it would be difficult (or at least irritating) to keep all the block
> IDs correct (as subregions in the blocks are allocated and freed by a
> process).
>
> When process B goes to sleep waiting for its blocks to come back,
> what locks will it hold?  If it doesn't hold any, how do we prevent
> process A from coming back and stealing the second block?

We don't need to hold any locks when we sleep; since we have to ask
the kernel for the block back, it can decide whom to wake up and whom
to put to sleep.  We can just wake in a FIFO fashion from the wait
queue.

We could also perhaps make a simple decision in the kernel: if it sees
that there is a lot of contention on one area, it could "recommend"
another area in some fashion in which to put the textures.  Here is an
easy way to accomplish this, I think: we have an ioctl which pages in
a block ID, not just a single page.  It writes back to user space the
page number where the block set now lives.  That way, if the kernel
sees A LOT of contention for an area of memory, it might try to put it
somewhere else.  However, perhaps we could do something like this in
user space too...  Actually, now that I think about it, the less the
kernel does to mess with user space the better.  I still like the
whole idea of managing everything by the block ID, though, and keeping
sequence numbers in the blocks in some fashion.
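For discussion's sake, the "page in a block ID, write back where it
landed" request could be as small as this.  Everything here is
hypothetical; none of these names exist in the real DRM interface, and
the stub stands in for the kernel side so the round trip can be shown:

```c
#include <assert.h>

/* Hypothetical ioctl argument block. */
struct texmem_pagein {
    unsigned int block_id;  /* in:  which block group to make resident */
    unsigned int page;      /* out: where the kernel actually put it   */
};

/* Stand-in for the kernel handler: place the block somewhere and
 * report the page back, exactly the write-back described above.  The
 * placement policy here is arbitrary. */
static int fake_pagein_ioctl(struct texmem_pagein *arg)
{
    arg->page = arg->block_id * 4;
    return 0;
}
```

The point is only that the caller never dictates placement; it learns
where the blocks ended up from the out-parameter.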
Perhaps that whole idea of letting the kernel recommend putting
something somewhere else is completely bogus.  I was tempted to erase
it completely, but I leave it in this email in case it is good fodder
for discussion.

> > Whenever we get the lock with contention we must do some sort of
> > quick scanning.  We might want to speed up the process somehow,
> > perhaps some sort of hashing by texture number to a dirty flag.
> > Actually, that is probably the best implementation.  If we reserve
> > 64k of address space to be our dirty flags (backed only when
> > accessed), we can make the dirty flags a bit vector.  Using the
> > texture or block ID as an index into this vector, we can rapidly
> > find out if our list of textures has been "fooled" with.  This
> > prevents us from scanning the entire list, which could be slow.
>
> That would be easy enough.  When a process wants to issue rendering,
> it would grab the lock and test the bit for each of the objects
> (textures, vertex buffers, etc.) it wants to use.  If the bit is set,
> that means "partially not here."  The process would then check the
> blocks that actually map to the object it wants to use.  In the
> memory layout above, if process B wants to render from one of the 1KB
> vertex buffers, it only has to make sure that the first block is
> paged in.  If the second block is out, it doesn't matter.  It then
> issues the rendering command, updates the fence, and releases the
> lock.
>
> > I should also point out that the IDs will be reused.  We will
> > always attempt to use the smallest ID available.  This way, using
> > it as an index into a shared memory area isn't so bad; we avoid
> > using lots of memory for nothing when we only have a few texture
> > blocks.
>
> How would we dole out IDs?  Would there be a kernel call to get /
> release a set of IDs?

Probably, but there are methods to do this without kernel
intervention.  The good ole' bit vector is great for doling out
keys.  ;)
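A sketch of doling out IDs from a bit vector without kernel help:
always hand out the smallest clear bit, so IDs stay dense and work
well as indices into the shared memory area.  Names and sizes are
hypothetical:

```c
#include <assert.h>

#define MAX_IDS 1024

static unsigned char id_map[MAX_IDS / 8]; /* bit vector: 1 = ID in use */

/* Return the smallest free ID and mark it used, or -1 if none left. */
static int alloc_id(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!((id_map[i / 8] >> (i % 8)) & 1)) {
            id_map[i / 8] |= 1 << (i % 8);
            return i;
        }
    }
    return -1;
}

static void free_id(int id)
{
    id_map[id / 8] &= ~(1 << (id % 8));
}
```

Because freed IDs are always reused before new ones are handed out,
the index space never grows beyond the high-water mark of live IDs.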
That's how agpgart handles its keys, actually.  Oh, and by the way,
did I mention how much I like bit vectors?  ;)  That little bit of
wisdom about everything looking like a nail when you walk around with
a hammer immediately comes to mind.  ;)

-Jeff

_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel