Jeff Hartmann wrote:

That may not be possible.  Right now the blocks are tracked in the
SAREA, and that puts an upper limit on the number of blocks available.
On a 64MB memory region, the current memory manager ends up with 64KB
blocks, IIRC.  As memories get bigger (both on-card and AGP apertures),
the blocks will get bigger.  Also right now each block only requires 4
bytes in the SAREA.  Any changes that would be made for a new memory
manager would make each block require more space, thereby reducing the
number of blocks that could fit in the SAREA.

Even if we increase the size of the SAREA, a system with 128MB of
on-card memory and 128MB AGP aperture would require ~65000 blocks (if
each block covered 4KB).
	Don't worry too much about this, we can create an entirely new SAREA to
hold the memory manager.  It can also be rather large; I think 128KB or so
wouldn't be a problem at all.  This will be non-swappable memory, but
that's not too big a deal.  Here is what I'm thinking of as the
general block format right now, it might not be perfect:
That works. It should also be possible to have it vary its size depending on the amount of memory to be managed.
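To put rough numbers on sizing the table from the amount of managed memory, here is a minimal sketch of the arithmetic. The function names are hypothetical and nothing here is existing DRM code; it only assumes one fixed-size table entry per block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of fixed-size blocks needed to cover
 * `managed_bytes` of on-card plus AGP memory, rounded up. */
static uint64_t mm_block_count(uint64_t managed_bytes, uint64_t block_bytes)
{
	assert(block_bytes != 0);
	return (managed_bytes + block_bytes - 1) / block_bytes;
}

/* Hypothetical helper: bytes of shared area needed for the block table,
 * given the size of one table entry. */
static uint64_t mm_table_bytes(uint64_t managed_bytes, uint64_t block_bytes,
			       uint64_t entry_bytes)
{
	return mm_block_count(managed_bytes, block_bytes) * entry_bytes;
}
```

With 256MB to manage and 8-byte entries, 4KB blocks need a 512KB table, while 16KB blocks fit in 128KB.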

[code segment snipped]

struct memory_block {
	u32	age_variable;
	u32	status;
};

	Where the age variable is device dependent, but I would imagine in most
cases is a monotonically increasing unsigned 32-bit number.  There needs to
be a device driver function to check if an age has happened on the hardware.
I don't think having an age variable in the shared area is necessary or sufficient. That's what my original can-swap bit was all about. Each item that is in a block would have its own age variable / fence. When all of the age variable / fence conditions were satisfied, the can-swap bit would be set.
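As a rough illustration of that idea, the sketch below derives a can-swap decision from per-item fences. It assumes the age is a monotonically increasing 32-bit counter that may wrap, and that the driver can read the last age the hardware completed; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True once the hardware's last completed age has reached `fence`.
 * The signed difference keeps the test correct across 32-bit wraparound,
 * assuming the two values are never more than 2^31 apart. */
static bool age_has_passed(uint32_t hw_age, uint32_t fence)
{
	return (int32_t)(hw_age - fence) >= 0;
}

/* A block may be marked can-swap only when every item stored in it
 * has had its fence condition satisfied. */
static bool block_can_swap(uint32_t hw_age, const uint32_t *fences, size_t n)
{
	assert(fences != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!age_has_passed(hw_age, fences[i]))
			return false;
	}
	return true;
}
```

Only the resulting bit would need to live in the shared area; the fences themselves could stay private to the process that owns the items.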

	The status variable has some room; only the bottom 28 bits are defined at
the moment.  The first 4 bits are status flags.  If BLOCK_CAN_SWAP is
set, we can swap this block.  Swapping requires the driver to call the kernel
to swap out this block using some agp method where the contents are
preserved.  This can be accomplished by card DMA.  If BLOCK_LINKS_TO_NEXT is set
we are part of a group of blocks, which must be treated as a unit.  If
BLOCK_CAN_BE_CLOBBERED is set, the driver can just overwrite this block of
memory.  If BLOCK_IS_CACHABLE is set we can readback from this block in a
fast way, so fallbacks can directly use this block.
That's interesting. I hadn't considered having kernel intervention to actually page out blocks. I had always been under the assumption that all blocks in AGP or on-card memory were either locked or throw-away.

Just like with regular virtual memory, I think we only need to "page out" pages that we're going to use. I don't think we should need to page out an entire set of linked pages. Initially we may want to, though. It wouldn't help much with on-card memory, but with AGP memory (where we can change mappings), we should be able to do some tricks to avoid having to do full re-loads. It's also possible that only a subset of the blocks belonging to an object will have been modified.

Perhaps what we really need to know for each block is:

1. Is the block modified (i.e., by glCopyTexImage)?
2. What pages in system memory back the block? That is, where are the parts of the texture in system memory that represent the block in AGP / on-card memory?

Hmm...starts to feel like a regular virtual memory system...
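A minimal sketch of what those two pieces of per-block state could look like, and how they would drive the eviction decision. The struct, the NO_BACKING sentinel, and the function name are all hypothetical; a real implementation would need a list of pages rather than a single index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BACKING UINT32_MAX	/* hypothetical "no system-memory copy" marker */

struct block_info {
	bool     modified;	/* written on the card, e.g. by glCopyTexImage */
	uint32_t backing_page;	/* first backing page index, or NO_BACKING */
};

enum evict_action {
	EVICT_DISCARD,		/* system-memory copy is still valid */
	EVICT_WRITE_BACK	/* contents must be copied out first */
};

/* Decide what has to happen before the block can be reused. */
static enum evict_action evict_action_for(const struct block_info *b)
{
	assert(b != NULL);
	if (b->modified || b->backing_page == NO_BACKING)
		return EVICT_WRITE_BACK;
	return EVICT_DISCARD;
}
```

Unmodified blocks with a valid backing copy behave like clean pages in a regular VM system: they can simply be dropped.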

	The BLOCK_LOG2 stuff is
a way to pack the usage of this block of memory in just a few bits.  We pack
log2(usage) - 1, where we only accept usages of 2 bytes or more.  Using 2 bytes
could be considered empty.  We can store block usage sizes of up to 64KB in
this manner.  I think that we want 64KB to be our maximum size for a block.
That's probably finer granularity than we need. We could probably get away with "empty", "mostly empty", "half full", "mostly full", and "full". Admittedly, that only saves one bit, but it removes the 64KB limit.
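To compare the two encodings, here is a sketch of both. It assumes the log2 field is 4 bits wide and that usage is rounded up to the next power of two; the thresholds for the five coarse levels are arbitrary choices made for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit field holding log2(usage) - 1, usage rounded up to a power of two.
 * Anything up to 2 bytes encodes as 0 ("empty"); 64KB encodes as 15. */
static unsigned usage_to_field(uint32_t bytes)
{
	unsigned log2 = 1;

	assert(bytes <= 65536u);
	while ((1u << log2) < bytes)
		log2++;
	return log2 - 1;
}

/* Inverse mapping: the power-of-two usage a field value stands for. */
static uint32_t field_to_usage(unsigned field)
{
	assert(field <= 15);
	return 1u << (field + 1);
}

/* Coarse alternative: five levels fit in 3 bits and are relative to the
 * block size, so they do not impose a 64KB ceiling. */
enum fill_level {
	FILL_EMPTY, FILL_MOSTLY_EMPTY, FILL_HALF_FULL,
	FILL_MOSTLY_FULL, FILL_FULL
};

static enum fill_level classify_fill(uint64_t used, uint64_t size)
{
	assert(size != 0 && used <= size);
	if (used == 0)
		return FILL_EMPTY;
	if (used == size)
		return FILL_FULL;
	if (used * 8 < size * 3)	/* below 3/8 (assumed threshold) */
		return FILL_MOSTLY_EMPTY;
	if (used * 8 <= size * 5)	/* up to 5/8 (assumed threshold) */
		return FILL_HALF_FULL;
	return FILL_MOSTLY_FULL;
}
```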

One thing this is missing is some way to prioritize which blocks are to be swapped out. Right now the blocks are stored in an LRU linked list, but I don't think that's necessarily the best way (the explicit linked list) to go.

	Bits 27:8 would be a 20-bit number representing a block id.  Each one
would be unique, so the driver could keep track of what blocks represent a
texture.  A 20-bit number should be sufficient, since that gives us about 1
million values (2^20 = 1,048,576) to work with.
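Putting the pieces of the status word together, one possible layout is sketched below. Only the 4 flag names, the 27:8 position of the block id, and the 28-bit total come from the description above; the individual flag bit numbers, the 7:4 position of the log2 field, and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bit assignments are assumed; only the names are from the proposal. */
#define BLOCK_CAN_SWAP		(1u << 0)
#define BLOCK_LINKS_TO_NEXT	(1u << 1)
#define BLOCK_CAN_BE_CLOBBERED	(1u << 2)
#define BLOCK_IS_CACHABLE	(1u << 3)

#define BLOCK_FLAG_MASK		0x0000000fu
#define BLOCK_LOG2_SHIFT	4		/* assumed: bits 7:4 */
#define BLOCK_LOG2_MASK		0x000000f0u
#define BLOCK_ID_SHIFT		8		/* bits 27:8 */
#define BLOCK_ID_MASK		0x0fffff00u

/* Build a status word from its three fields. */
static uint32_t block_status_pack(uint32_t flags, uint32_t log2_field,
				  uint32_t id)
{
	assert((flags & ~BLOCK_FLAG_MASK) == 0);
	assert(log2_field <= 15);
	assert(id < (1u << 20));
	return flags | (log2_field << BLOCK_LOG2_SHIFT) | (id << BLOCK_ID_SHIFT);
}

static uint32_t block_status_flags(uint32_t status)
{
	return status & BLOCK_FLAG_MASK;
}

static uint32_t block_status_log2(uint32_t status)
{
	return (status & BLOCK_LOG2_MASK) >> BLOCK_LOG2_SHIFT;
}

static uint32_t block_status_id(uint32_t status)
{
	return (status & BLOCK_ID_MASK) >> BLOCK_ID_SHIFT;
}
```

The top 4 bits stay zero in this layout, which leaves the "some room" mentioned earlier for future use.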
	This is a pretty good start for a block format I think.  We want to make
the memory management SAREA have a lock of its own, shouldn't be a big deal
to extend the drm to provide us with one.  Or perhaps we use the normal
device lock when we do any management, I haven't decided yet.  There are
some issues to really think about here.

	This sort of implementation needs the kernel to be able to swap out a block
from agp memory.  The kernel should reserve a portion of the agp aperture
for this purpose.  Probably on the order of 2-4 MB.  Each allocation of the
agp aperture should be no smaller than 1MB in size, to prevent agpgart from
having to deal with too many blocks of memory.  It will also have to be no
smaller than the agp_page_shift, in case someone is using 4MB agp pages.
Using a card-specific function, the kernel will blit the designated block
from its current position to its final position in the block of agp memory
to be swapped.  When the ENTIRE block is full, the kernel will call
agpgart to swap that region out of the agp aperture.  The kernel will keep
track of what each swapped out block contains in some manner, or might brute
force scan the shared memory area containing the swapped out blocks.
Okay. There are a few details of this that I'm not seeing. I'm sure they're there, I'm just not seeing them.

Process A needs to allocate some blocks (or even just a single block) for a texture. It scans the list of blocks and finds that not enough free blocks are available. It performs some hocus-pocus and determines that a block "owned" by process B needs to be freed. That block has the BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared.

Process A asks the kernel to page the block out. Then what? How does process B find out that its block was stolen and page it back in?

	There will be a non-backed shared memory area that contains all the swapped
out pages; the swapped pool is probably a good thing to call it.  Basically
it's a shared memory area, of say 1MB in size, that doesn't have any pages
backing it.  It will have a kernel no-page function that populates it if
needed.  Basically it will only have information in it if things are swapped
out of the aperture.

	There needs to be a kernel function which moves a block of memory into
cacheable space.  We could do this with PCI DMA, or some magic conversion of
unbound agp pages.  This could be made safe, and wouldn't be a big deal with
the new agpgart vm stuff.  That way the block of agp memory could be
accessed by a fallback or some other function that needs to directly read
the texture.  Readback from normal agp memory is horrible, something on the
order of 60MB/sec.
The conversion would probably be better. It would also play nice with ARB_vertex_array_objects.

Also, how does this all work without AGP? There still are a fair number of PCI cards out there. :)

A lot of this is also very Linux specific. What can we do to make as much of this as possible OS independent? I don't think our BSD friends will be very happy if we leave them in the cold. :) Linux is most people's first priority, but it's not the /only/ priority...


