Jeff Hartmann wrote:
That works. It should also be possible to have it vary its size depending on the amount of memory to be managed.
That may not be possible. Right now the blocks are tracked in the SAREA, and that puts an upper limit on the number of blocks available. On a 64MB memory region, the current memory manager ends up with 64KB blocks, IIRC. As memories get bigger (both on-card memory and AGP apertures), the blocks will get bigger. Also, right now each block only requires 4 bytes in the SAREA. Any changes made for a new memory manager would make each block require more space, thereby reducing the number of blocks that could fit in the SAREA. Even if we increase the size of the SAREA, a system with 128MB of on-card memory and a 128MB AGP aperture would require about 65,000 blocks if each block covered 4KB (256MB / 4KB = 65,536).
Don't worry too much about this; we can create an entirely new SAREA to hold the memory manager. It can also be rather large; I'm thinking something like 128KB wouldn't be a problem at all. This will be non-swappable memory, but that's not too big a deal. Here is what I'm thinking of as the general block format right now; it might not be perfect:
[code segment snipped]
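To put numbers on the sizing discussion, here is a small sketch of the block-table arithmetic. The helper names are hypothetical; the inputs are the figures from the thread (256MB total at 4KB granularity, 4 bytes per block today, 8 bytes for an entry made of two u32 fields).

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size blocks needed to cover mem_bytes, rounding up. */
static uint64_t blocks_needed(uint64_t mem_bytes, uint64_t block_bytes)
{
    assert(block_bytes > 0);
    return (mem_bytes + block_bytes - 1) / block_bytes;
}

/* Bytes of shared area needed to hold one entry per block. */
static uint64_t table_bytes(uint64_t nblocks, uint64_t bytes_per_entry)
{
    return nblocks * bytes_per_entry;
}
```

With 4KB blocks, 256MB needs 65,536 entries, i.e. 256KB of table at 4 bytes per entry or 512KB at 8 bytes, so a 128KB area only works with coarser blocks: at 64KB granularity the same 256MB needs 4,096 entries, or 32KB at 8 bytes each.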
    struct memory_block {
        u32 age_variable;
        u32 status;
    };
The age variable is device dependent, but I would imagine that in most cases it is a monotonically increasing unsigned 32-bit number. There needs to be a device driver function to check whether a given age has been reached on the hardware.
I don't think having an age variable in the shared area is either necessary or sufficient. That's what my original can-swap bit was all about. Each item in a block would have its own age variable / fence. When all of the age variable / fence conditions were satisfied, the can-swap bit would be set.
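A minimal sketch of the two ideas above, assuming ages are free-running u32 counters that may wrap: a driver-side check of whether a given age has been reached, and the rule that a block becomes swappable only once every item in it has retired. Function names are hypothetical, and the signed-difference comparison assumes two's-complement conversion and that the two ages being compared are less than 2^31 apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver check: true once the hardware's current age has
 * reached `age`. The signed difference keeps this correct across u32
 * wraparound, provided the two values are within 2^31 of each other. */
static bool age_has_passed(uint32_t hw_age, uint32_t age)
{
    return (int32_t)(hw_age - age) >= 0;
}

/* Per-item fences: the block may be swapped only when every item's
 * age has passed on the hardware. */
static bool block_can_swap(uint32_t hw_age, const uint32_t *item_ages, size_t n)
{
    assert(item_ages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!age_has_passed(hw_age, item_ages[i]))
            return false;
    return true;
}
```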
The status variable has some room; only the bottom 28 bits are defined at the moment. The first 4 bits are status bits. If BLOCK_CAN_SWAP is set, we can swap this block; swapping requires the driver to call the kernel to swap out this block using some AGP method that preserves the contents, which can be accomplished by card DMA. If BLOCK_LINKS_TO_NEXT is set, we are part of a group of blocks, which must be treated as a unit. If BLOCK_CAN_BE_CLOBBERED is set, the driver can just overwrite this block of memory. If BLOCK_IS_CACHABLE is set, we can read back from this block in a fast way, so fallbacks can use this block directly.
That's interesting. I hadn't considered having kernel intervention to actually page out blocks. I had always been under the assumption that all blocks in AGP or on-card memory were either locked or throw-away.
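The four flags can be read as an eviction decision. This sketch assumes bit positions (bit 0 = BLOCK_CAN_SWAP through bit 3 = BLOCK_IS_CACHABLE) that the mail does not specify, prefers clobbering over swapping because it avoids a copy, and shows how a BLOCK_LINKS_TO_NEXT chain could be measured so the group is handled as a unit. All helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed positions for the four status flags in bits 3:0. */
#define BLOCK_CAN_SWAP          (1u << 0)
#define BLOCK_LINKS_TO_NEXT     (1u << 1)
#define BLOCK_CAN_BE_CLOBBERED  (1u << 2)
#define BLOCK_IS_CACHABLE       (1u << 3)

enum evict_action {
    EVICT_OVERWRITE,   /* contents are throw-away: just reuse the memory */
    EVICT_SWAP,        /* contents must be preserved: ask the kernel to swap */
    EVICT_NOT_ALLOWED  /* neither flag set: block stays where it is */
};

/* Hypothetical helper: what reclaiming this block would involve. */
static enum evict_action block_evict_action(uint32_t status)
{
    if (status & BLOCK_CAN_BE_CLOBBERED)
        return EVICT_OVERWRITE;
    if (status & BLOCK_CAN_SWAP)
        return EVICT_SWAP;
    return EVICT_NOT_ALLOWED;
}

/* Length of the group starting at `first`: each block with
 * BLOCK_LINKS_TO_NEXT set continues into the following block. */
static size_t block_group_length(const uint32_t *status, size_t nblocks, size_t first)
{
    assert(first < nblocks);
    size_t i = first;
    while (i + 1 < nblocks && (status[i] & BLOCK_LINKS_TO_NEXT))
        i++;
    return i - first + 1;
}
```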
Just like with regular virtual memory, I think we only need to "page out" pages that we're going to use. I don't think we should need to page out an entire set of linked pages. Initially we may want to, though. It wouldn't help much with on-card memory, but with AGP memory (where we can change mappings), we should be able to do some tricks to avoid having to do full re-loads. It's also possible that only a subset of the blocks belonging to an object will have been modified.
Perhaps what we really need to know for each block is:
1. Is the block modified (e.g., by glCopyTexImage)?
2. What pages in system memory back the block? That is, where are the parts of the texture in system memory that represent the block in AGP / on-card memory?
Hmm... this starts to feel like a regular virtual memory system...
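The two per-block questions above can be sketched as a small tracking structure. Assumptions: 64KB blocks backed by 4KB system pages (so 16 pages per block), and a NULL entry meaning no system-memory copy exists for that page. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_BLOCK 16  /* assumed: 64KB block / 4KB pages */

struct block_backing {
    bool modified;                       /* written on the card, e.g. by glCopyTexImage */
    void *sysmem_page[PAGES_PER_BLOCK];  /* system-memory copy of each page, or NULL */
};

/* A block can simply be dropped if it is unmodified and every page has a
 * system-memory copy; otherwise its contents must be copied back first. */
static bool block_needs_copyback(const struct block_backing *b)
{
    assert(b != NULL);
    if (b->modified)
        return true;
    for (size_t i = 0; i < PAGES_PER_BLOCK; i++)
        if (b->sysmem_page[i] == NULL)
            return true;
    return false;
}
```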
The BLOCK_LOG2 stuff is a way to pack the usage of this block of memory into just a few bits. We pack log2 - 1, where we only accept usages of 2 bytes or more; a usage of 2 bytes could be considered empty. We can store block usage sizes of up to 64KB in this manner. I think that we want 64KB to be our maximum size for a block.
That's probably finer granularity than we need. We could probably get away with "empty", "mostly empty", "half full", "mostly full", and "full". Admittedly, that only saves one bit, but it removes the 64KB limit.
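The BLOCK_LOG2 packing, sketched under the assumption that usage is rounded up to the next power of two (so the packed value never under-reports) and that usages above 64KB are clamped to the largest code. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_LOG2_MAX 15u  /* 4-bit field: codes 0..15 */

/* Encode usage in bytes as ceil(log2(usage)) - 1. Usages below 2 bytes
 * count as 2 ("empty", code 0); usages above 64KB clamp to code 15. */
static unsigned block_log2_encode(uint32_t usage)
{
    unsigned log2 = 1;
    while (log2 < 16 && (1u << log2) < usage)
        log2++;
    return log2 - 1;
}

/* Decode a 4-bit code back to the (rounded-up) usage in bytes. */
static uint32_t block_log2_decode(unsigned code)
{
    assert(code <= BLOCK_LOG2_MAX);
    return 1u << (code + 1);
}
```

Because 64KB is 2^16, the largest code is 16 - 1 = 15, which is exactly what fits in 4 bits.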
One thing this is missing is some way to prioritize which blocks are to be swapped out. Right now the blocks are stored in an LRU linked list, but I don't think that's necessarily the best way (the explicit linked list) to go.
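One possible alternative to an explicit LRU list, offered only as an illustration (nothing like it is proposed in the thread): stamp each block with the age of its last use and scan for the oldest evictable block when space runs out. The function name and the separate evictable array are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the evictable block whose last use is oldest
 * relative to `now`, or nblocks if no block is evictable. Unsigned
 * subtraction keeps the "how long ago" value correct across wraparound. */
static size_t pick_victim(const uint32_t *last_used, const bool *evictable,
                          size_t nblocks, uint32_t now)
{
    assert(nblocks == 0 || (last_used != NULL && evictable != NULL));
    size_t victim = nblocks;
    uint32_t oldest = 0;
    for (size_t i = 0; i < nblocks; i++) {
        if (!evictable[i])
            continue;
        uint32_t since = now - last_used[i];
        if (victim == nblocks || since > oldest) {
            victim = i;
            oldest = since;
        }
    }
    return victim;
}
```

The trade-off is an O(n) scan at eviction time in exchange for not maintaining list links in the shared area on every use.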
Bits 27:8 would hold a 20-bit number representing a block ID. Each one would be unique, so the driver could keep track of which blocks represent a texture. A 20-bit number should be sufficient, since that gives us about 1 million values (2^20 = 1,048,576) to work with.
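Putting the pieces of the status word together: a pack/unpack sketch assuming flags in bits 3:0, the BLOCK_LOG2 code in bits 7:4 (the remaining 4 of the 28 defined bits, inferred rather than stated in the mail), and the block ID in bits 27:8. Macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_FLAGS_MASK  0x0000000Fu  /* bits 3:0   status flags */
#define BLOCK_LOG2_SHIFT  4
#define BLOCK_LOG2_MASK   0x000000F0u  /* bits 7:4   packed usage */
#define BLOCK_ID_SHIFT    8
#define BLOCK_ID_MASK     0x0FFFFF00u  /* bits 27:8  block ID */
#define BLOCK_ID_MAX      ((1u << 20) - 1)

static uint32_t block_status_pack(uint32_t flags, uint32_t log2_code, uint32_t id)
{
    assert(flags <= 0xFu && log2_code <= 0xFu && id <= BLOCK_ID_MAX);
    return flags | (log2_code << BLOCK_LOG2_SHIFT) | (id << BLOCK_ID_SHIFT);
}

static uint32_t block_status_flags(uint32_t s) { return s & BLOCK_FLAGS_MASK; }
static uint32_t block_status_log2(uint32_t s)  { return (s & BLOCK_LOG2_MASK) >> BLOCK_LOG2_SHIFT; }
static uint32_t block_status_id(uint32_t s)    { return (s & BLOCK_ID_MASK) >> BLOCK_ID_SHIFT; }
```

Bits 31:28 stay zero, which leaves the 4 spare bits mentioned above for later use.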
This is a pretty good start for a block format, I think. We want the memory management SAREA to have a lock of its own; it shouldn't be a big deal to extend the DRM to provide us with one. Or perhaps we use the normal device lock when we do any management; I haven't decided yet. There are some issues to really think about here. This sort of implementation needs the kernel to be able to swap out a block from AGP memory. The kernel should reserve a portion of the AGP aperture for this purpose, probably on the order of 2-4 MB. Each allocation of the AGP aperture should be no smaller than 1MB, to prevent agpgart from having to deal with too many blocks of memory. It will also have to be no smaller than the AGP page size given by agp_page_shift, in case someone is using 4MB AGP pages. The kernel will use a card-specific blit function to copy the designated block from its current position to its final position in the block of AGP memory to be swapped. When the ENTIRE block is full, the kernel will call agpgart to swap that region out of the AGP aperture. The kernel will keep track of what each swapped-out block contains in some manner, or might brute-force scan the shared memory area containing the swapped-out blocks.
Okay. There are a few details of this that I'm not seeing. I'm sure they're there; I'm just not seeing them.
Process A needs to allocate some blocks (or even just a single block) for a texture. It scans the list of blocks and finds that not enough free blocks are available. It performs some hocus-pocus and determines that a block "owned" by process B needs to be freed. That block has the BLOCK_CAN_SWAP bit set, but the BLOCK_CAN_BE_CLOBBERED bit is cleared.
Process A asks the kernel to page the block out. Then what? How does process B find out that its block was stolen and page it back in?
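The kernel-side swap path described earlier can be modeled as two rules: an allocation-size floor, and a staging region that fills with blits until it is full, at which point it would be handed to agpgart. This is a user-space sketch with hypothetical names; it does not call agpgart or any real DRM interface, and it does not answer the question of how process B learns its block was moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_AGP_ALLOC ((size_t)1 << 20)  /* proposed 1MB minimum */

/* Hypothetical: an aperture allocation must be at least 1MB and at least
 * one AGP page (1 << agp_page_shift, e.g. 4MB pages when the shift is 22). */
static size_t agp_min_alloc(unsigned agp_page_shift)
{
    size_t page = (size_t)1 << agp_page_shift;
    return page > MIN_AGP_ALLOC ? page : MIN_AGP_ALLOC;
}

/* Hypothetical model of the reserved swap region (2-4MB in the proposal). */
struct swap_staging {
    size_t size;  /* bytes reserved in the aperture */
    size_t used;  /* bytes already filled by blits */
};

/* Account for one block blitted into the region. Returns false if it does
 * not fit; the caller would then flush the full region and start anew. */
static bool staging_add_block(struct swap_staging *st, size_t block_bytes)
{
    assert(st != NULL);
    if (st->used + block_bytes > st->size)
        return false;
    st->used += block_bytes;
    return true;
}

/* Only a completely full region is handed off for swap-out. */
static bool staging_is_full(const struct swap_staging *st)
{
    return st->used == st->size;
}
```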
There will be a non-backed shared memory area that contains all the swapped-out pages; "the swapped pool" is probably a good name for it. Basically it's a shared memory area of, say, 1MB that doesn't have any pages backing it. It will have a kernel nopage function that populates it if needed, so it will only have information in it if things are swapped out of the aperture. There needs to be a kernel function which moves a block of memory into cacheable space. We could do this with PCI DMA, or some magic conversion of unbound AGP pages. This could be made safe, and wouldn't be a big deal with the new agpgart VM stuff. That way the block of AGP memory could be accessed by a fallback or some other function that needs to read the texture directly. Readback from normal AGP memory is horrible, something on the order of 60MB/sec.
The conversion would probably be better. It would also play nice with ARB_vertex_array_objects.
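As a rough sense of why fallbacks want cacheable memory: assuming a 1024×1024 texture at 4 bytes per texel, and reading "60MB/sec" as 60 × 10^6 bytes per second, a single full readback from AGP memory takes about

```latex
t_{\text{readback}} = \frac{1024 \times 1024 \times 4\ \text{bytes}}{60 \times 10^{6}\ \text{bytes/s}}
                    = \frac{4\,194\,304}{60\,000\,000}\ \text{s} \approx 0.070\ \text{s} \approx 70\ \text{ms}
```

which is several frames' worth of time for one fallback touching one texture.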
Also, how does this all work without AGP? There still are a fair number of PCI cards out there. :)
A lot of this is also very Linux-specific. What can we do to make as much of this as possible OS-independent? I don't think our BSD friends will be very happy if we leave them in the cold. :) Linux is most people's first priority, but it's not the /only/ priority...
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel