I believe the design was eventually simplified even beyond what's written below. Basically, there are just two types of instructions in the log: allocate and free. "Steal" is just a special case of "free" (but only for the context whose memory is stolen!), and I think I worked around needing the other opcodes.
---- Begin old message ----

I've spent the last week and a half working on the client-side portion of texmem-0-0-2. I've completed most of it, but the last 10% has proven to be more than the usual 90% of the work.
The problem is centered around the way block IDs and sequence numbers work. There are a huge number of cases where part of a sequence of blocks is paged out (imagine the middle block of a sequence of 3 is paged out, and the data that took its place is pinned) or replaced. Part of the problem is that, by necessity, multiple objects (textures, vertex arrays, etc.) can be stored in a single block. That complicates juggling things around when blocks are paged in. Every time I found a solution for one set of cases, I found two more. All the while, the code got more and more complicated.
Finally, today, I had enough. When an implementation gets too complicated, the design must be flawed.
I looked at the design from a high level. Basically, the current design tries to implement a paged MMU in software. Organizing the "global" store as blocks, paging blocks in and out one at a time, and shuffling the blocks around to fit are necessary design choices for a hardware implementation, but maybe aren't the best choices for a software implementation. After having that epiphany, I began to look elsewhere for a design model.
I cruised around the net, mostly randomly, until I came to freshmeat.net. I came across a release announcement for JFS, and randomly thought about how journaling filesystems work. I worked through a few ideas in my head, and I came to a new plan. :)
One of the main design points in the existing texture manager and the originally-proposed texmem-0-0-2 design is that the shared memory area is fixed in size. Having a small, fixed SAREA forces a block-oriented design. Why does the SAREA have to be fixed in size?
The idea is to have two shared memory areas. The first area is a log (or is it a journal? I can never remember which is which) where a process inserts "instructions" for each allocation, free, or memory-steal that it makes. When a process wants to make a memory operation, it first replays the log, then performs the operation. After performing the operation, it appends it to the log.
The second SAREA is a snapshot of the state of memory. This is only used in two cases. The first is to initialize state when a new process opens a GL context. The other is when a process gets too far behind in processing the log. Basically, if a process detects that the oldest entry left in the log is newer than the last entry it processed, it has to re-initialize its state from the snapshot.
That's the 10,000 mile (or 16,093.4 km, if you prefer) view.
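To make the replay rule concrete, here is a minimal sketch with the "instruction" reduced to a signed size delta (positive = allocate, negative = free). All of the names and the structure layout are illustrative assumptions, not the driver's actual API, and a real implementation would hold the SAREA mutex around the whole operation:

```c
#include <stdint.h>

#define LOG_ENTRIES 8 /* tiny ring for illustration; the real log is two pages */

struct shared_log {
    uint64_t count;              /* global: next entry to fill */
    int32_t  entry[LOG_ENTRIES]; /* ring buffer of instructions */
};

struct context {
    uint64_t count;  /* local: next entry this process will replay */
    int32_t  in_use; /* this process's view of memory in use */
};

/* Replay everything written since we last looked, then append our own
 * operation and update our local view. */
static void log_op(struct shared_log *slog, struct context *ctx, int32_t delta)
{
    while (ctx->count < slog->count)
        ctx->in_use += slog->entry[ctx->count++ % LOG_ENTRIES];

    slog->entry[slog->count++ % LOG_ENTRIES] = delta;
    ctx->count = slog->count;
    ctx->in_use += delta;
}
```

The point of the replay-before-operate rule is that every process sees the same totally-ordered sequence of operations, so all of their memory views converge without any process-to-process messaging.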
The log would be small, probably two pages. There would be five types of instructions. Each instruction would be one or two words. The low-order 3 bits of the first word would specify the instruction; the remainder would be the data.
1. Op-code 000: Allocate a block of memory. The remainder of the first word is the size of the allocation. The second word is the offset of the allocation.
2. Op-code 001: Free a block of memory. The remainder of the first word is the size of the memory block to free. The second word is the offset of the freed block.
3. Op-code 010: "Steal" a block of memory from another process. The remainder of the first word is the size of the memory block to steal. The second word is the offset of the stolen block.
4. Op-code 011: Add a page to the other shared area. The remainder of the first word is the address of the page.
5. Op-code 100: Remove a page from the other shared area. The remainder of the first word is the address of the page.
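Packing and unpacking these instruction words is simple bit manipulation. A sketch of what the helpers might look like, assuming 32-bit words with the op-code in the low 3 bits (the enum names are my own, not from the code):

```c
#include <stdint.h>

/* Hypothetical op-code names for the five instructions above. */
enum log_opcode {
    LOG_OP_ALLOC    = 0, /* 000: allocate a block        */
    LOG_OP_FREE     = 1, /* 001: free a block            */
    LOG_OP_STEAL    = 2, /* 010: steal a block           */
    LOG_OP_ADD_PAGE = 3, /* 011: add a shared page       */
    LOG_OP_DEL_PAGE = 4  /* 100: remove a shared page    */
};

#define LOG_OP_BITS 3
#define LOG_OP_MASK ((1u << LOG_OP_BITS) - 1)

/* Pack the data (size or address) into the upper 29 bits, op-code low. */
static inline uint32_t log_pack(enum log_opcode op, uint32_t data)
{
    return (data << LOG_OP_BITS) | (uint32_t) op;
}

static inline enum log_opcode log_opcode_of(uint32_t word)
{
    return (enum log_opcode)(word & LOG_OP_MASK);
}

static inline uint32_t log_data_of(uint32_t word)
{
    return word >> LOG_OP_BITS;
}
```

Note that stealing 3 bits for the op-code means sizes and page addresses must fit in 29 bits, or must be stored in units (e.g. pages) coarse enough to fit.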
In addition to those two pages, there would be an additional shared page that would contain a 64-bit counter of the log position. The low bits of the counter specify the position of the next entry in the log to fill. Each process can then compare the global counter with its local copy to determine when the log has wrapped around. This additional page also contains the base pointer(s) for the other shared area.
There would probably also be a counter that represents the last update of the second SAREA. This would prevent processes from having to block on the mutex to update the global memory state.
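The wrap-around check falls out of the 64-bit counter naturally: if more entries have been written since a process last caught up than the log can hold, the oldest surviving entry is newer than the last one it processed, and it must rebuild from the snapshot SAREA. A sketch, with the log size as an assumed constant:

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_ENTRIES 1024 /* hypothetical: how many entries fit in the two-page log */

/* True when the log has wrapped past this process's position, meaning
 * entries it never processed have been overwritten.  A monotonically
 * increasing 64-bit counter will not overflow in practice, so plain
 * subtraction is safe. */
static bool log_overflowed(uint64_t global_count, uint64_t local_count)
{
    return global_count - local_count > LOG_ENTRIES;
}

/* The next log slot to read or fill is the counter modulo the log size. */
static unsigned log_slot(uint64_t count)
{
    return (unsigned)(count % LOG_ENTRIES);
}
```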
The second SAREA is a linked list of memory puddles. The core of each memory puddle is an array of 'sizes'. A region in the puddle is free if its size is positive, and allocated if its size is negative. The array is a buffer-gap data structure, common in text editors. Entries in the "gap" are zero. I've done some experiments, and this makes allocations, frees, and block coalescing very fast. It also just BEGS for SIMD optimization. ;) Sample data structures are below.
This would also change the kernel interfaces. Instead of paging out blocks, ranges are paged out. Each range is tagged with a unique ID, which is stored in the puddle. The only part that is still in question is how to page out and track parts of a range.
typedef struct node {
    struct node *succ;
    struct node *pred;
} node_t;

#define CACHE_LINE_SIZE 8

/* This structure needs to be exactly 4 cache lines. */
struct puddle {
    node_t linkage;
    int    gap_start;
    int    gap_end;
    CARD32 base_offset;
    CARD32 pad[CACHE_LINE_SIZE - 5];

    CARD32 fence[CACHE_LINE_SIZE];
    CARD32 size[CACHE_LINE_SIZE];
    struct {
        CARD16 id;
        CARD16 flags;
    } status[CACHE_LINE_SIZE];
};
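A toy version of the allocate/free/coalesce operations over the size array may make the sign convention clearer. This sketch uses first-fit and leaves out the buffer-gap shifting (a real puddle would move entries through the gap rather than assume a zero slot is adjacent); the function names are mine:

```c
#include <stdint.h>

#define PUDDLE_SLOTS 8

/* First-fit allocation: positive = free, negative = allocated,
 * zero = unused gap slot.  Returns the slot index, or -1 on failure. */
static int puddle_alloc(int32_t size[], int32_t want)
{
    for (int i = 0; i < PUDDLE_SLOTS; i++) {
        if (size[i] >= want) {          /* free region, big enough */
            int32_t spare = size[i] - want;
            size[i] = -want;            /* mark allocated */
            if (spare > 0 && i + 1 < PUDDLE_SLOTS && size[i + 1] == 0)
                size[i + 1] = spare;    /* leftover fills a gap slot */
            return i;
        }
    }
    return -1;
}

/* Free slot i and coalesce with any free (positive) neighbours;
 * absorbed slots become zero, i.e. part of the gap again. */
static void puddle_free(int32_t size[], int i)
{
    size[i] = -size[i];
    if (i + 1 < PUDDLE_SLOTS && size[i + 1] > 0) {
        size[i] += size[i + 1];
        size[i + 1] = 0;
    }
    if (i > 0 && size[i - 1] > 0) {
        size[i - 1] += size[i];
        size[i] = 0;
    }
}
```

Because every operation is a scan or neighbour check over one small, contiguous array, it vectorizes well, which is what makes the SIMD remark above plausible.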
That's the 1,000 mile (or 1,609.3 km, if you prefer) view.
The good news is that most of the code that I've already written (>80%) could carry over to this new design. I wrote a nice cache/slab allocator that could be used to manage allocating puddles from pages in the SAREA. I wrote a nice buffer-gap based puddle manager that could be used for the rest.
---- End old message ----
--
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel