I'm making good progress on the syslink code. The system call keeps morphing but in the next commit it should settle down.

I ran into a bit of a problem with the 'syslink route node' concept when I decided I wanted to be able to attach a UDP broadcast socket to a syslink route node. Originally the concept was to attach individual connections only, but I realized that it made little sense not to take advantage of hardware-switched broadcast mechanisms, since the leaves of most clusters are likely to be operating over switched networks. Attaching a broadcast socket to a syslink route node requires an aligned subset of the route node's address space to be associated with the socket (in order to represent the network's subnet space and be able to directly map some of the address bits to the IP space).

After some fooling around I decided to adapt the SUBR_BLIST code to the task and have committed SUBR_ALIST as a result. This will allow the syslink route node to trivially attach mixed-size subnets (down to single entities). I hope to have the syslink mesh infrastructure done in about another week or so and will then start working on the protocols and kernel device/filesystem namespace interfacing.

--

SUBR_BLIST is a bitmap allocator that I originally developed for FreeBSD to manage swap space allocations. It is implemented using a radix tree with hinting. SUBR_ALIST is the same thing, but designed for power-of-2 allocations, and it guarantees alignment to the allocation size, e.g. a 4KB allocation would be aligned to a 4KB boundary.

It turns out that SUBR_ALIST has other interesting characteristics that make it suitable for the cluster filesystem design. In fact, the same design characteristics also make the ALIST allocator suitable for managing physical memory, particularly in its ability to let us cut up memory into chunks which are properly aligned no matter what page size is used. It would be possible to manage page segment mappings entirely dynamically if we wanted to go in that direction (not on my list, though).
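To make the alignment guarantee concrete, here is a toy sketch in Python. It is not the SUBR_ALIST code: the real allocator is a radix tree with hinting, while this is a flat bitmap with a linear search. The class name `AlignedBitmap` and its methods are made up purely for illustration.

```python
class AlignedBitmap:
    """Toy power-of-2 allocator: each allocation is aligned to its own size.

    Illustration only. This is NOT how SUBR_ALIST is implemented; it uses
    a flat byte map and a linear search instead of a radix tree with hints.
    """

    def __init__(self, nblocks):
        if nblocks <= 0 or nblocks & (nblocks - 1):
            raise ValueError("nblocks must be a power of 2")
        self.nblocks = nblocks
        self.used = bytearray(nblocks)  # 1 = allocated, 0 = free

    def alloc(self, count):
        """Return the first free, size-aligned start block, or None."""
        if count <= 0 or count & (count - 1):
            raise ValueError("count must be a power of 2")
        # Only offsets that are multiples of count are candidates,
        # which is what gives the alignment guarantee.
        for start in range(0, self.nblocks - count + 1, count):
            if not any(self.used[start:start + count]):
                self.used[start:start + count] = b"\x01" * count
                return start
        return None

    def free(self, start, count):
        """Release a range previously returned by alloc(count)."""
        if start % count:
            raise ValueError("start is not aligned to count")
        self.used[start:start + count] = bytes(count)
```

With a 16-block map, allocating 1, 4, and 2 blocks in that order returns block numbers 0, 4, and 2. The same idea applies whether the 'blocks' are route node addresses carved into subnets or chunks of some other space.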
--

The current filesystem design revolves around the filesystem layer, HAMMER, and the storage layer, ANVIL. ANVIL will be responsible for very large block reservations (e.g. 1MB to 1TB 'blocks' or even larger). These blocks will be self-identifying and assigned to filesystems and/or logical block devices. The key feature of ANVIL will be that since these blocks are self-identifying, they can be moved to different media without upper layers knowing or caring. For example, you could construct a filesystem based on ANVIL blocks you allocate from a single disk drive, and on a live system migrate some or all of those ANVIL blocks to another disk drive. You could physically move a disk drive from one machine to another and the cluster mesh would recognize it, no copying or reconfiguring required.

In order to properly detect ANVIL blocks on physical media, the blocks must be aligned in a way that lets a scan of the physical media locate them without any other knowledge. This will be accomplished by doing a power-of-2 scan. For example, a 256GB hard drive would be scanned by checking the block at the 128GB mark, then all the 64GB marks, and so on, to locate the ANVIL block headers on the media and build an allocation/management map in kernel memory.

In addition, it will be possible to allocate (and deallocate) ANVIL blocks on the fly (for example, to resize a filesystem). Just like cutting up an address space into variable-sized subnets, ANVIL blocks can also be variable-sized as long as they adhere to the alignment requirements.

The ANVIL layer will scan available physical media for ANVIL blocks and piece them together into higher-level entities such as filesystems. One of the really nice things about ANVIL is that it won't care where it gets the pieces from. It turns out that the ALIST allocator is the perfect solution to this problem because the ALIST alignment requirements are precisely the same as the ANVIL alignment requirements.
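The order of the power-of-2 scan described above can be sketched as follows. This is only an illustration in Python under stated assumptions: the function name `scan_offsets` is hypothetical, the media size is assumed to be an exact power of 2, offset 0 is visited first (the description only mentions the interior marks), and the sketch does not skip regions already covered by a block that was found.

```python
def scan_offsets(media_size, min_block):
    """Yield candidate header offsets, coarsest alignment first.

    Both arguments use the same unit (GB in the example) and are assumed
    to be powers of 2. Visiting offset 0 first is an assumption.
    """
    for n in (media_size, min_block):
        if n <= 0 or n & (n - 1):
            raise ValueError("sizes must be powers of 2")
    yield 0
    align = media_size // 2
    while align >= min_block:
        # Odd multiples of align are the marks not already visited
        # at a coarser alignment level.
        for offset in range(align, media_size, 2 * align):
            yield offset
        align //= 2
```

For a 256GB drive with a 32GB minimum block size, `list(scan_offsets(256, 32))` gives 0, 128, 64, 192, 32, 96, 160, 224: the 128GB mark, then the 64GB marks, then the 32GB marks.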
For example, let's say we had a 1TB hard drive and wanted to allow 1GB to 1TB ANVIL blocks to be allocated out of it. The ALIST allocation map would thus have to support up to 1024 1GB blocks. Out of this 1TB drive we might allocate, say, a few dozen 1GB blocks, a few dozen 16GB blocks, maybe a large 128GB block, and so forth. The ALIST allocator would be used to do this and would eat up only around 384 bytes of kernel memory to manage the 1024-'block' space.

--

For HAMMER, the filesystem design I am working on, I am looking into being able to use ALISTs for extent management within the filesystem. This research is very early, though, and I don't know how useful ALISTs will be in HAMMER due to the time domain snapshot mechanisms I want to implement.

-Matt
Matthew Dillon
<[EMAIL PROTECTED]>