Hi, I've submitted a proposal for GSoC on Melange, but I'm reposting it here since it looks easier to get feedback here than on the web interface.

HAMMER2
-------

HAMMER2 is a file system developed by Matthew Dillon for DragonFlyBSD. The main purpose of this new filesystem is to support replication clustering, and the core design is heavily influenced by this goal, but the filesystem also supports a lot of features which are useful in a single-node context.

The goal of this proposal is to port a *subset* of HAMMER2 to OpenBSD and stabilize the port in a single-node context. Supporting clustering is out of scope for this project.

HAMMER2's primary features are copy-on-write writable snapshots and on-the-fly compression (currently only LZ4 and zero suppression). Data integrity is checked with a per-block CRC. Offline (safe) and online (subject to hash collision issues) block de-duplication and multiple-disk support are planned but not yet implemented.

One of the strengths of this filesystem is that, despite the large number of features it implements, the code is still comparatively simple next to the other existing filesystems in this segment (the number of lines of code is of the same order as UFS at this time).

About porting
-------------

One might consider that it is too early to port the filesystem now and that it is better to wait for it to be finished. But porting early, before the filesystem is used in production, can be a win too, because it is still possible to make architectural changes to improve portability. Moreover, porting the filesystem to another operating system might help to attract new developers to work on specific parts of the filesystem, like deduplication or the copy mechanism, and speed up its development.

The core filesystem is there and mostly done. So there is something to work on which is unlikely to change drastically and void the porting work.

Goals
-----

The goal is to have a working filesystem at the end of the summer, and to be able to create, mount, read and write without panic on a single master filesystem.

Non goals
---------

* boot loader support
* clustering support.
  This is a major task on its own, and needs the porting of more subsystems in the kernel. It is also the least advanced part of the filesystem, and is currently in development.

Planned work
------------

* Weeks 1-2

  Port the core of the filesystem. The chain API is the main in-memory data structure of HAMMER2. It is a COW tree of chains representing the topological structure of the files in the filesystem. A chain can represent an inode, a directory entry, or an indirect block. This is the most portable part of the filesystem since it doesn't depend on any external API. The main task is converting the locking directives from DragonFly to OpenBSD and dealing with the slight kernel API differences, like malloc.

* Week 3

  HAMMER2 has an abstraction on top of the kernel block I/O layer. Alongside the chain API, the block I/O abstraction is another major part of the filesystem. It is responsible for handling buffer mapping from the buffer cache and for reading and writing data to disk at the request of the chain frontend code. The third week would be dedicated to porting the I/O layer and the I/O clustering layer.

  Like with most kernel code, there is no test suite, but spending the time to write a few tests and to get this code and the chain API to build and run standalone might be worth it for the next part of the project. Together they are the most complex code in the filesystem, and if they work, the remaining bugs will be easier to track down.

* Week 4

  Port the ioctl API and the bulkfree scan. The bulkfree scan is the code responsible for garbage collecting the unreferenced blocks.

* Week 5

  Port the userspace tools. At this point, it should be possible to create a filesystem.

* Weeks 6-8

  Convert the frontend to the OpenBSD VFS API. The DragonFly VFS has diverged a lot from the earlier BSD VFS, with substantial changes in the locking and in the semantics of the namecache/vnode locking and interactions. The frontend code, which contains the vnode operations and the VFS operations, is about 5 kloc. This is a major task.
  At this point, it should be possible to use the filesystem.

* Remaining time

  I'm not sure I'm able to plan this far ahead. The first thing to do is to test the filesystem in depth and fix the issues which arise. It's not possible to plan how much time that will need, but I suspect it will be quite some work.

  After that, if I have more time, there are some possibilities. Probably the best thing is to track down one of the known bugs of HAMMER2 and try to solve it, for instance the issue with hardlinks (if Matt Dillon has not solved it by that point). Another possibility is working on a small feature for HAMMER2, like deduplication, but I'd like to have a usable filesystem before trying to add something new.

Thanks for reading,
Joris