Suparna Bhattacharya wrote:
> I'm looking at whether we can do most of it at VFS level

Do you plan to reserve space as "blocks, somewhere", or as "these
specific on-disk locations" ? In ABISS, we did something of the
latter kind (in order to make large contiguous allocations also on
FAT), and it turned out to be a big mess, because ABISS needed too
much support from the file system driver. So we just scrapped that
bit :-)

> Of course, I haven't looked at how ABISS does delayed alloc --
> do you have a patch snippet I can look at ?

I just made a release. The kernel patch is in
abiss-7/kernel/abiss.patch

It's all in one big patch, sorry. The main purpose of this is to see
what we can achieve, so it's not very polished.

The main parts: we added a new page flag, PG_delalloc, which
basically tells everyone to stay away from that page. There are two
purposes: (a) to make sure no allocation happens unless explicitly
requested, and (b) to prevent the page from being written back while
it is still in ABISS' playout buffer.

The reason for (b) is that the page gets locked during writeback,
which could cause delays if the ABISS-using application then decides
to access the page.

The "hands off" code is mainly in fs/buffer.c, in the functions
__block_commit_write (set the page dirty, then go away),
cont_prepare_write (for FAT, do nothing), block_prepare_write (for
ext2, do nothing), and then fs/mpage.c:mpage_writepages (skip pages
marked for delayed allocation).

cont_prepare_write also needs to handle the special case where it
has to fill holes in a file. In this case, it simply overrides
delayed allocation. This bit will need more work.

Since ABISS prefetches pages, cont_prepare_write and
block_prepare_write may now see pages that are already up to date,
so they must not zero them.

The prefetching happens in fs/abiss/sched_lib.c:abiss_read_page, and
writeback in abiss_put_page. We also experimented with leaving the
writeback to MM, but that led to OOM far too often. The current
solution works quite smoothly even if we tax the system hard.

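To illustrate the "hands off" rule described above, here is a small
user-space model in C. It is only a sketch, not code from
abiss.patch: the toy_* functions, struct toy_page, and the PG_dirty
and PG_mapped bits are made up for the illustration, and only the
name PG_delalloc and the roles of the modelled kernel functions come
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page flags. Only the name PG_delalloc is taken from the mail;
 * the bit values and the other two flags are hypothetical. */
enum {
    PG_dirty    = 1u << 0,  /* page has data not yet on disk */
    PG_delalloc = 1u << 1,  /* "stay away": delayed allocation */
    PG_mapped   = 1u << 2,  /* hypothetical: an on-disk block is assigned */
};

struct toy_page {
    unsigned flags;
};

/* Stands in for block_prepare_write / cont_prepare_write:
 * with PG_delalloc set, do nothing; otherwise allocate a block. */
static void toy_prepare_write(struct toy_page *p)
{
    if (p->flags & PG_delalloc)
        return;
    p->flags |= PG_mapped;
}

/* Stands in for __block_commit_write: set the page dirty, then go away. */
static void toy_commit_write(struct toy_page *p)
{
    p->flags |= PG_dirty;
}

/* Stands in for mpage_writepages: write back dirty pages, but skip
 * pages marked for delayed allocation. Returns the number written. */
static size_t toy_writepages(struct toy_page *pages, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = &pages[i];

        if (!(p->flags & PG_dirty) || (p->flags & PG_delalloc))
            continue;
        p->flags &= ~PG_dirty;
        written++;
    }
    return written;
}

/* Stands in for abiss_put_page: the page leaves the playout buffer,
 * so allocation is now explicitly requested and the page is written. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->flags & PG_delalloc);
    p->flags &= ~PG_delalloc;
    if (p->flags & PG_dirty) {
        p->flags |= PG_mapped;
        p->flags &= ~PG_dirty;
    }
}
```

In this model, a page with PG_delalloc set stays dirty and unmapped
across toy_prepare_write, toy_commit_write and toy_writepages, and
only gets a block and is written once toy_put_page releases it. The
hole-filling override in cont_prepare_write is not modelled.
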
In order to keep things simple, I didn't try to make delayed
allocation do anything for writers that don't use ABISS.

The life cycle of a page is about as follows: when an application
reads or writes a file, ABISS maintains a playout buffer for it,
which typically reaches a few hundred kB ahead of the current file
position. Pages are prefetched and locked in the playout buffer.

The playout buffer is dimensioned such that, when file data enters
the playout buffer, there is enough time for the data to be in
memory by the time the application reaches it. ABISS just calls
readpage to get the data, which either causes it to be read from
disk, or the page to be zeroed, if we're beyond EOF or at a hole.

The application accesses the page through the normal VFS functions,
so in the case of writing, the prepare/commit process happens. Once
the application has accessed the page and moved the playout buffer
beyond it, the page is released and written back to disk.

Prefetching and writeback are done in a separate kernel thread, so
the application does not get delayed.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina         [EMAIL PROTECTED] /
/_http://www.almesberger.net/____________________________________________/