Suparna Bhattacharya wrote:
> I'm looking at whether we can do most of it at VFS level

Do you plan to reserve space as "blocks, somewhere", or as "these
specific on-disk locations" ? In ABISS, we did something of the
latter kind (in order to make large contiguous allocations also on
FAT), and it turned out to be a big mess, because ABISS needed too
much support from the file system driver. So we just scrapped that
bit :-)

> Of course, I haven't looked at how ABISS does delayed alloc -- 
> do you have a patch snippet I can look at ?

I just made a release. The kernel patch is in
abiss-7/kernel/abiss.patch. It's all in one big patch, sorry.
The main purpose of this is to see what we can achieve, so it's
not very polished.

The main parts: we added a new page flag, PG_delalloc, which
basically tells everyone to stay away from that page. There are
two purposes: (a) to make sure no allocation happens unless
explicitly requested, and (b) prevent the page from being written
back while it is still in ABISS' playout buffer. The reason for
(b) is that the page gets locked during writeback, which could
cause delays if the ABISS-using application then decides to
access the page.

The "hands off" code is mainly in fs/buffer.c, in the functions
__block_commit_write (set the page dirty, then go away),
cont_prepare_write (for FAT, do nothing),
block_prepare_write  (for ext2, do nothing),
and then fs/mpage.c:mpage_writepages (skip pages marked for
delayed allocation).
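
To illustrate the "hands off" idea, here is a small user-space model,
not code from the patch. Only the name PG_delalloc comes from ABISS;
the toy_* types and functions are made up for this sketch, and it
assumes that block allocation would otherwise happen at writeback time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page flags. TOY_PG_DELALLOC models PG_delalloc; the rest is hypothetical. */
enum {
    TOY_PG_DIRTY    = 1u << 0,
    TOY_PG_DELALLOC = 1u << 1,
};

struct toy_page {
    unsigned flags;
    bool has_blocks;    /* true once on-disk blocks have been assigned */
};

/*
 * Models a generic writeback pass: dirty pages are allocated and written,
 * but pages marked for delayed allocation are skipped and stay dirty.
 * Returns the number of pages written.
 */
static size_t toy_writepages(struct toy_page *pages, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = &pages[i];

        if (!(p->flags & TOY_PG_DIRTY))
            continue;
        if (p->flags & TOY_PG_DELALLOC)
            continue;               /* hands off: still in the playout buffer */
        p->has_blocks = true;       /* assumption: allocate at writeback */
        p->flags &= ~TOY_PG_DIRTY;
        written++;
    }
    return written;
}

/* Models the explicit release path: drop the flag, then write the page. */
static void toy_abiss_put_page(struct toy_page *p)
{
    p->flags &= ~TOY_PG_DELALLOC;
    toy_writepages(p, 1);
}
```

In this model a dirty page with TOY_PG_DELALLOC set survives any number
of toy_writepages passes untouched, and only gets blocks once
toy_abiss_put_page is called on it.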

cont_prepare_write also needs to handle the special case where
it has to fill holes in a file. In this case, it simply overrides
delayed allocation. This bit will need more work.

Since ABISS prefetches pages, cont_prepare_write and
block_prepare_write may now see pages that are already up to date,
so they must not zero them.
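
As a user-space illustration of that check (again a sketch, not the
patch itself; toy_wpage and toy_prepare_write are hypothetical names,
and the 4096-byte page size is an assumption):

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size for the sketch */

struct toy_wpage {
    bool uptodate;                      /* contents already valid? */
    unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Models the prepare step for a write. A page that was prefetched is
 * already up to date, so its contents must be preserved. Only a page
 * without valid contents gets zeroed. Returns true if it zeroed the page.
 */
static bool toy_prepare_write(struct toy_wpage *p)
{
    if (p->uptodate)
        return false;                   /* prefetched: keep the data */
    memset(p->data, 0, sizeof p->data);
    p->uptodate = true;
    return true;
}
```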

The prefetching happens in fs/abiss/sched_lib.c:abiss_read_page,
and writeback in abiss_put_page. We also experimented with
leaving the writeback to MM, but that led to OOM far too often.
The current solution works quite smoothly even if we tax the
system hard.

In order to keep things simple, I didn't try to make delayed
allocation do anything for writers that don't use ABISS.

The life cycle of a page is about as follows: when an application
reads or writes a file, ABISS maintains a playout buffer for it,
that typically reaches a few hundred kB ahead of the current file
position. Pages are prefetched and locked in the playout buffer.
The playout buffer is dimensioned so that when file data enters the
playout buffer, there is enough time for the data to be in memory
by the time the application reaches it.
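
To give a feeling for the numbers, here is the generic "rate times
worst-case latency" rule as a sketch. This is not how ABISS itself
computes the buffer size; the function name, its parameters and the
example values are assumptions made purely for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes of read-ahead needed so that data requested
 * now is in memory before an application consuming rate_bytes_per_s
 * reaches it, given a worst-case I/O latency. The result is rounded up
 * to whole pages.
 */
static uint64_t toy_playout_bytes(uint64_t rate_bytes_per_s,
                                  uint64_t worst_latency_ms,
                                  uint64_t page_size)
{
    /* bytes consumed during the worst-case latency, rounded up */
    uint64_t bytes = (rate_bytes_per_s * worst_latency_ms + 999) / 1000;
    /* round up to a whole number of pages */
    uint64_t pages = (bytes + page_size - 1) / page_size;

    return pages * page_size;
}
```

With an assumed stream of 1000000 bytes/s, 300 ms worst-case latency
and 4096-byte pages, this gives 74 pages, i.e. 303104 bytes, which is
in the "few hundred kB" range mentioned above.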

ABISS just calls readpage to get the data, which either causes it
to be read from disk, or the page to be zeroed, if we're beyond
EOF or at a hole.

The application accesses the page through the normal VFS functions,
so in the case of writing, the prepare/commit process happens.

Once the application has accessed the page, and moves the playout
buffer beyond it, the page is released and written back to disk.
Prefetching and writeback are done in a separate kernel thread, so
the application does not get delayed.
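
The life cycle above can be modeled as a sliding window over page
indices. This is a simplified single-threaded sketch with hypothetical
names (toy_playout, toy_playout_advance); it assumes the application
only moves forward through the file and ignores the separate kernel
thread and the actual I/O.

```c
#include <stddef.h>

struct toy_playout {
    size_t tail;        /* first page still held in the buffer */
    size_t head;        /* one past the last prefetched page */
    size_t window;      /* pages to keep ahead of the application */
    size_t prefetched;  /* counter: pages brought in so far */
    size_t released;    /* counter: pages released (and written back) */
};

/*
 * Move the buffer so that it covers [pos, pos + window).
 * Pages behind pos are released, pages ahead are prefetched.
 * Assumes pos never moves backwards.
 */
static void toy_playout_advance(struct toy_playout *b, size_t pos)
{
    /* release pages the application has moved past */
    while (b->tail < pos && b->tail < b->head) {
        b->tail++;
        b->released++;
    }
    /* the application jumped beyond everything we held */
    if (b->tail < pos) {
        b->tail = pos;
        b->head = pos;
    }
    /* prefetch (and, in the real thing, lock) pages up to the window edge */
    while (b->head < pos + b->window) {
        b->head++;
        b->prefetched++;
    }
}

static void toy_playout_init(struct toy_playout *b, size_t window)
{
    b->tail = 0;
    b->head = 0;
    b->window = window;
    b->prefetched = 0;
    b->released = 0;
    toy_playout_advance(b, 0);  /* fill the initial window */
}
```

The number of pages held, head - tail, stays equal to the window size
as long as the application advances sequentially.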

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina     [EMAIL PROTECTED] /
/_http://www.almesberger.net/____________________________________________/