In message <[EMAIL PROTECTED]>, "Peter J. Braam" writes:
> Hi
>
> I wondered if someone could explain what is happening in
> generic_file_read:
>
> More generically, I'd like to understand how in a file system I can
> "get" a page and use it to copy data into/out of it. How do I then put
> the page away?
I think my example from Cryptfs (or Wrapfs, or any other of my stackable
f/s) might help you.
My file systems use generic_file_read as their read routine. Let's take the
simple case of the first time a page is needed, meaning it's not in memory
or in any cache. In the VFS, generic_file_read essentially calls
do_generic_file_read, which does the following in a loop:
- compute the hash of the page (page_hash()).
- try to find the page without locking it (__find_page_nolock()).
- initially there won't be a cached page, so it allocates a page
(page_cache_alloc()) and puts it in the cache (__add_to_page_cache()).
The page is added to the cache already locked.
- call *your* file system's readpage routine (which must exist, b/c you've
defined your f/s to use generic_file_read instead of your own read
routine). This means that your readpage must assume that the page is
allocated and locked: no matter how your readpage is called, you'll get a
locked page.
- After returning from your readpage function, the VFS calls
page_cache_release, which drops the VFS's reference to the page (the page
is only actually freed when that was the last reference), but does not
remove it from the LRU caches. (I find the name 'page_cache_release' a bit
confusing.) This means that your readpage routine should have done all the
necessary actions before that release: this may include setting
uptodate/locked/whatever bits, removing from LRU caches, and more.
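To make that loop concrete, here is a small userspace model of the cache-miss path (a sketch, not kernel code). Everything named toy_* is hypothetical; the per-inode hash table, the two-reference convention (one for the cache, one for the caller), and the simplified cache-hit path (no waiting on locked pages) are assumptions chosen to mirror the steps above, not the real do_generic_file_read().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny "pages" keep the model readable */
#define TOY_HASH_SIZE 8

/* Hypothetical stand-in for struct page. */
struct toy_page {
    long index;                      /* page offset within the file */
    int locked, uptodate, count;
    char data[TOY_PAGE_SIZE];
    struct toy_page *next;           /* hash-chain link */
};

struct toy_inode;
typedef int (*toy_readpage_t)(struct toy_inode *, struct toy_page *);

/* Hypothetical stand-in for an inode plus its page cache. */
struct toy_inode {
    struct toy_page *hash[TOY_HASH_SIZE];
    toy_readpage_t readpage;         /* the f/s's readpage */
    const char *backing;             /* "disk" contents */
};

static unsigned toy_page_hash(long index) { return (unsigned)index % TOY_HASH_SIZE; }

/* Like __find_page_nolock(): look the page up without locking it. */
static struct toy_page *toy_find_page_nolock(struct toy_inode *in, long index) {
    struct toy_page *p;
    for (p = in->hash[toy_page_hash(index)]; p; p = p->next)
        if (p->index == index)
            return p;
    return NULL;
}

/* Models page_cache_alloc() + __add_to_page_cache(): the new page goes into
 * the cache already locked, with one reference for the cache and one for us. */
static struct toy_page *toy_add_locked_page(struct toy_inode *in, long index) {
    struct toy_page *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->index = index;
    p->locked = 1;
    p->count = 2;
    p->next = in->hash[toy_page_hash(index)];
    in->hash[toy_page_hash(index)] = p;
    return p;
}

/* Like page_cache_release(): drop only the caller's reference. */
static void toy_page_release(struct toy_page *p) { p->count--; }

/* One iteration of the read loop. */
static struct toy_page *toy_read_one_page(struct toy_inode *in, long index) {
    struct toy_page *p = toy_find_page_nolock(in, index);
    if (!p) {
        p = toy_add_locked_page(in, index);
        /* readpage always receives a locked page */
        if (!p || in->readpage(in, p) != 0)
            return NULL;
    } else {
        p->count++;
    }
    /* ... copy p->data out to the user's buffer here ... */
    toy_page_release(p);
    return p;
}

/* A trivial readpage: fill from the backing store, mark uptodate, unlock. */
static int toy_simple_readpage(struct toy_inode *in, struct toy_page *p) {
    assert(p->locked);
    memcpy(p->data, in->backing + p->index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    p->uptodate = 1;
    p->locked = 0;
    return 0;
}
```

Calling toy_read_one_page() twice for the same index reads from the "disk" only once; the second call finds the page through the hash and just takes and drops a reference.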
Now let's go into my cryptfs_readpage function. Remember that my situation
may be (slightly) more complicated than yours. My stackable file systems
must both emulate a VFS and a lower level f/s. My stackable f/ss act as a
VFS to the lower level f/s (say, ext2fs), and at the same time they look
like a lower level f/s to the real VFS.
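Those two faces come down to an operations-table indirection. In this hypothetical userspace sketch, the "VFS" only ever calls through inode->ops; the stacked layer's read forwards to the hidden inode's ops, which is where a Cryptfs-like layer would transform the data. The struct layout and names are illustrative assumptions, not the kernel's struct inode or struct file_operations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified inode with an operations table. */
struct toy_inode;
struct toy_ops {
    /* Copy up to n bytes at offset off into buf; return bytes copied. */
    size_t (*read)(struct toy_inode *in, size_t off, char *buf, size_t n);
};
struct toy_inode {
    const struct toy_ops *ops;
    const char *data;            /* used only by the lower f/s */
    struct toy_inode *hidden;    /* used only by the stackable f/s */
};

/* Lower-level f/s (think ext2fs): reads from its own "disk". */
static size_t lower_read(struct toy_inode *in, size_t off, char *buf, size_t n) {
    size_t len = strlen(in->data);
    if (off >= len)
        return 0;
    if (n > len - off)
        n = len - off;
    memcpy(buf, in->data + off, n);
    return n;
}

/* Stackable f/s: to the VFS it is just another ops table, but internally it
 * acts like a VFS toward the hidden inode and calls the hidden inode's ops. */
static size_t stacked_read(struct toy_inode *in, size_t off, char *buf, size_t n) {
    struct toy_inode *hidden = in->hidden;
    /* a Cryptfs-like layer would transform (e.g. decrypt) the data here */
    return hidden->ops->read(hidden, off, buf, n);
}

static const struct toy_ops lower_ops = { lower_read };
static const struct toy_ops stacked_ops = { stacked_read };

/* The "VFS" always goes through inode->ops and never knows which layer it hit. */
static size_t vfs_read(struct toy_inode *in, size_t off, char *buf, size_t n) {
    return in->ops->read(in, off, buf, n);
}
```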
This is what I do in cryptfs_readpage():
- find a page_hash() of the lower-level inode, for the same offset. This is
part of how I emulate a VFS to the lower-level f/s. The VFS looked for a
page hash at a given offset, so I repeat the same operation on the
lower-level inode/filesystem (which I sometimes call the "hidden" inode or
filesystem).
- find and lock a page at the lower-level, for the same offset. Remember
that the VFS called me w/ a locked page, so here I'm preparing a
lower-level f/s page and locking it, before calling the lower-level
file-system's readpage().
- if I cannot find such a page, I allocate it in kernel space, and add it to
the page cache (add_to_page_cache)
- I call the readpage() routine of the lower-level f/s, and make sure I
have valid data (wait_on_page).
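The lower-level half of cryptfs_readpage() can be modeled like this. It is a hypothetical, single-threaded userspace sketch: the page cache is a fixed array indexed by page number (standing in for page_hash() plus the lookup), "locking" is a flag, and toy_wait_on_page() can only check that the page is already unlocked. Skipping I/O for a page that is already uptodate is this sketch's own choice, not necessarily what Cryptfs does.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_NPAGES 4

/* Hypothetical toy page and inode; not the kernel's structures. */
struct toy_page {
    int present, locked, uptodate, count;
    char data[TOY_PAGE_SIZE];
};
struct toy_inode {
    struct toy_page cache[TOY_NPAGES];  /* one slot per page index */
    const char *disk;                   /* backing store of the lower f/s */
    int readpage_calls;                 /* lets us observe the I/O */
};

/* Lower f/s readpage: expects a locked page, fills it, marks it uptodate,
 * and unlocks it (in the kernel, unlocking also wakes up waiters). */
static int lower_readpage(struct toy_inode *hidden, long index, struct toy_page *p) {
    assert(p->locked);
    memcpy(p->data, hidden->disk + index * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    p->uptodate = 1;
    p->locked = 0;
    hidden->readpage_calls++;
    return 0;
}

/* Stand-in for wait_on_page(): single-threaded, so the page must be unlocked. */
static void toy_wait_on_page(struct toy_page *p) { assert(!p->locked); }

/* Find (or create) the hidden page at the same index, lock it, have the lower
 * f/s read it, and wait for valid data. Returns it with our reference held. */
static struct toy_page *get_hidden_page(struct toy_inode *hidden, long index) {
    struct toy_page *hp;
    assert(index >= 0 && index < TOY_NPAGES);
    hp = &hidden->cache[index];          /* models page_hash() + lookup */
    if (!hp->present) {                  /* not cached: "allocate" and add it */
        memset(hp, 0, sizeof(*hp));
        hp->present = 1;
        hp->count = 1;                   /* the page cache's own reference */
    }
    hp->count++;                         /* our reference */
    if (hp->uptodate)
        return hp;                       /* sketch's choice: no I/O needed */
    hp->locked = 1;                      /* lock before calling readpage */
    if (lower_readpage(hidden, index, hp) != 0) {
        hp->count--;
        return NULL;
    }
    toy_wait_on_page(hp);
    return hp;
}
```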
At this point, I have two pages: the hidden_page is the one I retrieved from
the lower level f/s, and the 'page' which was passed to me by the VFS. In
cryptfs, the hidden_page is encrypted, so now I decrypt the data from the
hidden_page into the page which was passed to me.
- I use page_address() to "map" a page's data into kernel memory, so I can
copy and manipulate it as any other "char *" buffer. I map both pages,
then I call my decryption routine to decode the hidden_page into the
current page. This is done of course with the locks held on both pages.
- now that I have valid, decrypted data in the page that I got from the
VFS, I set the uptodate flag on it, unlock it, and wake up anyone who
might be waiting on it.
- finally, right before I return, I call __free_page() (which is the same as
page_cache_release) on the hidden_page. Since the VFS will do the same on
my page, I must likewise release the hidden_page that I found or
allocated.
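The decrypt-and-finish step can be sketched the same way. In this hypothetical userspace model, toy_page_address() plays the role of page_address(), and the one-byte XOR "cipher" is only a placeholder for a real decryption routine (it is not what Cryptfs uses). What the sketch is meant to show is the ordering: transform the data while the upper page is locked, set uptodate before unlocking, then drop the reference on the hidden page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8

/* Hypothetical toy page; not the kernel's struct page. */
struct toy_page {
    int locked, uptodate, count;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Stand-in for page_address(): a kernel-virtual pointer to the page's bytes. */
static unsigned char *toy_page_address(struct toy_page *p) { return p->data; }

/* Placeholder "cipher": XOR with a one-byte key (its own inverse). */
static void toy_decrypt(unsigned char *dst, const unsigned char *src,
                        size_t n, unsigned char key) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] ^ key;
}

/* Final part of a cryptfs-style readpage; the upper page is locked on entry. */
static void finish_readpage(struct toy_page *page, struct toy_page *hidden_page,
                            unsigned char key) {
    assert(page->locked);
    toy_decrypt(toy_page_address(page), toy_page_address(hidden_page),
                TOY_PAGE_SIZE, key);
    page->uptodate = 1;    /* set before unlocking so waiters see valid data */
    page->locked = 0;      /* the kernel's unlock also wakes up waiters */
    hidden_page->count--;  /* like __free_page()/page_cache_release() */
}
```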
In all this fun, sooner or later, your flushpage routine may be called (via
truncate_inode_pages) from multiple places, such as iput(), vmtruncate() and
more. Some are invoked [in]directly by your f/s code or the VFS, while
others are the result of a kernel thread that cleans up old unused
pages (LRU). All this means that your flushpage() function must do a few
more things, and emulate on the lower-level what truncate_inode_pages does
to cryptfs's pages:
- find the corresponding hidden page and lock it. The f/s's flushpage()
routine gets a locked page, but must not unlock it, b/c the VFS will
unlock the page.
- call the flushpage routine of the lower-level f/s
- clear the uptodate flag of the hidden_page, remove it from the lru cache
(lru_cache_del), call remove_inode_page on it as well, unlock the
hidden_page, and free it. These actions are mostly what
truncate_inode_pages does to your page, therefore cryptfs must do the same
to the lower-level f/s.
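Here is a hypothetical userspace model of such a stackable flushpage. The in_cache/on_lru flags stand in for page-cache and LRU membership, toy_drop_page() only approximates what lru_cache_del() and remove_inode_page() accomplish, and reference counting is reduced to a plain integer; none of these are the kernel's actual interfaces.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NPAGES 4

/* Hypothetical toy structures; not the kernel's. */
struct toy_page {
    int locked, uptodate, count, in_cache, on_lru;
};
struct toy_inode {
    struct toy_page *pages[TOY_NPAGES];  /* page cache, indexed by page index */
    int flushpage_calls;
};

/* Lower f/s flushpage: always called with the page locked. A real f/s would
 * drop buffers/private data here; we only count the call. */
static void lower_flushpage(struct toy_inode *in, struct toy_page *p) {
    assert(p->locked);
    in->flushpage_calls++;
}

/* Rough model of what truncate_inode_pages() does to a page after flushpage. */
static void toy_drop_page(struct toy_inode *in, long index, struct toy_page *p) {
    p->uptodate = 0;
    p->on_lru = 0;             /* like lru_cache_del() */
    p->in_cache = 0;           /* like remove_inode_page() */
    in->pages[index] = NULL;
    p->count--;                /* the page cache's reference */
}

/* Stackable flushpage: the upper page arrives locked and stays locked (the
 * VFS unlocks it); the hidden page is treated the way truncate_inode_pages()
 * would treat it. */
static void stacked_flushpage(struct toy_inode *hidden, long index,
                              struct toy_page *page) {
    struct toy_page *hp;
    assert(page->locked);
    hp = hidden->pages[index];
    if (!hp)
        return;                /* nothing cached below */
    hp->count++;               /* our reference */
    hp->locked = 1;
    lower_flushpage(hidden, hp);
    toy_drop_page(hidden, index, hp);
    hp->locked = 0;
    if (--hp->count == 0)      /* last reference gone: free it */
        free(hp);
}
```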
The above explanation is a simplified version of what really goes on, and
what my stackable f/s modules do. I didn't explain the other cases, nor the
interaction with other parts of the same file system or the VFS. Anyway,
it's a good start... :-)
If you have more questions, let fsdevel know and I'll see if I can help. (I
hope I explained everything correctly. Someone correct me otherwise.) For
the whole story, anyone is welcome to look at my actual sources:
http://www.cs.columbia.edu/~ezk/research/software/
Cheers,
Erez.