Re: page_cache: how does generic_file_read really work?

Erez Zadok Fri, 17 Sep 1999 13:12:37 -0700
In message <[EMAIL PROTECTED]>, "Peter J. Braam" writes:
> Hi
> 
> I wondered if someone could explain what is happening in
> generic_file_read:
> 
> More generically, I'd like to understand how in a file system I can
> "get" a page and use it to copy data into/out of it.  How do I then put
> the page away? 

I think my example from Cryptfs (or Wrapfs, or any other of my stackable
f/s) might help you.

My file systems use generic_file_read as their read routine.  Let's take the
simple case of the first time to get a page, meaning it's not in memory or
in cache anywhere.  In the VFS, generic_file_read essentially calls
do_generic_file_read, which does in a loop:

- find hash of the page

- try to find the page w/o locking it (__find_page_nolock()).

- initially there won't be a page or cached page, so it allocates a page
  (page_cache_alloc()) and puts it in the cache (__add_to_page_cache()).
  The page is allocated already locked.

- call *your* file system's readpage routine (which must exist, b/c you've
  defined your f/s to use generic_file_read instead of your own read
  routine).  This means that your readpage must assume that the page is
  already allocated and locked.  No matter how your readpage is called,
  you'll get a locked page.

- After returning from your readpage function, the VFS calls
  page_cache_release, which frees the page, but does not remove it from LRU
  caches.  (I find the name 'page_cache_release' a bit confusing.)  This
  means that your readpage routine should have done all the necessary
  actions prior to the very last free'ing of the page: this may include
  setting uptodate/locked/whatever bits, removing from LRU caches, and more.
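
The loop above can be sketched as a small user-space model.  This is not
kernel code: every toy_* name is hypothetical, the "page" is a 16-byte
buffer, and there is no real locking or LRU list.  The sketch assumes the
cache holds its own reference to a page, which is why dropping the caller's
reference at the end leaves the page cached.

```c
/* User-space toy model of the read path described above; NOT kernel code.
 * Every toy_* name is hypothetical and exists only for illustration. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_HASH_SIZE 8

struct toy_page {
    unsigned long offset;          /* page index within the file */
    int count;                     /* reference count */
    int locked;
    int uptodate;
    char data[TOY_PAGE_SIZE];
    struct toy_page *next_hash;    /* hash-chain link */
};

struct toy_inode {
    struct toy_page *hash[TOY_HASH_SIZE];
    int (*readpage)(struct toy_inode *inode, struct toy_page *page);
};

/* Look a page up in the hash without locking it. */
struct toy_page *toy_find_page_nolock(struct toy_inode *inode,
                                      unsigned long offset)
{
    struct toy_page *p = inode->hash[offset % TOY_HASH_SIZE];

    while (p && p->offset != offset)
        p = p->next_hash;
    return p;
}

/* Drop one reference; the memory goes away only with the last one. */
void toy_page_release(struct toy_page *page)
{
    if (--page->count == 0)
        free(page);
}

/* Example f/s readpage: must be handed a locked page, fills it, unlocks it. */
int toy_fill_readpage(struct toy_inode *inode, struct toy_page *page)
{
    (void)inode;
    assert(page->locked);
    memset(page->data, 'A' + (int)(page->offset % 26), TOY_PAGE_SIZE);
    page->uptodate = 1;
    page->locked = 0;              /* single-threaded model: order is moot */
    return 0;
}

/* One iteration of the read loop for a single page. */
int toy_generic_read(struct toy_inode *inode, unsigned long offset, char *buf)
{
    unsigned long h = offset % TOY_HASH_SIZE;
    struct toy_page *page = toy_find_page_nolock(inode, offset);
    int err = 0;

    if (page) {
        page->count++;             /* cache hit: take our own reference */
    } else {
        page = calloc(1, sizeof(*page));
        if (!page)
            return -1;
        page->offset = offset;
        page->count = 2;           /* our reference + the cache's (assumed) */
        page->locked = 1;          /* enters the cache already locked */
        page->next_hash = inode->hash[h];
        inode->hash[h] = page;
        err = inode->readpage(inode, page);
    }
    if (!err && !page->uptodate)
        err = -1;
    if (!err)
        memcpy(buf, page->data, TOY_PAGE_SIZE);
    toy_page_release(page);        /* drops our reference; page stays cached */
    return err;
}
```

To try it, zero a struct toy_inode, point its readpage at
toy_fill_readpage, and call toy_generic_read() twice for the same offset:
the second call is served from the hash without calling readpage again.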


Now let's go into my cryptfs_readpage function.  Remember that my situation
may be (slightly) more complicated than yours.  My stackable file systems
must emulate both a VFS and a lower level f/s.  My stackable f/ss act as a
VFS to the lower level f/s (say, ext2fs), and at the same time they look
like a lower level f/s to the real VFS.

This is what I do in cryptfs_readpage():

- find a page_hash() of the lower-level inode, for the same offset.  This is
  part of how I emulate a VFS to the lower-level f/s.  The VFS looked for a
  page hash at a given offset, so I repeat the same operation on the
  lower-level inode/filesystem (which I sometimes call the "hidden" inode or
  filesystem).

- find and lock a page at the lower-level, for the same offset.  Remember
  that the VFS called me w/ a locked page, so here I'm preparing a
  lower-level f/s page and locking it, before calling the lower-level
  file-system's readpage().

- if I cannot find such a page, I allocate it in kernel space, and add it to
  the page cache (add_to_page_cache)

- I call the readpage() routine of the lower-level f/s, and make sure I
  have valid data (wait_on_page).

At this point, I have two pages: the hidden_page is the one I retrieved from
the lower level f/s, and the 'page' which was passed to me by the VFS.  In
cryptfs, the hidden_page is encrypted, so now I'm decrypting the data from
the hidden_page and into the page which was passed to me.

- I use page_address() to "map" a page's data into kernel memory, so I can
  copy and manipulate it as any other "char *" buffer.  I map both pages,
  then I call my decryption routine to decode the hidden_page into the
  current page.  This is done of course with the locks held on both pages.

- now that I have valid, decrypted data in the page that I got from the
  VFS, I unlock it, set the uptodate flag on, and wake up anyone who might
  be waiting on it.

- finally, right before I return, I call __free_page() (which is the same as
  page_cache_release) on the hidden_page.  Since the VFS will do the same on
  my page, I must free the hidden page which I allocated.
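
The readpage steps above can be modelled in user space as follows.  Again
this is only a sketch under stated assumptions: all toy_* names are
hypothetical, a one-byte XOR stands in for the real cipher (it is not what
Cryptfs uses), and wait_on_page is reduced to a flag check because the
model is single-threaded.  For the same reason the model does not re-take
the hidden page's lock after the lower readpage has unlocked it.

```c
/* User-space toy model of a stacked readpage; NOT kernel code.
 * All toy_* names are hypothetical; XOR is only a stand-in "cipher". */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_HASH_SIZE 8
#define TOY_KEY 0x5a

struct toy_page {
    unsigned long offset;
    int count, locked, uptodate;
    unsigned char data[TOY_PAGE_SIZE];
    struct toy_page *next_hash;
};

struct toy_inode {
    struct toy_page *hash[TOY_HASH_SIZE];
    int (*readpage)(struct toy_inode *inode, struct toy_page *page);
    struct toy_inode *hidden;      /* lower-level inode, NULL at the bottom */
};

struct toy_page *toy_find_page(struct toy_inode *inode, unsigned long offset)
{
    struct toy_page *p = inode->hash[offset % TOY_HASH_SIZE];

    while (p && p->offset != offset)
        p = p->next_hash;
    return p;
}

void toy_page_release(struct toy_page *page)
{
    if (--page->count == 0)
        free(page);
}

/* Lower-level f/s: hands back "encrypted" bytes and unlocks the page. */
int toy_lower_readpage(struct toy_inode *inode, struct toy_page *page)
{
    size_t i;

    (void)inode;
    assert(page->locked);
    for (i = 0; i < TOY_PAGE_SIZE; i++)
        page->data[i] =
            (unsigned char)(('a' + (page->offset + i) % 26) ^ TOY_KEY);
    page->uptodate = 1;
    page->locked = 0;
    return 0;
}

/* Stacked f/s: acts as a VFS toward the lower inode, then decodes. */
int toy_stacked_readpage(struct toy_inode *inode, struct toy_page *page)
{
    struct toy_inode *hidden = inode->hidden;
    unsigned long h = page->offset % TOY_HASH_SIZE;
    struct toy_page *hidden_page;
    size_t i;
    int err = 0;

    assert(page->locked);          /* the VFS always passes a locked page */

    hidden_page = toy_find_page(hidden, page->offset);
    if (hidden_page) {
        hidden_page->count++;      /* already cached below: take a reference */
    } else {
        hidden_page = calloc(1, sizeof(*hidden_page));
        if (!hidden_page) {
            err = -1;
            goto out;
        }
        hidden_page->offset = page->offset;
        hidden_page->count = 2;    /* ours + the lower cache's (assumed) */
        hidden_page->locked = 1;   /* lower readpage expects a locked page */
        hidden_page->next_hash = hidden->hash[h];
        hidden->hash[h] = hidden_page;
        err = hidden->readpage(hidden, hidden_page);
    }
    if (!err && !hidden_page->uptodate)   /* stand-in for wait_on_page */
        err = -1;
    if (!err) {
        for (i = 0; i < TOY_PAGE_SIZE; i++)
            page->data[i] = (unsigned char)(hidden_page->data[i] ^ TOY_KEY);
        page->uptodate = 1;
    }
    toy_page_release(hidden_page); /* we took this reference, so we drop it */
out:
    page->locked = 0;              /* readpage unlocks the page it was given */
    return err;
}
```

The point of the model is the reference accounting: the stacked layer
releases only the reference it took on the hidden page, and leaves the
release of the upper page to its caller.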

In all this fun, sooner or later your flushpage routine may be called (via
truncate_inode_pages) from multiple places, such as iput(), vmtruncate() and
more.  Some of these calls are invoked [in]directly by your f/s code or the
VFS, while others are the result of a kernel thread that cleans up old unused
pages (LRU).  All this means that your flushpage() function must do a few
more things, and emulate on the lower level what truncate_inode_pages does
to cryptfs's pages:

- find the corresponding hidden page and lock it.  The f/s's flushpage()
  routine gets a locked page, but must not unlock it, b/c the VFS will
  unlock the page.

- call the flushpage routine of the lower-level f/s

- clear the uptodate flag of the hidden_page, remove it from the lru cache
  (lru_cache_del), call remove_inode_page on it as well, unlock the
  hidden_page, and free it.  These actions are mostly what
  truncate_inode_pages does to your page, therefore cryptfs must do the same
  to the lower-level f/s.
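
The flushpage steps above, as a user-space sketch.  The same caveats apply:
the toy_* names are hypothetical, the LRU list is reduced to a flag, and
the model assumes the only reference left on a cached hidden page is the
cache's own, so the final release really frees it.  A global counter
records frees so the behaviour can be checked.

```c
/* User-space toy model of a stacked flushpage; NOT kernel code.
 * All toy_* names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

#define TOY_HASH_SIZE 8

struct toy_page {
    unsigned long offset;
    int count, locked, uptodate, in_lru;
    struct toy_page *next_hash;
};

struct toy_inode {
    struct toy_page *hash[TOY_HASH_SIZE];
    void (*flushpage)(struct toy_inode *inode, struct toy_page *page);
    struct toy_inode *hidden;      /* lower-level inode */
    int flush_calls;               /* bookkeeping for the example only */
};

int toy_freed_pages;               /* counts pages actually freed */

struct toy_page *toy_find_page(struct toy_inode *inode, unsigned long offset)
{
    struct toy_page *p = inode->hash[offset % TOY_HASH_SIZE];

    while (p && p->offset != offset)
        p = p->next_hash;
    return p;
}

/* Put a clean, up-to-date page into an inode's cache (setup helper). */
struct toy_page *toy_add_page(struct toy_inode *inode, unsigned long offset)
{
    struct toy_page *page = calloc(1, sizeof(*page));

    if (!page)
        return NULL;
    page->offset = offset;
    page->count = 1;               /* the cache's reference */
    page->uptodate = 1;
    page->in_lru = 1;
    page->next_hash = inode->hash[offset % TOY_HASH_SIZE];
    inode->hash[offset % TOY_HASH_SIZE] = page;
    return page;
}

/* Unlink a page from its inode's hash chain. */
void toy_remove_inode_page(struct toy_inode *inode, struct toy_page *page)
{
    struct toy_page **pp = &inode->hash[page->offset % TOY_HASH_SIZE];

    while (*pp && *pp != page)
        pp = &(*pp)->next_hash;
    if (*pp) {
        *pp = page->next_hash;
        page->next_hash = NULL;
    }
}

void toy_page_release(struct toy_page *page)
{
    if (--page->count == 0) {
        free(page);
        toy_freed_pages++;
    }
}

/* Lower-level f/s flushpage: only records that it was called. */
void toy_lower_flushpage(struct toy_inode *inode, struct toy_page *page)
{
    assert(page->locked);
    inode->flush_calls++;
}

/* Stacked flushpage: repeat on the hidden page what was done to ours. */
void toy_stacked_flushpage(struct toy_inode *inode, struct toy_page *page)
{
    struct toy_inode *hidden = inode->hidden;
    struct toy_page *hidden_page;

    assert(page->locked);          /* we get a locked page ... */

    hidden_page = toy_find_page(hidden, page->offset);
    if (!hidden_page)
        return;                    /* nothing cached below */
    hidden_page->locked = 1;
    if (hidden->flushpage)
        hidden->flushpage(hidden, hidden_page);
    hidden_page->uptodate = 0;
    hidden_page->in_lru = 0;       /* stand-in for lru_cache_del */
    toy_remove_inode_page(hidden, hidden_page);
    hidden_page->locked = 0;
    toy_page_release(hidden_page); /* drops the cache's reference: freed */
    /* ... and must leave 'page' locked; the caller unlocks it. */
}
```

Calling toy_stacked_flushpage() a second time for the same offset is a
no-op in this model, since the hidden page is no longer in the lower cache.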


The above explanation is a simplified version of what really goes on, and
what my stackable f/s modules do.  I didn't explain the other cases, nor the
interaction with other parts of the same file system or the VFS.  Anyway,
it's a good start... :-)

If you have more questions, let fsdevel know and I'll see if I can help.  (I
hope I explained everything correctly.  Someone correct me otherwise.)  For
the whole story, anyone is welcome to look at my actual sources:

        http://www.cs.columbia.edu/~ezk/research/software/

Cheers,
Erez.