:In the balancing part, definitely. FreeBSD seems to be the only
:system that has the balancing right. I'm planning on integrating
:some of the balancing tactics into Linux for the 2.5 kernel, but
:I'm not sure how to integrate the inode and dentry cache into the
:balancing scheme ...
:I'm curious about the other things though ... FreeBSD still seems
:to have the early 90's abstraction layer from Mach and the vnode
:cache doesn't seem to grow and shrink dynamically (which can be a
:big win for systems with lots of metadata activity).
:
:So while it's true that FreeBSD's VM balancing seems to be the
:best one out there, I'm not quite sure about the rest of the VM...
:
:regards,
:
:Rik

Well, the approach we take is that of a two-layered cache.
The vnode, dentry (namei for FreeBSD), and inode caches
in FreeBSD are essentially throw-away caches of data
represented in an internal form. The VM PAGE cache 'backs'
these caches loosely by caching the physical on-disk representation
of inodes and directory entries (see note 1 at bottom).
This means that even though we limit the number of namei
and inode structures we keep around in the kernel, the data
required to reconstitute those structures is 'likely' to
still be in the VM PAGE cache, allowing us to pretty much
throw away those structures on a whim. The only cost is that
we have to go through a filesystem op (possibly not requiring I/O)
to reconstitute the internal structure.
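To make the two-layer idea concrete, here is a minimal Python sketch (not FreeBSD source; all names such as PageCache and InodeCache are hypothetical) of a tiny throw-away cache of parsed structures backed by a larger cache of the raw on-disk representation:

```python
class PageCache:
    """Stands in for the VM PAGE cache: raw on-disk representation."""
    def __init__(self):
        self.blocks = {}           # block number -> raw bytes

    def read(self, blkno):
        return self.blocks[blkno]  # assume the block is resident (no I/O)

class InodeCache:
    """Throw-away cache of parsed inodes, deliberately tiny."""
    def __init__(self, pages, limit=2):
        self.pages, self.limit, self.inodes = pages, limit, {}

    def lookup(self, ino):
        if ino in self.inodes:
            return self.inodes[ino]             # hit: no reconstitution
        if len(self.inodes) >= self.limit:      # throw one away on a whim
            self.inodes.pop(next(iter(self.inodes)))
        # reconstitute from the backing page cache: a "filesystem op",
        # but no disk I/O as long as the raw block is still cached
        # (the inode number doubles as a block number in this toy model)
        size = int(self.pages.read(ino))
        self.inodes[ino] = {"ino": ino, "size": size}
        return self.inodes[ino]
```

The point of the sketch is that evicting an entry from the small cache loses nothing permanent: the next lookup rebuilds it from the raw bytes still held by the page cache.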

For example, take the namei cache. The namei cache allows
the kernel to bypass big pieces of the filesystem when doing
path name lookups. If a path is not in the namei cache the
filesystem has to do a directory lookup. But a directory
lookup could very well access pages in the VM PAGE cache
and thus still not actually result in a disk I/O.
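The namei fallback path described above might be sketched like this (purely illustrative Python; the function name and dictionary layout are made up, not kernel interfaces):

```python
def namei(path, name_cache, dir_pages):
    """Resolve path -> inode number, preferring the namei cache."""
    if path in name_cache:                    # namei cache hit:
        return name_cache[path], "cache-hit"  # filesystem bypassed entirely
    # miss: do the directory lookup, scanning directory blocks that are
    # usually still resident in the page cache, so still no disk I/O
    dirname, _, leaf = path.rpartition("/")
    ino = dir_pages[dirname or "/"][leaf]
    name_cache[path] = ino                    # repopulate the namei cache
    return ino, "dir-lookup"
```

Either way the common case avoids touching the disk; the cache miss just costs a walk through cached directory data.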

The inode cache works the same way ... inodes can be thrown
away at any time and most of the time they can be reconstituted
from the VM PAGE cache without an I/O.

The vnode cache works slightly differently. VNodes that are
not in active use can be thrown away and reconstituted at a later
time from either the inode cache or the VM PAGE cache
(or if not then require a disk I/O to get at the stat information).

There is a caveat for the vnode cache, however. VNodes are tightly
integrated with VM Objects, which in turn anchor VM pages
in the VM PAGE cache. Thus when you throw away an inactive vnode
you also have to throw away any cached VM PAGES representing the
cached file or directory data represented by that vnode.

Nearly all installations of FreeBSD run out of physical memory long
before they run out of vnodes, so this side effect is almost never
an issue. On some extremely rare occasions it is possible that
the system will have plenty of free memory but hit its vnode cache
limit and start recycling vnodes, causing it to recycle cache pages
even when there is plenty of free memory available. But this is
very rare.
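The vnode-recycling side effect can be shown with another hypothetical sketch (again, illustrative Python, not kernel code): recycling a vnode drags its cached pages out with it, even when memory is otherwise free.

```python
class VnodeCache:
    """Toy vnode cache: recycling a vnode discards its cached pages."""
    def __init__(self, limit):
        self.limit, self.vnodes = limit, {}
        self.page_cache = {}                  # (vnode, page#) -> data

    def open(self, ino):
        if ino not in self.vnodes and len(self.vnodes) >= self.limit:
            victim = next(iter(self.vnodes))  # recycle an inactive vnode...
            del self.vnodes[victim]
            # ...and throw away the pages its VM Object was anchoring,
            # regardless of how much free memory the system has
            for key in [k for k in self.page_cache if k[0] == victim]:
                del self.page_cache[key]
        self.vnodes[ino] = True
        return ino

    def cache_page(self, ino, n, data):
        self.page_cache[(ino, n)] = data
```

This is exactly why hitting the vnode limit with plenty of free memory is undesirable: the page evictions are forced by the vnode recycle, not by memory pressure.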

The key point to all of this is that we put most of our marbles in
the VM PAGE cache. The namei and inode caches are there simply for
convenience so we don't have to 'lock' big portions of the underlying
VM PAGE cache.

The VM PAGE cache is pretty much an independent entity. It does not know
or care *what* is being cached, it only cares how often the data is
being accessed and whether it is clean or dirty. It treats all the
data nearly the same.

note (1): Physical directory blocks have historically been cached in
the buffer cache, using kernel MALLOC space, not in the VM PAGE cache.
Buffer-cache-based MALLOC space is severely limited (only a few megabytes)
compared to what the VM PAGE cache can offer. In FreeBSD a
'sysctl -w vfs.vmiodirenable=1' will cause physical directory blocks to
be cached in the VM PAGE Cache, just like files are cached. This is
not the default but it will be soon, and many people already turn this
sysctl on.
-
I should also say that there is a *fourth* cache not yet mentioned which
actually has a huge effect on the VM PAGE cache. This fourth cache
relates to pages *actively* mapped into user space. A page mapped into
user space is wired (cannot be ripped out of the VM PAGE cache) and also
has various other pmap-related tracking structures (which you are familiar
with, Rik, so I won't expound on that too much). If the VM PAGE cache
wants to get rid of an idle page that is still mapped to a user process,
it has to unwire it first which means it has to get rid of the user
mappings - a pmap*() call from vm/vm_pageout.c and vm/vm_page.c
accomplishes this. This fourth cache (the active user mappings of pages)
is also a throw-away cache, though one with the side effect of making
VM PAGE cache pages available for loading into user process's memory maps.
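The unwire-before-free ordering can be sketched as follows (illustrative Python; the pmap_remove name below merely echoes the real pmap*() interface and is not its actual signature):

```python
class Page:
    """Toy page: tracks which processes have it actively mapped."""
    def __init__(self):
        self.mappings = []            # processes mapping this page

def pmap_remove(page, proc):
    page.mappings.remove(proc)        # tear down one user mapping

def try_free(page):
    """Pageout cannot reuse a mapped page until its mappings are gone."""
    if page.mappings:                 # effectively wired by active mappings
        for proc in list(page.mappings):
            pmap_remove(page, proc)   # what vm_pageout does via pmap*()
    return True                       # now safe to reuse the page
```

The mappings themselves are the throw-away part: discarding them costs only a future page fault, after which the page is remapped from the VM PAGE cache.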
-Matt