Sean Hamilton wrote: > In my case I have a webserver serving up a few dozen files of about 10 MB > each. While yes it is true that I could purchase more memory, and I could > purchase more drives and stripe them, I am more interested in the fact that > this server is constantly grinding away because it has found a weakness in > the caching algorithm.
This is a problem in Linux, and in SVR4 as well. This is the main reason that scheduler classes were implemented for SVR4, and the main reason the "fixed" scheduler class was implemented in SVR4.0.2/SVR4.2, as of UnixWare. The system guarantees time to the programs running under this scheduling class, so whatever other software pages out, the program then has the CPU time to page back in. This was UnixWare's response to "move mouse, wiggle cursor", when the X server performance went in the toilet.

The reason the UnixWare X server performance was in the toilet is that the UnixWare "ld" program mmap's the .o files, and then seeks around in them to resolve symbols, rather than reading the parts in and building in-core data structures. The result is predictable: the linker, when linking any sufficiently large set of object files, thrashes all other pages out of core as it moves around accessing pages in these files.

The obvious answer to this problem -- and your problem -- is to implement a working set size quota on a per-vnode basis, as I previously suggested. By doing this, you cause the new pages in the large objects to replace the old pages in the large objects, and your other data does not get thrashed out of core.

This works great on UnixWare, but it requires modification of the VM system in order to provide counting for the number of buffers that are hung off a given file object, with the result that the code is not binary compatible. (This is why the modification never made it into UnixWare, even though it was tested and found to solve the "ld/X server problem" without leading to more thrashing, without needing to introduce a new scheduling class, and without making less CPU time available to other applications on the system just so the X server has sufficient time to thrash its pages back in.)

This approach will also work for your problem, which is that your several 10M files thrash everything else out of the cache, including the executable pages.
Note that this approach need not have any effect under normal conditions: it triggers only when available memory hits some low watermark, so in the normal, non-triggered case we are talking about a single "if" test in the page replacement path to implement it.

> After further thought, I propose something much simpler: when the kernel is
> hinted that access will be sequential, it should stop caching when there is
> little cache space available, instead of throwing away old blocks, or be
> much more hesitant to throw away old blocks. Consider that in almost all
> cases where access is sequential, as reading continues, the chances of the
> read being aborted increase: ie, users downloading files, directory tree
> traversal, etc. Since the likelihood of the first byte being read is very
> high, and the next one less high, and the next less yet, etc, it seems to
> make sense to tune the caching algorithm to accommodate this.

This is much harder to implement. Specifically, the sequential nature of the access is heuristically detected, and it is this heuristic, not the madvise, which is at issue. If this heuristic did *not* get triggered, then you would lose your read-ahead. Therefore it's not something that can be easily turned off.

Second, the VM and buffer cache are unified in FreeBSD. This means that you can not "reserve" buffer cache entries that are then not cached in VM objects, in order to cause the entries to turn over. Even if you were able to do this, through some serious kludge, you would not be able to differentiate the things that needed to be thrown out to make room for the transient pages, which leaves you in the same boat you were in before.

> While discussing disks, I have a minor complaint: at least on IDE systems,
> when doing something like an untar, the entire system is painfully
> unresponsive, even though CPU load is low. I presume this is because when an
> executable is run, it needs to sit and wait for the disk.
> Wouldn't it make sense to give very high disk priority to executables?
> Isn't that worth the extra seeks?

Actually, the reason for this is that the data portion of tagged writes can not disconnect from the drive in ATA, and therefore the tagged command queueing that helps you on reads does not help you for writes. The ATA protocol is broken. If you switched to a SCSI disk, you would see this problem go away. This was recently discussed in detail with a MAXTOR drive engineer in a thread on the FreeBSD-FS mailing list.

In the limit, though, all disk requests come about as a result of executables making demands.

Another serious issue that can cause this depends on it being a particular case, specifically the untarring of the "ports" tree. If this is what you are talking about, the problem has to do with breadth-first vs. depth-first storage, and therefore caching, on packing vs. unpacking of the tar archive. One approach to making this less of a problem is to reorder the archive index (and its contents, to match, if you are using serial media, like tape, to access the archive). This has been discussed before in great detail; only the people who care about it taking a long time have (so far) been unwilling to write the "archive optimizer". 8-).

-- Terry