Re: Random disk cache expiry

2003-02-03 Thread Fergal Daly


On Friday 31 January 2003 03:47, David Schultz wrote:
> You have found an optimal replacement algorithm for the case of
> repeated sequential reads.  In fact, if you know in advance what
> the access pattern is going to be, it is *always* possible to find
> an optimal replacement algorithm.  Specifically, you always
> replace the block in the cache that will not be used for the
> longest time in the future.
>
> If you want the system to always cache the right thing in the
> general case, the only hint you would need to provide the system
> is a reference string specifying the predicted access pattern.
> (If I were to do this, my first reaction would be to encode it as
> an FSA.)  If that proves to be infeasible, I'm sure there are ways
> to approximate the same thing.  The hard parts, I think, would be
> teaching the VM system to use the new information, and gathering
> statistics from which you form your hints.

I wouldn't even begin to try to deduce the access pattern from statistics;
that sounds hard. In many cases, the application knows in advance what the
access pattern will be. Someone else in the thread has suggested attaching
hints to the file as one option, and I think the app itself should be able
to signal its intention to repeatedly read the file.

I think trying to encode generalised access patterns is maybe not such a win.
You could end up using a lot of memory storing the access pattern, and a lot
of effort encoding things as FSAs. Also, the FSA has to pick not just the
block that won't be used for the longest time but a block that is actually in
memory, and for a more general access pattern that is not necessarily easy.

Reading a file straight through several times is relatively common and can be
encoded very easily without much memory; assuming the file is not also being
accessed non-sequentially by another process, picking the correct block is
easy.

Anyway, it's all academic for me really as I don't even run BSD and although I 
can write it when I have to, I don't find C anywhere near as much fun as 
Perl. I just thought it was an interesting question. I should probably go and 
get my coat now ;-)

F







Re: Random disk cache expiry

2003-01-31 Thread Bruce R. Montague

For those thinking of playing with predictive caching
(likely an area of considerable student endeavour/interest
these days at both the filesystem and "web" level):

---
Matthew Dillon:
 > So there is no 'perfect' caching algorithm.  There
 > are simply too many variables even in a well defined
 > environment for even the best system heuristics to
 > cover optimally.
---
David Schultz:
 > If that proves to be infeasible, I'm sure there are
 > ways to approximate the same thing.  The hard parts,
 > I think, would be teaching the VM system to use the
 > new information, and gathering statistics from which
 > you form your hints.

---

Right. It's easy if you know the complete future of
the total system state, which of course you never
will.  Someone interested in this might try to apply
the latest in machine learning techniques, classifiers,
etc., to the online problem. Variants of this are
receiving lots of attention in areas such as gene
sequence prediction. I dunno, but it seems like a lot
of the math ends up pretty similar to economics, and
we all know how well those models work. Kind of funny,
running an economic simulation in your kernel... but
it is actually getting possible at some level, at least
for research systems on modern machines.  There was a
time when you would be fired for putting floating-point
in an OS.



 http://csl.cse.ucsc.edu/acme.shtml :

"Many cache replacement policies have been invented
and some perform better than others under certain
workload and network-topological conditions. It is
impossible and sub-optimal to manually choose cache
replacement policies for workloads and topologies that
are under continuous change. We use machine learning
algorithms to automatically select the best current
policy or mixtures of policies from a policy (a.k.a
expert) pool to provide an "adaptive caching" service."


 - bruce




Re: Random disk cache expiry

2003-01-30 Thread Igor Shmukler
> You have found an optimal replacement algorithm for the case of
> repeated sequential reads.  In fact, if you know in advance what
> the access pattern is going to be, it is *always* possible to find
> an optimal replacement algorithm.  Specifically, you always
> replace the block in the cache that will not be used for the
> longest time in the future.

Has everyone read the UBM paper from OSDI? It presents one possible solution for dealing 
with sequentially accessed files. Why is it not enough (at least to begin with)?




Re: Random disk cache expiry

2003-01-30 Thread David Schultz
Thus spake Fergal Daly <[EMAIL PROTECTED]>:
> [EMAIL PROTECTED] (Tim Kientzle) wrote in message 
> news:...
> > Personally, I think there's a lot of merit to _trying_
> 
> There's even more merit in only pretending to try... 

Welcome to my quotes file.

> As you can see, the locking cache is always better than the random one and the 
> file doesn't have to be very big for it to be hugely better.

You have found an optimal replacement algorithm for the case of
repeated sequential reads.  In fact, if you know in advance what
the access pattern is going to be, it is *always* possible to find
an optimal replacement algorithm.  Specifically, you always
replace the block in the cache that will not be used for the
longest time in the future.
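
As a user-land illustration (a sketch only; a real VM never has this
oracle), the MIN policy is just a few lines of C:

/*
 * Sketch of Belady's optimal (MIN) replacement: given the whole
 * future reference string, evict the resident block whose next
 * use lies furthest in the future.  Purely illustrative.
 */
#include <limits.h>
#include <stdio.h>

#define NSLOTS 3

/* Distance from position pos to the next reference of `block`;
 * INT_MAX if it is never referenced again. */
static int next_use(const int *refs, int nrefs, int pos, int block)
{
    for (int i = pos; i < nrefs; i++)
        if (refs[i] == block)
            return i - pos;
    return INT_MAX;
}

/* Victim slot: holds the block not needed for the longest time. */
static int choose_victim(const int *slots, const int *refs,
                         int nrefs, int pos)
{
    int victim = 0, worst = -1;

    for (int s = 0; s < NSLOTS; s++) {
        int d = next_use(refs, nrefs, pos, slots[s]);
        if (d > worst) {
            worst = d;
            victim = s;
        }
    }
    return victim;
}

int main(void)
{
    int refs[] = { 0, 1, 2, 3, 0, 1, 2, 3 };    /* cyclic reads */
    int nrefs = 8, slots[NSLOTS], used = 0, hits = 0;

    for (int pos = 0; pos < nrefs; pos++) {
        int b = refs[pos], found = 0;
        for (int s = 0; s < used; s++)
            if (slots[s] == b) { found = 1; break; }
        if (found) { hits++; continue; }
        if (used < NSLOTS)
            slots[used++] = b;
        else
            slots[choose_victim(slots, refs, nrefs, pos + 1)] = b;
    }
    printf("MIN: %d hits / %d refs\n", hits, nrefs);
    return 0;
}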

If you want the system to always cache the right thing in the
general case, the only hint you would need to provide the system
is a reference string specifying the predicted access pattern.
(If I were to do this, my first reaction would be to encode it as
an FSA.)  If that proves to be infeasible, I'm sure there are ways
to approximate the same thing.  The hard parts, I think, would be
teaching the VM system to use the new information, and gathering
statistics from which you form your hints.




Re: Random disk cache expiry

2003-01-30 Thread Matthew Dillon

:I think you missed Matt's point, which is well-taken:
:
:Even if everybody accesses it sequentially, if you have 100 processes 
:accessing it sequentially at the *same* time, then it would be to your 
:benefit to leave the "old" pages around because even though *this* 
:process won't access it again, the *next* process very well might, if 
:it just happens to be reading it sequentially as well but is a little 
:further behind on its sequential read.

Right, and if the same 100 processes are accessing N files sequentially
instead of just one, you get a different effect... you might blow out
your cache if the aggregate size of the N files is too large.  But then
if some of those processes are accessing the same file, and other processes
were accessing different files, you might want to cache that file
but possibly not cache others, even though all the files are (for this
example) the same size.  But then what if some of the processes accessing
some of those other files were from slow clients?  You could get away
with not caching those files and then you might possibly be able cache
all the remaining files (being accessed by faster clients). 

And so on, and so forth.  It gets even more complicated when you throw
in read versus write versus read-modify-write accesses, and even more
complicated when you add other load factors (e.g. the sheer number of
connections might reduce the memory available for caching on the fly,
or trying to balance executable pages versus data pages to optimize
paging and program performance).

So there is no 'perfect' caching algorithm.  There are simply too many
variables even in a well defined environment for even the best system
heuristics to cover optimally.

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>




Re: Random disk cache expiry

2003-01-30 Thread Bruce R. Montague

 Hi, re:

 > If a file's access history were stored as a "hint"
 > associated with the file, then it would
 > be possible to make better up-front decisions about
 > how to allocate cache space.

 I believe at one time this was a hot area, and now
maybe it is back. I vaguely recall a recent PhD in
this area, you might want to contact him and get
some up to date pointers (I think Tom Kroeger spent
some time at CMU looking at a lot of file access
traces in this regard). I believe Tom modified 
a Linux filesystem to do just this and did a fair
amount of benchmarking (which, of course, proved
it (or at least his PhD ;) was useful)! 

 In any case you might get some interesting
starting refs if you are interested in this
area...

 http://www.cse.ucsc.edu/~tmk/
 
 http://csl.cse.ucsc.edu/prediction.shtml

 http://csl.cse.ucsc.edu/acme.shtml


 - bruce




Re: Random disk cache expiry

2003-01-30 Thread Brian T. Schellenberger



On Thursday 30 January 2003 07:06 pm, Tim Kientzle wrote:
| Matthew Dillon wrote:
| > Your idea of 'sequential' access cache restriction only
| >
| > works if there is just one process doing the accessing.
|
| Not necessarily.  I suspect that there is
| a strong tendency to access particular files
| in particular ways.  E.g., in your example of
| a download server, those files are always
| read sequentially.  You can make similar assertions
| about a lot of files:
|
| For example, if a process
| started to read a 10GB file that has historically been
| accessed sequentially, you could immediately decide
| to enable read-ahead for performance, but also mark
| those pages to be released as soon as they were read by the
| process.

I think you missed Matt's point, which is well-taken:

Even if everybody accesses it sequentially, if you have 100 processes 
accessing it sequentially at the *same* time, then it would be to your 
benefit to leave the "old" pages around because even though *this* 
process won't access it again, the *next* process very well might, if 
it just happens to be reading it sequentially as well but is a little 
further behind on its sequential read.






Re: Random disk cache expiry

2003-01-30 Thread Matthew Dillon

:Not necessarily.  I suspect that there is
:a strong tendency to access particular files
:in particular ways.  E.g., in your example of
:a download server, those files are always
:read sequentially.  You can make similar assertions
:about a lot of files: manpages, gzip files,
:C source code files, etc, are "always" read
:sequentially.
:
:If a file's access history were stored as a "hint"
:associated with the file, then it would
:be possible to make better up-front decisions about
:how to allocate cache space.  The ideal would be to

This has been tried.  It works up to a point, but
not to the extent that you want it to.  The basic
problem is that past history does not necessarily
predict future behavior.  With the web server example,
different client loads will result in different
access behaviors.  They might still all be sequential,
but the combinations of multiple users will change
the behavior enough that you would not be able to use
the history as a reliable metric to control the cache.

There is also an issue of how to store the 'history'.
It isn't a simple matter of storing when a block was
last accessed.  Analysis of the access history is just
as important and a lot of the type of analysis we humans
do is intuitive and just cannot be replicated by a computer.

Basically it all devolves down into the case where if
you know exactly how something is going to be accessed,
or you need caching to work a certain way in order to
guarentee a certain behavior, the foreknowledge you have
of the access methodologies will allow you to cache the
information manually far better then the system could
cache it heuristically.

:store such hints on disk (maybe as an extended
:attribute?), but it might also be useful to cache
:them in memory somewhere.  That would allow the
:cache-management code to make much earlier decisions
:about how to handle a file.  For example, if a process
:started to read a 10GB file that has historically been
:accessed sequentially, you could immediately decide
:to enable read-ahead for performance, but also mark
:those pages to be released as soon as they were read by the
:process.
:
:FWIW, a web search for "randomized caching" yields
:some interesting reading.  Apparently, there are
:a few randomized cache-management algorithms for
:which the mathematics work out reasonably well,
:despite Terry's protestations to the contrary.  ;-)
:I haven't yet found any papers describing experiences
:with real implementations, though.
:
:If only I had the time to spend poring over FreeBSD's
:cache-management code to see how these ideas might
:actually be implemented... 
:
:Tim Kientzle

It should be noted that we already implement most of the
heuristics you talk about.  We have a heuristic that
detects sequential access patterns, for example, and
enables clustered read-ahead.  The problem isn't detection,
the problem is scale.  These heuristics work wonderfully
at a small scale (i.e. let's read 64K ahead versus trying to
cache 64MB worth of the file).  Just knowing something is
sequential does not allow you to choose how much memory you
should set aside to cache that object, for example.  Automatically
depressing the priority of pages read sequentially after they've been
used can have as terrible a performance impact as it can a positive
one, depending on the size of the object, the number of distinct
objects being accessed in that manner, perceived latency by end users,
number of end users, the speed of their connections (some objects may
be accessed more slowly than others depending on the client's network
bandwidth), and so forth.
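
To give a flavor of the detection side, a toy user-land sketch
(emphatically not our actual kernel code; names invented) might look like:

/*
 * Toy sketch of sequential detection: grow a per-file score while
 * reads are back-to-back, reset on a seek, and scale read-ahead
 * with the score.  Not the kernel's actual implementation.
 */
#include <stdio.h>
#include <sys/types.h>

struct seqtrack {
    off_t next_off;     /* where we expect the next read to start */
    int   seq_score;    /* consecutive sequential reads seen */
};

/* Returns how many blocks of read-ahead to issue for this read. */
static int readahead_blocks(struct seqtrack *st, off_t off, size_t len)
{
    if (off == st->next_off) {
        if (st->seq_score < 128)
            st->seq_score++;            /* still sequential */
    } else {
        st->seq_score = 0;              /* seek: assume random */
    }
    st->next_off = off + (off_t)len;
    return st->seq_score / 16;          /* 0 when random, up to 8 */
}

int main(void)
{
    struct seqtrack st = { 0, 0 };

    for (int i = 0; i < 64; i++)        /* one sequential pass */
        readahead_blocks(&st, (off_t)i * 4096, 4096);
    printf("read-ahead: %d blocks\n",
        readahead_blocks(&st, (off_t)64 * 4096, 4096));
    return 0;
}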

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>





Re: Random disk cache expiry

2003-01-30 Thread Tim Kientzle
Matthew Dillon wrote:

> Your idea of 'sequential' access cache restriction only
> works if there is just one process doing the accessing.

Not necessarily.  I suspect that there is
a strong tendency to access particular files
in particular ways.  E.g., in your example of
a download server, those files are always
read sequentially.  You can make similar assertions
about a lot of files: manpages, gzip files,
C source code files, etc, are "always" read
sequentially.

If a file's access history were stored as a "hint"
associated with the file, then it would
be possible to make better up-front decisions about
how to allocate cache space.  The ideal would be to
store such hints on disk (maybe as an extended
attribute?), but it might also be useful to cache
them in memory somewhere.  That would allow the
cache-management code to make much earlier decisions
about how to handle a file.  For example, if a process
started to read a 10GB file that has historically been
accessed sequentially, you could immediately decide
to enable read-ahead for performance, but also mark
those pages to be released as soon as they were read by the
process.

FWIW, a web search for "randomized caching" yields
some interesting reading.  Apparently, there are
a few randomized cache-management algorithms for
which the mathematics work out reasonably well,
despite Terry's protestations to the contrary.  ;-)
I haven't yet found any papers describing experiences
with real implementations, though.

If only I had the time to spend poring over FreeBSD's
cache-management code to see how these ideas might
actually be implemented... 

Tim Kientzle





Re: Random disk cache expiry

2003-01-30 Thread Brian T. Schellenberger



On Thursday 30 January 2003 05:22 pm, Matthew Dillon wrote:
| Well, here's a counterpoint.  Let's say you have an FTP
| server with 1G of ram full of, say, pirated CDs at 600MB a
| pop.
|
| Now let's say someone puts up a new Madonna CD and suddenly
| you have thousands of people from all over the world trying
| to download a single 600MB file.
|
| Let's try another one.  Let's say you have an FTP server with
| 1G of ram full of hundreds of MPEG-encoded pirated CDs at
| 50MB a pop and you have thousands of people from all over the
| world trying to download a core set of 25 CDs, which exceeds
| the available ram you have to cache all of them.
|
| What I'm trying to illustrate here is the impossibility of
| what you are asking.  Your idea of 'sequential' access cache
| restriction only works if there is just one process doing the
| accessing.  But if you have 25 processes accessing 25 different
| files sequentially it doesn't work, and how is the system supposed to
| detect the difference between 25 processes accessing 25 50MB files on
| a 1G machine (which doesn't fit in the cache) versus 300 processes
| accessing 15 50MB files on a 1G machine (which does fit). 
| Furthermore, how do you differentiate between 30 processes all
| downloading the same 600MB CD versus 30 processes downloading two
| different 600MB CDs, on a machine with 1G of cache?
|
| You can't.  That's the problem.  There is no magic number between
| 0 and the amount of memory you have where you can say "I am going
| to stop caching this sequential file" that covers even the more
| common situations that come up.  There is no algorithm that can
| detect the above situations before the fact or on the fly.  You
| can analyze the situation after the fact, but by then it is too
| late, and the situation may change from minute to minute.  One minute
| you have 300 people trying to download one CD, the next minute you
| have 20 people trying to download 10 different CDs.
|
|   -Matt


You are absolutely right, and thank you for fulfilling my request to 
pillory me for that bit:

| :Of course the trick here is waving my hands and saying "assume that
| : you know how the file will be accessed in the future."  You ought
| : to pillory me for *that* bit.  Even with hinting there are problems
| : with this whole idea.  Still with some hinting the algorithm could
| : probably be a little more clever.
| :
| :(Actually, Terry Lambert *did* pillory me for that bit, just a bit,
| : when he pointed out the impossibility of knowing whether the file
| : is being used in the same way by other processes.)

Though actually that was awful gentle for a pillory.

All that said, there are *always* pathological cases for any algorithm; 
it is my intuition that somehow giving cache retention "priority" to 
processes that randomly access files would likely be an overall win.  
Not that I'm planning to put my money where my mouth is and write 
some code to test this contention.

If you wanted to get *really* fancy, you could add some code that tracked 
hits for given blocks over time and increased priority for blocks that 
were brought in often "recently" -- sort of an extended LRU list that 
wouldn't map a whole block in, but would tell you what you'd tossed out 
lately, so you could give priority to blocks that had recently fallen out 
of this "list cache" (or whatever we'd call the secondary list of 
recently-used blocks).

Um, in re-reading that paragraph it seems clear as mud . . .
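
Maybe a scrap of toy C says it better (all names invented; this is just
the shape of the idea, nothing like real kernel code):

/*
 * Sketch of the "secondary list": on eviction, remember only the
 * block's identity (no data) in a small ring.  A later miss that
 * finds its block here was evicted too early, so it comes back in
 * with boosted priority.  Illustration only.
 */
#include <stdio.h>

#define GHOSTS 256

static unsigned long ghost_ring[GHOSTS];    /* identities only */
static int ghost_head;

static void ghost_remember(unsigned long blkno)
{
    ghost_ring[ghost_head] = blkno;         /* overwrite the oldest */
    ghost_head = (ghost_head + 1) % GHOSTS;
}

/* On a miss: did we throw this block out recently?  If so, the
 * caller should give it higher retention priority this time. */
static int ghost_hit(unsigned long blkno)
{
    for (int i = 0; i < GHOSTS; i++)
        if (ghost_ring[i] == blkno)
            return 1;
    return 0;
}

int main(void)
{
    for (int i = 0; i < GHOSTS; i++)
        ghost_ring[i] = (unsigned long)-1;  /* mark empty */
    ghost_remember(42);
    printf("block 42 recently evicted: %d\n", ghost_hit(42));
    return 0;
}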

Whether any of these schemes would be of any practical benefit in any 
common situation is highly dubious; they would all increase overhead, 
for one thing.  I think that the definitive answer, really, is the one 
you gave earlier:  If you have a specialized application or server 
which you know will make intense use of the file system in a 
predictable way, then you should customize the way it accesses the 
files, because no general-purpose algorithm can ever be optimal for all 
cases.






Re: Random disk cache expiry

2003-01-30 Thread Matthew Dillon
Well, here's a counterpoint.  Let's say you have an FTP
server with 1G of ram full of, say, pirated CDs at 600MB a
pop.

Now let's say someone puts up a new Madonna CD and suddenly
you have thousands of people from all over the world trying
to download a single 600MB file.

Let's try another one.  Let's say you have an FTP server with
1G of ram full of hundreds of MPEG-encoded pirated CDs at
50MB a pop and you have thousands of people from all over the
world trying to download a core set of 25 CDs, which exceeds
the available ram you have to cache all of them.

What I'm trying to illustrate here is the impossibility of
what you are asking.  Your idea of 'sequential' access cache
restriction only works if there is just one process doing the
accessing.  But if you have 25 processes accessing 25 different files 
sequentially it doesn't work, and how is the system supposed
to detect the difference between 25 processes accessing 25
50MB files on a 1G machine (which doesn't fit in the cache)
versus 300 processes accessing 15 50MB files on a 1G machine
(which does fit).  Furthermore, how do you differentiate
between 30 processes all downloading the same 600MB CD versus
30 processes downloading two different 600MB CDs, on a machine 
with 1G of cache?

You can't.  That's the problem.  There is no magic number between
0 and the amount of memory you have where you can say "I am going
to stop caching this sequential file" that covers even the more
common situations that come up.  There is no algorithm that can
detect the above situations before the fact or on the fly.  You
can analyze the situation after the fact, but by then it is too late,
and the situation may change from minute to minute.  One minute you
have 300 people trying to download one CD, the next minute you have
20 people trying to download 10 different CDs.

-Matt

:The suggestion here basically boils down to this: if the system could 
:act on hints that somebody will be doing sequential access, then it 
:should be more timid about caching for that file access.  That is to 
:say, it should allow that file to "use up" a smaller number of blocks 
:from the cache (yes, the VM) at a time, and it should favor, if 
:anything, a LIFO scheme instead of the usual FIFO (LRU) scheme.  (That 
:is to say, for the special case of *sequential* access, LRU == FIFO, 
:and yet LIFO is probably more optimal for this case, at least if the 
:file will be re-read later.)
:
:Caching will do more good on files that will be randomly accessed; 
:an intermediate amount of good on files sequentially accessed but 
:rewound and/or accessed over and over, and if the file system could 
:somehow know (or be hinted) that the file is being sequentially 
:accessed and is unlikely to be accessed again for a good long while it 
:would clearly be better off not caching it at all.
:
:Of course the trick here is waving my hands and saying "assume that you 
:know how the file will be accessed in the future."  You ought to 
:pillory me for *that* bit.  Even with hinting there are problems with 
:this whole idea.  Still with some hinting the algorithm could probably 
:be a little more clever.
:
:(Actually, Terry Lambert *did* pillory me for that bit, just a bit, when 
:he pointed out the impossibility of knowing whether the file is being 
:used in the same way by other processes.)
:
:And . . . also to Terry, yes, I know that my proposal above 
:over-simplifies, but the point is that for sequential access you want 
:to go "gentle" on making the cache of other processes' and earlier reads 
:go away.




Re: Random disk cache expiry

2003-01-27 Thread Brian T. Schellenberger

The suggestion here basically boils down to this: if the system could 
act on hints that somebody will be doing sequential access, then it 
should be more timid about caching for that file access.  That is to 
say, it should allow that file to "use up" a smaller number of blocks 
from the cache (yes, the VM) at a time, and it should favor, if 
anything, a LIFO scheme instead of the usual FIFO (LRU) scheme.  (That 
is to say, for the special case of *sequential* access, LRU == FIFO, 
and yet LIFO is probably more optimal for this case, at least if the 
file will be re-read later.)
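
To make the eviction choice concrete, here's a little C fragment
(invented names, untested) contrasting the two picks for the sequential
re-read case:

/*
 * For strictly sequential re-reads: FIFO/LRU evicts the longest
 * resident block, which is exactly the block the next pass will
 * want first; LIFO sacrifices the newest arrival and preserves
 * the file's cached prefix across passes.  Sketch only.
 */

/* age[i]: how many references ago slot i was filled. */
static int evict_fifo(const int *age, int nslots)
{
    int oldest = 0;
    for (int i = 1; i < nslots; i++)
        if (age[i] > age[oldest])
            oldest = i;     /* longest resident: next pass needs it! */
    return oldest;
}

static int evict_lifo(const int *age, int nslots)
{
    int newest = 0;
    for (int i = 1; i < nslots; i++)
        if (age[i] < age[newest])
            newest = i;     /* newest arrival: give it up instead */
    return newest;
}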

Caching will do more good on files that will be randomly accessed; 
an intermediate amount of good on files sequentially accessed but 
rewound and/or accessed over and over, and if the file system could 
somehow know (or be hinted) that the file is being sequentially 
accessed and is unlikely to be accessed again for a good long while it 
would clearly be better off not caching it at all.

Of course the trick here is waving my hands and saying "assume that you 
know how the file will be accessed in the future."  You ought to 
pillory me for *that* bit.  Even with hinting there are problems with 
this whole idea.  Still with some hinting the algorithm could probably 
be a little more clever.

(Actually, Terry Lambert *did* pillory me for that bit, just a bit, when 
he pointed out the impossibility of knowing whether the file is being 
used in the same way by other processes.)

And . . . also to Terry, yes, I know that my proposal above 
over-simplifies, but the point is that for sequential access you want 
to go "gentle" on making the cache of other processes' and earlier reads 
go away.




On Monday 27 January 2003 12:44 am, Tim Kientzle wrote:
| Brian T. Schellenberger wrote:
| > This to me is eminently sensible.
| > In fact there seem like two rules that have come up in this
| > discussion:
| >
| > 1. For sequential access, you should be very hesitant to throw away
| > *another* process's blocks, at least once you have used more than,
| > say, 25% of the cache or potential cache.
| >
| > 2. For sequential access, you should stop caching before you throw
| > away your own blocks.  If it's sequential it is, it seems to me,
| > always a lose to throw away your *own* process's older blocks on
| > the same file.
|
| The question isn't "should I throw away blocks or not?".
| Obviously, the ideal is to keep everything in the cache, but
| that's not possible.
|
| The question is "what blocks should be discarded?"  You've
| ruled out quite a few possibilities here.  What blocks _should_
| be discarded?
|
| Tim Kientzle




Re: Random disk cache expiry

2003-01-26 Thread Matthew Dillon
Hmm.  Basically what it comes down to is that without foreknowledge
of the data locations being accessed, it is not possible for any 
cache algorithm to adapt to all the myriad ways data might be accessed.
If you focus the cache on one methodology it will probably perform
terribly when presented with some other methodology.

What this means is that for the cases where you *KNOW* how a program
intends to access a dataset larger than main memory, it is better
to have the program explicitly cache/not-cache the data under program
control rather than trying to force the system cache to adapt.

I'll also be nice and decode some of Terry's jargon for the rest of
the readers.

:will result in significant failure of random page replacement to
:result in cache hits; likewise, going to 85% overage will practically
:guarantee an almost 100% failure rate, as cyclical access with random
:replacement is statistically more likely, in aggregate, to replace
:the pages which are there longer (the probability is iterative and
:additive: it's effectively a permutation).

What Terry is saying is that if you have a dataset that is 2x
the size of your cache, the cache hit rate on that data with random
page replacement is NOT going to be 50%.  This is because with random
page replacement the likelihood of a piece of data being found in
the cache depends on how long the data has been sitting in the cache.
The longer the data has been sitting in the cache, the less likely you
will find it when you need it (because it is more likely to have been
replaced by the random replacement algorithm over time).
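
To put rough numbers on it: with C cache blocks and uniform random
eviction, a cached block survives k later misses with probability
(1 - 1/C)^k.  A quick user-land simulation (a sketch, nothing more)
shows how far below 50% the hit rate lands for a data set twice the
cache size:

/*
 * Simulation: cyclic reads over a data set twice the cache size,
 * with a uniformly random victim on every miss.  The hit rate
 * lands well below the naive 50%, because a block's survival odds
 * decay as (1 - 1/CACHE)^k over k misses.
 */
#include <stdio.h>
#include <stdlib.h>

#define CACHE  512              /* cache size in blocks */
#define DATA   1024             /* data set: 2x the cache */
#define PASSES 50

int main(void)
{
    int slot_of[DATA];          /* cache slot per block, -1 if absent */
    int block_in[CACHE];        /* block per slot, -1 if empty */
    long hits = 0, refs = 0;

    srandom(1);
    for (int b = 0; b < DATA; b++)
        slot_of[b] = -1;
    for (int s = 0; s < CACHE; s++)
        block_in[s] = -1;

    for (int p = 0; p < PASSES; p++) {
        for (int b = 0; b < DATA; b++, refs++) {
            if (slot_of[b] >= 0) {          /* cache hit */
                hits++;
                continue;
            }
            int s = random() % CACHE;       /* random victim slot */
            if (block_in[s] >= 0)
                slot_of[block_in[s]] = -1;  /* old owner falls out */
            block_in[s] = b;
            slot_of[b] = s;
        }
    }
    /* Prints roughly 20%, not the naive 50%. */
    printf("hit rate: %.1f%%\n", 100.0 * hits / refs);
    return 0;
}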

So, again, the best caching methodology to use in the case where
you *know* how much data you will be accessing and how you will access
it is to build the caching directly into your program and not depend
on system caching at all (for datasets which are far larger than
main memory).

This is true of all systems, not just BSD.  This is one reason why
databases do their own caching (another is so databases know when an
access will require I/O for scheduling reasons, but that's a different
story).

The FreeBSD VM caching system does prevent one process from exhausting
another process's cached data due to a sequential access, but the
FreeBSD VM cache does not try to outsmart sequential data accesses to
datasets which are larger than available cache space, because it's an
insanely difficult (impossible) problem to solve properly without
foreknowledge of what data elements will be accessed when. 

This isn't to say that we can't improve on what we have now.  
I certainly think we can.  But there is no magic bullet that will
deal with every situation.

-Matt





Re: Random disk cache expiry

2003-01-26 Thread David Schultz
Thus spake Tim Kientzle <[EMAIL PROTECTED]>:
> Sean Hamilton proposes:
> 
> >Wouldn't it seem logical to have [randomized disk cache expiration] in
> >place at all times?
> 
> Terry Lambert responds:
> 
> >>:I really dislike the idea of random expiration; I don't understand
> >>:the point, unless you are trying to get better numbers on some
> >>:benchmark.
> 
> Matt Dillon concedes:
> 
> >>... it's only useful when you are cycling through a [large] data set ...
> 
> Cycling through large data sets is not really that uncommon.
> I do something like the following pretty regularly:
> find /usr/src -type f | xargs grep function_name
> 
> Even scanning through a large dataset once can really hurt
> competing applications on the same machine by flushing
> their data from the cache for no gain.  I think this
> is where randomized expiration might really win, by reducing the
> penalty for disk-cache-friendly applications who are competing
> with disk-cache-unfriendly applications.
> 
> There's an extensive literature on randomized algorithms.
> Although I'm certainly no expert, I understand that such
> algorithms work very well in exactly this sort of application,
> since they "usually" avoid worst-case behavior under a broad
> variety of inputs.  The current cache is, in essence,
> tuned specifically to work badly on a system where applications
> are scanning through large amounts of data.  No matter what
> deterministic caching algorithm you use, you're choosing
> to behave badly under some situation.

Yes, if you randomly vary the behavior of the algorithm, you can
guarantee that on average, performance will never be too bad for
any particular input, but it will never be very good, on average,
either.

You can't mathematically prove everything about the memory access
patterns of real-world programs, but LRU seems to do pretty well
in a variety of situations.  It does, however, have its worst
cases.  A random replacement algorithm is very unlikely to do *as*
badly as LRU in LRU's worst case; its performance is consistent
and relatively poor.  Keep in mind that there's a bias in
most real-world programs that favors LRU more than randomness.

So should FreeBSD make it possible to ask for random replacement?
Probably, since it would be helpful for those times when you
*know* that LRU isn't going to do the right thing.  (In the
sequential read special case the OP mentions, random replacement
is better than LRU, but still worse than a deterministic algorithm
that just caches the prefix of the file that will fit in memory.
So in this situation we could do even better in theory.)

Should randomness be part of the default replacement algorithm, as
the OP suggests?  Probably not, since that would be pessimizing
performance in the common case for the sake of improving it in an
uncommon case.

Should the system be able to detect cases where the default
replacement algorithm is failing and dynamically modify its
behavior?  I think that would be really cool, if it is possible...




Re: Random disk cache expiry

2003-01-26 Thread Terry Lambert
"Brian T. Schellenberger" wrote:
> 2. For sequential access, you should stop caching before you throw away
> your own blocks.  If it's sequential it is, it seems to me, always a
> lose to throw away your *own* process's older blocks on the same
> file.

You can not have a block in a vm object which is not a cached
copy of a data block in the backing store object.

You can not access data blocks from the disk, except in the
context of a backing store object.

FreeBSD has a unified VM and buffer cache.

FreeBSD does not copy data off the disk into a buffer, and
then decide whether to copy the buffer/reference it from a
cache: if it is in a buffer, it's in the cache, period.

You are talking about avoiding a copy that doesn't happen.

The real issue is whether, after notification of completion
of the transfer (via wakeup of the blocked "read" system call),
the VM buffer in question is relinked to the other end of the
LRU list, artificially "aging" it.

The answer to this is that you can't do it, because you do
not know that the file system access itself is sequential for
all people who have the file open at the time you make the
request, and you can't know to which buffers you are actually
referring, because this is handled via demand paging, "under
the covers", and therefore not visible to the "read" call at
that level.


> These algorithmic changes seem to me to be more likely to be optimal
> most of the time.

No.  All openers get different struct file's, but they get the
same vp, which means they get the same vm_object_t.


> A random approach *does* reduce the penalty for worst-case scenarios but
> at the cost of reducing the benefit of both "normal" and "best-case"
> scenarios, it seems to me, even more.

I believe the "random page replacement" algorithm that was being
referred to here is actually only supposed to be triggered if
the file access has been advised to be sequential; even so, there
is no win in doing this, as you say (see other posting).

-- Terry




Re: Random disk cache expiry

2003-01-26 Thread Terry Lambert
Sean Hamilton wrote:
> In my case I have a webserver serving up a few dozen files of about 10 MB
> each. While yes it is true that I could purchase more memory, and I could
> purchase more drives and stripe them, I am more interested in the fact that
> this server is constantly grinding away because it has found a weakness in
> the caching algorithm.

This is a problem in Linux, and in SVR4, as well.  This is the
main reason that scheduler classes were implemented for SVR4,
and the main reason the "fixed" scheduler class was implemented
in SVR4.0.2/SVR4.2, as of UnixWare.  The system guarantees time
to the programs running under this scheduling class, and what
other software pages out, the program then has the CPU time to
page back in.  This was UnixWare's response to "move mouse,
wiggle cursor", when the X server performance went in the toilet.

The reason the UnixWare X server performance was in the toilet is
that the UnixWare "ld" program mmap's the .o files, and then seeks
around in them to resolve symbols, rather than reading the parts
in and building in-core data structures.

The result is predictable: the linker, when linking any sufficiently
large set of object files, thrashes all other pages out of core, as
it moves around accessing pages in these files.

The obvious answer to this problem -- and your problem -- is to
implement a working set size quota on a per-vnode basis, as I
previously suggested.  By doing this, you cause the pages in the
large objects to replace the old pages in the large objects, and
your other data does not get thrashed out of core.

This works great on UnixWare, but requires modification of the VM
system in order to provide counting for the number of buffers that
are hung off a given file object, with the result being that the
code is not binary compatible (this is why the modification never
made it into UnixWare, even though it was tested, and found to solve
the "ld/X server problem" without leading to more thrashing, and to
do it without needing to introduce a new scheduling class and make
less CPU time available to other applications on the system to let
the X server have sufficient time to thrash its pages back in).

This approach will also work for your problem, which is that your
several 10M files thrash everything else out of the cache, including
the executable pages.

Note that this approach need not have any effect under normal
conditions, until available memory hits some low watermark,
causing it to trigger, and therefore we are talking a single
"if" test in the page replacement path, in the normal non-trigger
case, to implement it.


> After further thought, I propose something much simpler: when the kernel is
> hinted that access will be sequential, it should stop caching when there is
> little cache space available, instead of throwing away old blocks, or be
> much more hesitant to throw away old blocks. Consider that in almost all
> cases where access is sequential, as reading continues, the chances of the
> read being aborted increase: ie, users downloading files, directory tree
> traversal, etc. Since the likelihood of reading the first byte is very
> high, and the next one less high, and the next less yet, etc,
> it seems to make sense to tune the caching algorithm to accommodate this.

This is much harder to implement.  Specifically, the sequential
nature is heuristically detected, and it is this heuristic, not
the madvise, which is at issue.  If this heuristic did *not* get
triggered, then you would lose your read-ahead.  Therefore it's
not something that can be easily turned off.

Second, the VM and buffer cache are unified in FreeBSD.  This
means that you can not "reserve" buffer cache entries that are
then not cached in VM objects, in order to cause the entries to
turn over.  Even if you were able to do this, through some serious
kludge, you would not be able to differentiate the things that
needed to be thrown out to make room for the transient pages,
which leaves you in the same boat you were in before.


> While discussing disks, I have a minor complaint: at least on IDE systems,
> when doing something like an untar, the entire system is painfully
> unresponsive, even though CPU load is low. I presume this is because when an
> executable is run, it needs to sit and wait for the disk. Wouldn't it make
> sense to give very high disk priority to executables? Isn't that worth the
> extra seeks?

Actually, the reason for this is that the data portion of tagged
writes can not disconnect from the drive in ATA, and therefore the
tagged command queueing on reads does not help you for writes.  The
ATA protocol is broken.  If you switched to a SCSI disk, you would
see this problem go away.  This was recently discussed in detail in
the context of a discussion with a MAXTOR drive engineer in a thread
on the FreeBSD-FS mailing list.

In the limit, though, all disk requests come about as a result of
executables making demands.

Another serious issue 

Re: Random disk cache expiry

2003-01-26 Thread Terry Lambert
Tim Kientzle wrote:
> Cycling through large data sets is not really that uncommon.
> I do something like the following pretty regularly:
> find /usr/src -type f | xargs grep function_name
> 
> Even scanning through a large dataset once can really hurt
> competing applications on the same machine by flushing
> their data from the cache for no gain.  I think this
> is where randomized expiration might really win, by reducing the
> penalty for disk-cache-friendly applications who are competing
> with disk-cache-unfriendly applications.

Let's set aside for a minute that this discussion does not
-- can not -- apply to directory metadata, and that the limiting
factor in your example is going to be the divorce of inode data
from vnodes in the UFS/UFS2/EXT2 ihash code, or, if there is a
low maximum vnode count, the vnodes themselves, and in any
divorce case or vnode LRU, the association between the cached
data and the FS object will be lost -- effectively losing the
cached data, no matter what, since it can't be recovered, even
if it's in memory, because nothing references it or remembers
which pages it's in.

Let's say, instead, that we are only talking about the data
being grep'ed.

There are a million ways to "win" against such applications;
one obvious one is to establish a "cache time to live" for any
cached data, and not evict if it has not passed its time to
live (effectively establishing two zones, one for cached data,
and one for transient data which would be cached, if you had
the RAM for it).


Doing this randomized drop approach is bogus.  What you are
effectively hoping for is that the pages will not be evicted
before they are hit again, because your shotgun pellets will
not shoot them down.

I can prove mathematically (or you can read it out of Knuth's
Seminumerical Algorithms: Sorting and Searching) that even with
a "perfect" random number generator, as soon as the cache size
is 115% or less of the data set size, you are almost certainly
guaranteed that the pages will be evicted before they are
rereferenced, given sequential access.


> There's an extensive literature on randomized algorithms.
> Although I'm certainly no expert, I understand that such
> algorithms work very well in exactly this sort of application,
> since they "usually" avoid worst-case behavior under a broad
> variety of inputs.  The current cache is, in essence,
> tuned specifically to work badly on a system where applications
> are scanning through large amounts of data.  No matter what
> deterministic caching algorithm you use, you're choosing
> to behave badly under some situation.

As I stated before, this is frequently used to "cheat" benchmarks
that are intended to evict cache content to defeat performance
improvements from caching (benchmarks intended to "cheat" caching).

The "magic" part here, though is that the working set size ends
up being only fractionally larger than the cache size.  For
example, the "Polygraph" benchmark intentionally "busts" the cache
on intermediate servers, to make a political statement:

http://www.web-polygraph.org/docs/workloads/

Network Appliance and other vendors have a random page replacement
policy that scores them much better numbers on this test, simply
by doing random replacement instead of LRU or some other weight-
based replacement which the test software is capable of detecting
during the "run up" phase of the test, where the software attempts
to determine the "peak rate" following cache overflow.

While this gives "good numbers" for the benchmark, in a real workload,
this is unlikely to be representative of real performance.  Likewise,
since their intent is to find the "knee" in the performance curve, and
populate the cache beyond that, then do testing there (thus "proving"
that all connections "should be" end-to-end, rather than permitting
the existence of proxies), this is unlikely to be representative of
a real-world cache overflow condition, since in the real world, the
overflow is unlikely to be limited to "101% of cache capacity", but
will instead not stop there; going above 15% of planned capacity
will result in significant failure of random page replacement to
result in cache hits; likewise, going to 85% overage will practically
guarantee an almost 100% failure rate, as cyclical access with random
replacement is statistically more likely, in aggregate, to replace
the pages which are there longer (the probability is iterative and
additive: it's effectively a permutation).


> Personally, I think there's a lot of merit to _trying_

I think that if this is implemented by someone who doesn't want to
believe Donald Knuth (8-)), they should at least be willing to
test with a data set of 3 times the available cache size, so
that they can see that the algorithm does not result in any
performance improvement.


> randomized disk cache expiry and seeing how it works in practice.

It should work very poorly, according to the math.

Another approach to "get what you want" wo

Re: Random disk cache expiry

2003-01-26 Thread Tim Kientzle
Brian T. Schellenberger wrote:

> This to me is eminently sensible.
> In fact there seem like two rules that have come up in this discussion:
>
> 1. For sequential access, you should be very hesitant to throw away
> *another* process's blocks, at least once you have used more than, say,
> 25% of the cache or potential cache.
>
> 2. For sequential access, you should stop caching before you throw away
> your own blocks.  If it's sequential it is, it seems to me, always a
> lose to throw away your *own* process's older blocks on the same
> file.

The question isn't "should I throw away blocks or not?".
Obviously, the ideal is to keep everything in the cache, but
that's not possible.

The question is "what blocks should be discarded?"  You've
ruled out quite a few possibilities here.  What blocks _should_
be discarded?

Tim Kientzle






Re: Random disk cache expiry

2003-01-26 Thread Brian T. Schellenberger



On Sunday 26 January 2003 11:55 pm, Sean Hamilton wrote:
| - Original Message -
| From: "Tim Kientzle" <[EMAIL PROTECTED]>
|
| | Cycling through large data sets is not really that uncommon.
| | I do something like the following pretty regularly:
| | find /usr/src -type f | xargs grep function_name
| |
| | Even scanning through a large dataset once can really hurt
| | competing applications on the same machine by flushing
| | their data from the cache for no gain.  I think this
| | is where randomized expiration might really win, by reducing the
| | penalty for disk-cache-friendly applications who are competing
| | with disk-cache-unfriendly applications.
|
|
| After further thought, I propose something much simpler: when the
| kernel is hinted that access will be sequential, it should stop
| caching when there is little cache space available, instead of
| throwing away old blocks, or be much more hesitant to throw away old
| blocks. 

This to me is eminently sensible.
In fact there seem like two rules that have come up in this discussion:

1. For sequential access, you should be very hesitant to throw away 
*another* process's blocks, at least once you have used more than, say, 
25% of the cache or potential cache.

2. For sequential access, you should stop caching before you throw away 
your own blocks.  If it's sequential it is, it seems to me, always a 
lose to throw away your *own* process's older blocks on the same 
file.

These algorithmic changes seem to me to be more likely to be optimal 
most of the time.

A random approach *does* reduce the penalty for worst-case scenarios but 
at the cost of reducing the benefit of both "normal" and "best-case" 
scenarios, it seems to me, even more.


| Consider that in almost all cases where access is sequential,
| as reading continues, the chances of the read being aborted increase:
| ie, users downloading files, directory tree traversal, etc. Since the
| likelihood of reading the first byte is very high, and
| the next one less high, and the next less yet, etc, it seems to make
| sense to tune the caching algorithm to accommodate this.
|
| While discussing disks, I have a minor complaint: at least on IDE
| systems, when doing something like an untar, the entire system is
| painfully unresponsive, even though CPU load is low. I presume this
| is because when an executable is run, it needs to sit and wait for
| the disk. Wouldn't it make sense to give very high disk priority to
| executables? Isn't that worth the extra seeks?
|
| sh




Re: Random disk cache expiry

2003-01-26 Thread Ashutosh S. Rajekar
On Sun, 26 Jan 2003, Sean Hamilton wrote:
> 
> In my case I have a webserver serving up a few dozen files of about 10 MB
> each. While yes it is true that I could purchase more memory, and I could
> purchase more drives and stripe them, I am more interested in the fact that
> this server is constantly grinding away because it has found a weakness in
> the caching algorithm.
> 
> After further thought, I propose something much simpler: when the kernel is
> hinted that access will be sequential, it should stop caching when there is
> little cache space available, instead of throwing away old blocks, or be
> much more hesitant to throw away old blocks. Consider that in almost all
> cases where access is sequential, as reading continues, the chances of the
> read being aborted increase: ie, users downloading files, directory tree
> traversal, etc. Since the likelihood of reading the first byte is very
> high, and the next one less high, and the next less yet, etc,
> it seems to make sense to tune the caching algorithm to accommodate this.

Your case seems to be a highly specific one, and would require very 
specific tuning. And then one will be able to find some other "unwanted 
behaviour" once you tune your system for that particular behaviour.

-ASR
http://www.crosswinds.net/~rajekar
MERYL STREEP is my obstetrician!





Re: Random disk cache expiry

2003-01-26 Thread Sean Hamilton
- Original Message -
From: "Tim Kientzle" <[EMAIL PROTECTED]>
| Cycling through large data sets is not really that uncommon.
| I do something like the following pretty regularly:
| find /usr/src -type f | xargs grep function_name
|
| Even scanning through a large dataset once can really hurt
| competing applications on the same machine by flushing
| their data from the cache for no gain.  I think this
| is where randomized expiration might really win, by reducing the
| penalty for disk-cache-friendly applications who are competing
| with disk-cache-unfriendly applications.

In my case I have a webserver serving up a few dozen files of about 10 MB
each. While yes it is true that I could purchase more memory, and I could
purchase more drives and stripe them, I am more interested in the fact that
this server is constantly grinding away because it has found a weakness in
the caching algorithm.

After further thought, I propose something much simpler: when the kernel is
hinted that access will be sequential, it should stop caching when there is
little cache space available, instead of throwing away old blocks, or be
much more hesitant to throw away old blocks. Consider that in almost all
cases where access is sequential, as reading continues, the chances of the
read being aborted increase: ie, users downloading files, directory tree
traversal, etc. Since the likelihood of reading the first byte is very
high, and the next one less high, and the next less yet, etc,
it seems to make sense to tune the caching algorithm to accommodate this.

While discussing disks, I have a minor complaint: at least on IDE systems,
when doing something like an untar, the entire system is painfully
unresponsive, even though CPU load is low. I presume this is because when an
executable is run, it needs to sit and wait for the disk. Wouldn't it make
sense to give very high disk priority to executables? Isn't that worth the
extra seeks?

sh





Re: Random disk cache expiry

2003-01-26 Thread Tim Kientzle
Sean Hamilton proposes:

>Wouldn't it seem logical to have [randomized disk cache expiration] in
>place at all times?

Terry Lambert responds:

>>:I really dislike the idea of random expiration; I don't understand
>>:the point, unless you are trying to get better numbers on some
>>:benchmark.

Matt Dillon concedes:

>>... it's only useful when you are cycling through a [large] data set ...

Cycling through large data sets is not really that uncommon.
I do something like the following pretty regularly:
   find /usr/src -type f | xargs grep function_name

Even scanning through a large dataset once can really hurt
competing applications on the same machine by flushing
their data from the cache for no gain.  I think this
is where randomized expiration might really win, by reducing the
penalty for disk-cache-friendly applications who are competing
with disk-cache-unfriendly applications.

There's an extensive literature on randomized algorithms.
Although I'm certainly no expert, I understand that such
algorithms work very well in exactly this sort of application,
since they "usually" avoid worst-case behavior under a broad
variety of inputs.  The current cache is, in essence,
tuned specifically to work badly on a system where applications
are scanning through large amounts of data.  No matter what
deterministic caching algorithm you use, you're choosing
to behave badly under some situation.


Personally, I think there's a lot of merit to _trying_
randomized disk cache expiry and seeing how it works in practice.
(I would also observe here that 5.0 now has a fast, high-quality
source of randomness that seems ideal for exactly such
applications.)  I don't believe that it would _prevent_ applications
from using optimizations such as those that Terry suggests,
while possibly providing reasonable performance under a
broader range of scenarios than are currently supported.

Sounds like a good idea to me.

Tim Kientzle





Re: Random disk cache expiry

2003-01-26 Thread Terry Lambert
Matthew Dillon wrote:
> :I really dislike the idea of random expiration; I don't understand
> :the point, unless you are trying to get better numbers on some
> :..
> 
>Well, the basic scenario is something like this:  Let's say you have
>512MB of ram and you are reading a 1GB file sequentially, over and over
>again.  The cache scenario that gives you the greatest performance is
>basically to keep 512MB worth of the file in memory and throw away the
>rest as you read it.

You don't need random expiration for this; in fact, random
expiration will give you 50% worse performance than you expect.

If you are doing this, the proper thing is to map the file window
you want to keep separately from the rest of the file, or, if it's
small enough you can mmap the whole thing, then to madvise the part
of the file you want to not cache as "sequential", and drop the
cache.  That this is not well supported is a bug in FreeBSD (IMO).
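
Roughly, in user-land C (a sketch, assuming `keep` is page aligned and
the madvise hints behave as advertised; error handling omitted):

/*
 * Sketch of the mmap/madvise approach: keep the first `keep`
 * bytes cache-friendly, advise the remainder as sequential, and
 * drop it after each pass.  Error handling omitted.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

static void scan_passes(const char *path, size_t keep, int passes)
{
    struct stat st;
    int fd = open(path, O_RDONLY);

    fstat(fd, &st);
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    madvise(p + keep, len - keep, MADV_SEQUENTIAL);
    for (int pass = 0; pass < passes; pass++) {
        volatile char sink = 0;
        for (size_t i = 0; i < len; i++)
            sink ^= p[i];                   /* "read" the file */
        /* Tell the VM the tail's pages won't be missed. */
        madvise(p + keep, len - keep, MADV_DONTNEED);
        (void)sink;
    }
    munmap(p, len);
    close(fd);
}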


>This is effectively what random disk cache expiration gives you (assuming
>you expire whole tracks worth of data, not just random pages).  i.e. it's
>only useful when you are cycling through a data set that is larger than
>main memory over and over again.

And not very, there.  Because you destroy locality of reference -- or,
as in this case, you assume that there is no locality in the first
place.

I think it works at 50% of the efficiency of the madvise that I
suggest, above, which, at least, gets hits on most of the file,
without the risk of page replacement of the cached area of the
file that you intend to read subsequently (25% hit rate in the
hittable area, assuming this, since page replacement is random,
if the replacement range is 50% of the size of the data set...
i.e.: numbers in real life will vary downward, the larger the
data set).


>Most people deal with this issue simply by beefing up their disk
>subsystem.  After all, you don't have to stripe very many disks (only
>2 or 3) to get 100MB/sec+ in throughput from the disk subsystem.

I think most people don't have this problem, because this is a
very degenerate case, which requires either a serendipitously
incredibly bad design, or intentional screwing of caching, in order
to do something like benchmark the disk I/O subsystem separately
from the cache (for legitimate cases) or to "prove" that caching
is not useful because you have an axe to grind (e.g. the proxy
cache "benchmark" suite that does this because the author is a
mathematician who has developed a pessimizing heuristic because
he is philosophically opposed to proxy caching, due to his own
narrow interpretation of "end to end guarantees").

In general, this is not a useful case to try to optimize, and the
people who try to optimize this case are doing it to cheat on a
benchmark, not because there's any real-world application that
requires it.  8-(.

-- Terry




Re: Random disk cache expiry

2003-01-26 Thread Matthew Dillon

:I really dislike the idea of random expiration; I don't understand
:the point, unless you are trying to get better numbers on some
:..

   Well, the basic scenario is something like this:  Let's say you have
   512MB of ram and you are reading a 1GB file sequentially, over and over
   again.  The cache scenario that gives you the greatest performance is
   basically to keep 512MB worth of the file in memory and throw away the
   rest as you read it.

   This is effectively what random disk cache expiration gives you (assuming
   you expire whole tracks worth of data, not just random pages).  i.e. it's
   only useful when you are cycling through a data set that is larger than
   main memory over and over again.

   Most people deal with this issue simply by beefing up their disk 
   subsystem.  After all, you don't have to stripe very many disks (only
   2 or 3) to get 100MB/sec+ in throughput from the disk subsystem.

-Matt




Re: Random disk cache expiry

2003-01-26 Thread Terry Lambert
Matthew Dillon wrote:
> Hi Sean.  I've wanted to have a random-disk-cache-expiration feature
> for a long time.  We do not have one now.  We do have mechanisms in
> place to reduce the impact of sequential cycling a large dataset so
> it does not totally destroy unrelated cached data.

I really dislike the idea of random expiration; I don't understand
the point, unless you are trying to get better numbers on some
benchmark that intentionally determines the cache size, and then
chooses to use a dataset larger than that, to factor out caching
and/or make bogus representations about the value of caching in
the first place.  There are a number of well-known proxy cache
benchmarks that do this, because the person who wrote them wanted
to advocate against the use of proxy caches entirely.

A number of proxy cache vendors use random page replacement to
get around the benchmark tuning itself to get the worst possible
performance when using a proxy cache; with random replacement,
you maintain 25% of your working set, instead of 0% with
intentional overflow from the benchmark (or 100%, if you are not
doing bogus things to try and advocate against cache vendors).


> Due to the way our page queues work, it's not an easy problem to solve.

It's actually pretty trivial, if you are willing to do it on a
per-vnode basis, rather than a per-open instance basis.  This is
called a "working set quota" on VMS, and the code to do this has
been around there for over 20 years now.

The way you do this is to keep a count of pages associated with a
given vnode, and if you are over the quota, and want more pages,
to steal from the end of the per-vnode LRU, rather than going to
a system wide LRU, and competing with other files.
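
In rough C (a sketch of the mechanism only; the field and function
names are invented, not FreeBSD's or VMS's):

/*
 * Sketch of a per-vnode working set quota: each file object keeps
 * its own page count and LRU; once over quota, the file steals
 * from its own LRU tail instead of the system-wide LRU, so a big
 * sequential file thrashes only itself.  Invented names.
 */
#include <stddef.h>

struct wspage {
    struct wspage *lru_prev;    /* toward the head (most recent) */
    struct wspage *lru_next;    /* toward the tail (eviction end) */
};

struct wsvnode {
    struct wspage *lru_head;    /* most recently used page */
    struct wspage *lru_tail;    /* least recently used page */
    int            npages;      /* pages this file has cached */
    int            quota;       /* its working set limit */
};

/*
 * Where does the frame for a new page come from?  NULL means
 * "under quota, take one from the global pool"; otherwise the
 * caller recycles the returned victim from this same file.
 */
static struct wspage *ws_get_frame(struct wsvnode *vp)
{
    if (vp->npages < vp->quota)
        return NULL;

    struct wspage *victim = vp->lru_tail;
    vp->lru_tail = victim->lru_prev;
    if (vp->lru_tail != NULL)
        vp->lru_tail->lru_next = NULL;
    else
        vp->lru_head = NULL;
    vp->npages--;
    return victim;
}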

This assumes that you are not ignoring the value of locality of
reference, and hacking the code to make some bogus benchmark run
faster, and are instead replacing pages for performance reasons
-- real performance reasons, not fake ones.

-- Terry




Re: Random disk cache expiry

2003-01-26 Thread Matthew Dillon

:Greetings,
:
:I have a situation where I am reading large quantities of data from disk
:sequentially. The problem is that as the data is read, the oldest cached
:blocks are thrown away in favor of new ones. When I start re-reading data
:from the beginning, it has to read the entire file from disk again. Is there
:some sort of sysctl which could be changed to induce a more random expiry of
:cached disk blocks? Wouldn't it seem logical to have something like this in
:place at all times?
:
:thanks,
:
:sh

Hi Sean.  I've wanted to have a random-disk-cache-expiration feature
for a long time.  We do not have one now.  We do have mechanisms in
place to reduce the impact of sequential cycling a large dataset so
it does not totally destroy unrelated cached data.

Due to the way our page queues work, it's not an easy problem to solve.

You might be able to simulate more proactive cache control by using
O_DIRECT reads for some of the data, and normal reads for the rest of
the data (see the 'fcntl' manual page).  But it might be better simply
to purchase more memory, purchase a faster hard drive, or stripe two
hard drives together.  HDs these days can do 25-50MB/s each so striping
two together should yield 50-100MB/s of sequential throughput.  See
the 'STRIPING DISKS' section in 'man tuning'.
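
Something along these lines (a sketch only; CACHED_BYTES is a made-up
tuning knob, error handling is trimmed, and the prefix/remainder split
is just one way to mix the two kinds of reads):

/*
 * Mix cached and O_DIRECT reads: read the prefix that fits in RAM
 * normally so it stays cached, then flip the descriptor to
 * O_DIRECT so the rest streams past the cache instead of evicting
 * the prefix.  Sketch only.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

#define CACHED_BYTES (256L * 1024 * 1024)   /* made-up tuning knob */

static void read_big_file(const char *path)
{
    char buf[65536];
    off_t done = 0;
    ssize_t n;
    int direct = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        done += n;
        if (!direct && done >= CACHED_BYTES) {
            /* Past the cached prefix: go direct for the rest. */
            int fl = fcntl(fd, F_GETFL);
            fcntl(fd, F_SETFL, fl | O_DIRECT);
            direct = 1;
        }
        /* ... hand buf to the consumer here ... */
    }
    close(fd);
}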

-Matt
Matthew Dillon 
<[EMAIL PROTECTED]>
