Well, here's a counterpoint.  Let's say you have an FTP
    server with 1G of RAM full of, say, pirated CDs at 600MB a
    pop.

    Now let's say someone puts up a new Madonna CD and suddenly
    you have thousands of people from all over the world trying
    to download a single 600MB file.

    Let's try another one.  Let's say you have an FTP server with
    1G of RAM full of hundreds of MPEG-encoded pirated CDs at
    50MB a pop, and you have thousands of people from all over the
    world trying to download a core set of 25 CDs, a working set
    that exceeds the RAM you have available to cache all of them.

    What I'm trying to illustrate here is the impossibility of
    what you are asking.  Your idea of restricting the cache for
    'sequential' access only works if there is just one process
    doing the accessing.  It breaks down as soon as you have 25
    processes accessing 25 different files sequentially.  How is
    the system supposed to detect the difference between 25
    processes accessing 25 50MB files on a 1G machine (a working
    set that doesn't fit in the cache) versus 300 processes
    accessing 15 50MB files on a 1G machine (which does fit)?
    Furthermore, how do you differentiate between 30 processes all
    downloading the same 600MB CD versus 30 processes downloading
    two different 600MB CDs, on a machine with 1G of cache?
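
    (To make the arithmetic explicit, assuming roughly the whole 1G is
    usable as cache: 25 x 50MB = 1250MB does not fit, while 15 x 50MB =
    750MB does; likewise one 600MB CD fits with room to spare while two
    of them, at 1200MB, do not.  Yet from the point of view of any single
    downloading process, the access pattern is the same plain sequential
    read in every case.)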

    You can't.  That's the problem.  There is no magic number between
    0 and the amount of memory you have where you can say "I am going
    to stop caching this sequential file" that covers even the most
    common situations that come up.  There is no algorithm that can
    detect the above situations before the fact or on the fly.  You
    can analyze the situation after the fact, but by then it is too late,
    and the situation may change from minute to minute.  One minute you
    have 300 people trying to download one CD, the next minute you have
    20 people trying to download 10 different CDs.
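
    To put the "magic number" point in concrete terms, here is a toy
    sketch (not real kernel code; the cutoff and the numbers are made up
    for illustration) of the kind of fixed per-file caching threshold
    being argued against.  Whatever cutoff is chosen, it throws away
    cache that would have paid off for the workload that fits, and it
    still cannot rescue the workload that does not:

    #include <stdio.h>

    #define CACHE_MB  1024      /* assume ~1G of RAM usable as cache */
    #define CUTOFF_MB 100       /* the hypothetical per-file "magic number" */

    static void
    check(const char *label, int nfiles, int file_mb)
    {
            int cached = file_mb < CUTOFF_MB ? file_mb : CUTOFF_MB;
            int wset = nfiles * file_mb;

            printf("%-28s working set %4dMB (%s in %dMB cache), "
                "yet the cutoff caches only %dMB of each file\n",
                label, wset, wset <= CACHE_MB ? "fits" : "does not fit",
                CACHE_MB, cached);
    }

    int
    main(void)
    {
            check("30 readers, one 600MB CD:", 1, 600);
            check("30 readers, two 600MB CDs:", 2, 600);
            check("300 readers, 15 50MB CDs:", 15, 50);
            check("25 readers, 25 50MB CDs:", 25, 50);
            return (0);
    }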

                                                -Matt

:The suggestion here basically boils down to this: if the system could 
:act on hints that somebody will be doing sequential access, then it 
:should be more timid about caching for that file access.  That is to 
:say, it should allow that file to "use up" a smaller number of blocks 
:from the cache (yes, the VM) at a time, and it should favor, if 
:anything, a LIFO scheme instead of the usual FIFO (LRU) scheme.  (That 
:is to say, for the special case of *sequential* access, LRU == FIFO, 
:and yet LIFO is probably more optimal for this case, at least if the 
:file will be re-read later.)
:
:Caching will do more good on files that will be randomly accessed; 
:an intermediate amount of good on files sequentially accessed but 
:rewound and/or accessed over and over, and if the file system could 
:somehow know (or be hinted) that the file is being sequentially 
:accessed and is unlikely to be accessed again for a good long while it 
:would clearly be better off not caching it at all.
:
:Of course the trick here is waving my hands and saying "assume that you 
:know how the file will be accessed in the future."  You ought to 
:pillory me for *that* bit.  Even with hinting there are problems with 
:this whole idea.  Still with some hinting the algorithm could probably 
:be a little more clever.
:
:(Actually, Terry Lambert *did* pillory me for that bit, just a bit, when 
:he pointed out the impossibility of knowing whether the file is being 
:used in the same way by other processes.)
:
:And . . . also to Terry, yes, I know that my proposal above 
:over-simplifies, but the point is that for sequential access you want 
:to go "gentle" on making the cache of other processes' and earlier reads 
:go away.
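
For what it's worth, the "hinting" side of the quoted suggestion does have a
standard expression today: posix_fadvise(2), which lets an application declare
a sequential access pattern and tell the kernel when it is finished with the
data it has read.  Below is a minimal sketch of how a file-sending loop might
use it; the chunk size and overall structure are made up for illustration, and
error checking of the advice calls is omitted:

#define _XOPEN_SOURCE 600       /* for posix_fadvise() */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (64 * 1024)       /* read in 64KB pieces (arbitrary) */

int
main(int argc, char **argv)
{
        char buf[CHUNK];
        off_t done = 0;
        ssize_t n;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s file\n", argv[0]);
                return (1);
        }
        if ((fd = open(argv[1], O_RDONLY)) < 0) {
                perror("open");
                return (1);
        }

        /* Declare up front that this descriptor will be read sequentially. */
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        while ((n = read(fd, buf, sizeof(buf))) > 0) {
                /* ... send the data to the client here ... */

                /* The pages just read will not be needed again soon. */
                (void)posix_fadvise(fd, done, n, POSIX_FADV_DONTNEED);
                done += n;
        }

        close(fd);
        return (0);
}

Even so, the hint only describes one process's intent; whether the pages are
worth keeping still depends on the aggregate working set, which is exactly the
problem raised above.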
